WorldWideScience

Sample records for analysis incorporating genomic

  1. Microarray comparative genomic hybridisation analysis incorporating genomic organisation, and application to enterobacterial plant pathogens.

    Directory of Open Access Journals (Sweden)

    Leighton Pritchard

    2009-08-01

    Microarray comparative genomic hybridisation (aCGH) provides an estimate of the relative abundance of genomic DNA (gDNA) taken from comparator and reference organisms by hybridisation to a microarray containing probes that represent sequences from the reference organism. The experimental method is used in a number of biological applications, including the detection of human chromosomal aberrations, and in comparative genomic analysis of bacterial strains, but optimisation of the analysis is desirable in each problem domain. We present a method for analysis of bacterial aCGH data that encodes spatial information from the reference genome in a hidden Markov model. This technique is the first such method to be validated in comparisons of sequenced bacteria that diverge at the strain and at the genus level: Pectobacterium atrosepticum SCRI1043 (Pba1043) and Dickeya dadantii 3937 (Dda3937); and Lactococcus lactis subsp. lactis IL1403 and L. lactis subsp. cremoris MG1363. In all cases our method is found to outperform common and widely used aCGH analysis methods that do not incorporate spatial information. This analysis is applied to comparisons between commercially important plant pathogenic soft-rotting enterobacteria (SRE) Pba1043, P. atrosepticum SCRI1039, P. carotovorum 193, and Dda3937. Our analysis indicates that it should not be assumed that hybridisation strength is a reliable proxy for sequence identity in aCGH experiments, and robustly extends the applicability of aCGH to bacterial comparisons at the genus level. Our results in the SRE further provide evidence for a dynamic, plastic 'accessory' genome, revealing major genomic islands encoding gene products that provide insight into, and may play a direct role in determining, variation amongst the SRE in terms of their environmental survival, host range and aetiology, such as phytotoxin synthesis, multidrug resistance, and nitrogen fixation.

  2. An approach to incorporate linkage disequilibrium structure into genomic association analysis

    Institute of Scientific and Technical Information of China (English)

    Fengyu Zhang; Diane Wagener

    2008-01-01

    In this study, we propose to use principal component analysis (PCA) and a regression model to incorporate linkage disequilibrium (LD) in genomic association data analysis. To accommodate LD in genomic data and reduce multiple testing, we suggest performing PCA and extracting the principal component score to capture the variation of genomic data, after which regression analysis is used to assess the association of the disease with the principal component score. An empirical analysis shows that both the genotype-based correlation matrix and the haplotype-based LD matrix can produce similar results for PCA. The principal component score seems to be more powerful in detecting genetic association because it is quantitatively measured and may be able to capture the effect of multiple loci.
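    The two-step idea in this abstract (PCA on correlated SNPs, then regression on the component score) can be sketched as follows. This is only an illustration under stated assumptions: the toy genotype data, the function names `pc_scores` and `regress_on_scores`, and the use of ordinary least squares on a quantitative trait are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def pc_scores(genotypes, n_components=1):
    """Principal component scores from the genotype correlation matrix."""
    # Standardise each SNP column (assumes no monomorphic SNPs).
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(corr)
    top = eigvecs[:, ::-1][:, :n_components]
    return z @ top

def regress_on_scores(scores, phenotype):
    """Least-squares fit of the phenotype on the PC scores.

    Returns [intercept, slope_1, ...]. OLS is an assumption made here;
    a binary disease trait would call for logistic regression instead.
    """
    design = np.column_stack([np.ones(len(phenotype)), scores])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef

# Hypothetical toy data: 5 SNPs in strong LD, driven by one shared signal.
rng = np.random.default_rng(0)
shared = rng.binomial(2, 0.3, size=200)
flips = rng.random((200, 5)) < 0.1
genotypes = np.where(flips, rng.integers(0, 3, size=(200, 5)),
                     shared[:, None]).astype(float)
phenotype = 0.5 * shared + rng.normal(size=200)

scores = pc_scores(genotypes)
coef = regress_on_scores(scores, phenotype)
```

    A single test on the first component replaces five correlated per-SNP tests, which is the multiple-testing reduction the abstract refers to. The sign of a principal component is arbitrary, so only the magnitude of the fitted slope is meaningful.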

  3. INCORPORATION OF BACTERIOPHAGE GENOME BY SPORES OF BACILLUS SUBTILIS.

    Science.gov (United States)

    TAKAHASHI, I

    1964-06-01

    Takahashi, I. (Microbiology Research Institute, Ottawa, Ontario, Canada). Incorporation of bacteriophage genome by spores of Bacillus subtilis. J. Bacteriol. 87:1499-1502. 1964. The buoyant density in a CsCl gradient of deoxyribonucleic acid (DNA) extracted from spores of Bacillus subtilis was found to be identical to that of DNA from vegetative cells. Density-gradient centrifugation of DNA of spores derived from cultures infected with phage PBS 1 revealed the presence of a minor band whose density corresponded to that of the phage DNA in addition to the spore DNA. No intermediate bands were present. The relative amount of the phage DNA present in the spores was estimated to be 11%, suggesting that spores of this organism may incorporate several copies of the phage genome. Although the possibility that true lysogeny may occur cannot be entirely eliminated, the results seem to indicate that the phage genomes incorporated into spores are not attached to the host chromosome in this system.

  4. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    Energy Technology Data Exchange (ETDEWEB)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.

    2011-08-01

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. Moreover, effectively integrating bioinformatics

  5. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    Science.gov (United States)

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  6. Teaching strategies to incorporate genomics education into academic nursing curricula.

    Science.gov (United States)

    Quevedo Garcia, Sylvia P; Greco, Karen E; Loescher, Lois J

    2011-11-01

    The translation of genomic science into health care has expanded our ability to understand the effects of genomics on human health and disease. As genomic advances continue, nurses are expected to have the knowledge and skills to translate genomic information into improved patient care. This integrative review describes strategies used to teach genomics in academic nursing programs and their facilitators and barriers to inclusion in nursing curricula. The Learning Engagement Model and the Diffusion of Innovations Theory guided the interpretation of findings. CINAHL, Medline, and Web of Science were resources for articles published during the past decade that included strategies for teaching genomics in academic nursing programs. Of 135 articles, 13 met criteria for review. Examples of effective genomics teaching strategies included clinical application through case studies, storytelling, online genomics resources, student self-assessment, guest lecturers, and a genetics focus group. Most strategies were not evaluated for effectiveness.

  7. Incorporating biological pathways via a Markov random field model in genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Min Chen

    2011-04-01

    Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulated in the literature on biological pathways and interactions. It is conceivable that appropriate incorporation of such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have been developed recently to prioritize genes using prior biological knowledge, such as pathways, most methods treat genes in a specific pathway as an exchangeable set without considering the topological structure of a pathway. However, how genes are related to each other in a pathway may be very informative for identifying association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF) model to incorporate pathway topology for association analysis. We show that the conditional distribution of our MRF model takes on a simple logistic regression form, and we propose an iterated conditional modes algorithm as well as a decision theoretic approach for statistical inference of each gene's association with disease. Simulation studies show that our proposed framework is more effective at identifying genes associated with disease than a single gene-based method. We also illustrate the usefulness of our approach through its applications to a real data example.

  8. Incorporating Experience Curves in Appliance Standards Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Garbesi, Karina; Chan, Peter; Greenblatt, Jeffery; Kantner, Colleen; Lekov, Alex; Meyers, Stephen; Rosenquist, Gregory; Buskirk, Robert Van; Yang, Hung-Chia; Desroches, Louis-Benoit

    2011-10-31

    The technical analyses in support of U.S. energy conservation standards for residential appliances and commercial equipment have typically assumed that manufacturing costs and retail prices remain constant during the projected 30-year analysis period. There is, however, considerable evidence that this assumption does not reflect real market prices. Costs and prices generally fall in relation to cumulative production, a phenomenon known as experience and modeled by a fairly robust empirical experience curve. Using price data from the Bureau of Labor Statistics, and shipment data obtained as part of the standards analysis process, we present U.S. experience curves for room air conditioners, clothes dryers, central air conditioners, furnaces, and refrigerators and freezers. These allow us to develop more representative appliance price projections than the assumption-based approach of constant prices. These experience curves were incorporated into recent energy conservation standards for these products. The impact on the national modeling can be significant, often increasing the net present value of potential standard levels in the analysis. In some cases a previously cost-negative potential standard level demonstrates a benefit when incorporating experience. These results imply that past energy conservation standards analyses may have undervalued the economic benefits of potential standard levels.
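    The experience-curve relationship mentioned in this abstract is commonly written as a power law, price = a * (cumulative production)^(-b). The sketch below fits that form in log-log space and converts the exponent into a learning rate. The power-law form and the numbers used are assumptions for illustration; the abstract does not give the authors' functional form or data.

```python
import numpy as np

def fit_experience_curve(cum_production, price):
    """Fit price = a * cum_production**(-b) by least squares on logs."""
    slope, intercept = np.polyfit(np.log(cum_production), np.log(price), 1)
    return np.exp(intercept), -slope

def learning_rate(b):
    """Fractional price drop for each doubling of cumulative production."""
    return 1.0 - 2.0 ** (-b)

# Hypothetical, noise-free data generated with a = 100 and b = 0.3.
cum = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
price = 100.0 * cum ** (-0.3)

a, b = fit_experience_curve(cum, price)
rate = learning_rate(b)  # fraction by which price falls per doubling
```

    Projecting prices forward with the fitted curve, instead of holding them constant, is what changes the net present value of a candidate standard level.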

  9. Incorporating psychological influences in probabilistic cost analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kujawski, Edouard; Alvaro, Mariana; Edwards, William

    2004-01-08

    Today's typical probabilistic cost analysis assumes an "ideal" project that is devoid of the human and organizational considerations that heavily influence the success and cost of real-world projects. In the real world "Money Allocated Is Money Spent" (MAIMS principle); cost underruns are rarely available to protect against cost overruns while task overruns are passed on to the total project cost. Realistic cost estimates therefore require a modified probabilistic cost analysis that simultaneously models the cost management strategy including budget allocation. Psychological influences such as overconfidence in assessing uncertainties and dependencies among cost elements and risks are other important considerations that are generally not addressed. It should then be no surprise that actual project costs often exceed the initial estimates and are delivered late and/or with a reduced scope. This paper presents a practical probabilistic cost analysis model that incorporates recent findings in human behavior and judgment under uncertainty, dependencies among cost elements, the MAIMS principle, and project management practices. Uncertain cost elements are elicited from experts using the direct fractile assessment method and fitted with three-parameter Weibull distributions. The full correlation matrix is specified in terms of two parameters that characterize correlations among cost elements in the same and in different subsystems. The analysis is readily implemented using standard Monte Carlo simulation tools such as @Risk and Crystal Ball®. The analysis of a representative design and engineering project substantiates that today's typical probabilistic cost analysis is likely to severely underestimate project cost for probability of success values of importance to contractors and procuring activities. The proposed approach provides a framework for developing a viable cost management strategy for
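    A minimal Monte Carlo sketch of the MAIMS effect, assuming three-parameter Weibull cost elements (location, scale, shape). The parameter values and budgets are hypothetical, and the cost elements are drawn independently here, whereas the paper specifies a correlation matrix; this is a simplification, not the authors' model.

```python
import numpy as np

def simulate_totals(loc, scale, shape, budget, n_trials=10_000, seed=0):
    """Return (ideal_total, maims_total) arrays over n_trials draws.

    ideal: underruns in one element offset overruns in another.
    MAIMS: each element spends at least its allocated budget.
    """
    rng = np.random.default_rng(seed)
    loc, scale, shape, budget = (
        np.asarray(v, dtype=float) for v in (loc, scale, shape, budget)
    )
    # Three-parameter Weibull: location + scale * standard Weibull(shape).
    costs = loc + scale * rng.weibull(shape, size=(n_trials, len(loc)))
    ideal = costs.sum(axis=1)
    maims = np.maximum(costs, budget).sum(axis=1)
    return ideal, maims

# Hypothetical cost elements (arbitrary currency units).
loc = [10.0, 20.0, 5.0]
scale = [4.0, 8.0, 3.0]
shape = [1.5, 2.0, 1.2]
budget = [12.0, 25.0, 8.0]

ideal, maims = simulate_totals(loc, scale, shape, budget)
p80_ideal = np.percentile(ideal, 80)
p80_maims = np.percentile(maims, 80)
```

    Because an element can never return unspent budget, the MAIMS total is at least the ideal total in every trial, so any percentile of the MAIMS distribution is at least the corresponding ideal percentile.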

  10. Tracking replication enzymology in vivo by genome-wide mapping of ribonucleotide incorporation.

    Science.gov (United States)

    Clausen, Anders R; Lujan, Scott A; Burkholder, Adam B; Orebaugh, Clinton D; Williams, Jessica S; Clausen, Maryam F; Malc, Ewa P; Mieczkowski, Piotr A; Fargo, David C; Smith, Duncan J; Kunkel, Thomas A

    2015-03-01

    Ribonucleotides are frequently incorporated into DNA during replication in eukaryotes. Here we map genome-wide distribution of these ribonucleotides as markers of replication enzymology in budding yeast, using a new 5' DNA end-mapping method, hydrolytic end sequencing (HydEn-seq). HydEn-seq of DNA from ribonucleotide excision repair-deficient strains reveals replicase- and strand-specific patterns of ribonucleotides in the nuclear genome. These patterns support the roles of DNA polymerases α and δ in lagging-strand replication and of DNA polymerase ɛ in leading-strand replication. They identify replication origins, termination zones and variations in ribonucleotide incorporation frequency across the genome that exceed three orders of magnitude. HydEn-seq also reveals strand-specific 5' DNA ends at mitochondrial replication origins, thus suggesting unidirectional replication of a circular genome. Given the conservation of enzymes that incorporate and process ribonucleotides in DNA, HydEn-seq can be used to track replication enzymology in other organisms.

  11. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2003-01-01

    This thesis describes a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene transfer, even to the extent that we can use the gene content to make genome phylogenies. Usi

  12. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  13. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes, sequenced by the Joint Genome Institute, along with other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as well as those of viruses and plasmids. Here, we describe some essential features of the IMG system and a case study for pangenome analysis.

  14. Coronavirus Genomics and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Kwok-Yung Yuen

    2010-08-01

    The drastic increase in the number of coronaviruses discovered and coronavirus genomes being sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D, exist. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci estimated the time of the most recent common ancestor of human/civet SARS-related coronavirus to be 1999-2002, with an estimated substitution rate of 4×10⁻⁴ to 2×10⁻² substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias in coronaviruses was observed, with HCoV-HKU1 showing the most extreme bias, and cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape such codon usage bias in coronaviruses.

  15. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that we could classify the functional regions of the genome based on sequence features and discriminant analysis.
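    One of the alignment-free features named in this abstract, dinucleotide relative abundance, is usually defined as the observed dinucleotide frequency divided by the product of the two single-base frequencies. The sketch below computes that ratio on one strand only; the function name and the single-strand simplification are assumptions for illustration, and the abstract does not state which variant the authors used.

```python
from collections import Counter
from itertools import product

def dinucleotide_relative_abundance(seq):
    """Map each dinucleotide XY to f_XY / (f_X * f_Y) for one strand.

    Values near 1 mean XY occurs about as often as base composition
    alone predicts; values well below 1 indicate under-representation.
    Characters other than A, C, G, T are counted in the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        rho[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return rho

# Toy input: a perfectly periodic sequence with no AA dinucleotides.
rho = dinucleotide_relative_abundance("ACGT" * 50)
```

    The 16 resulting values form a fixed-length feature vector per region, which is the kind of input that principal component analysis and discriminant analysis can work with.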

  16. Comparative genomic analysis of esophageal cancers.

    Science.gov (United States)

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  17. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola Cremonesi

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  18. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Akhilesh K. Tyagi; Jitendra P. Khurana; Paramjit Khurana; Saurabh Raghuvanshi; Anupama Gaur; Anita Kapur; Vikrant Gupta; Dibyendu Kumar; V. Ravi; Shubha Vij; Parul Khurana; Sulabha Sharma

    2004-04-01

    Rice is an excellent system for plant genomics as it represents a modest size genome of 430 Mb. It feeds more than half the population of the world. Draft sequences of the rice genome, derived by whole-genome shotgun approach at relatively low coverage (4–6 X), were published and the International Rice Genome Sequencing Project (IRGSP) declared high quality (>10 X), genetically anchored, phase 2 level sequence in 2002. In addition, phase 3 level finished sequence of chromosomes 1, 4 and 10 (out of 12 chromosomes of rice) has already been reported by scientists from IRGSP consortium. Various estimates of genes in rice place the number at > 50,000. Already, over 28,000 full-length cDNAs have been sequenced, most of which map to genetically anchored genome sequence. Such information is very useful in revealing novel features of macro- and micro-level synteny of rice genome with other cereals. Microarray analysis is unraveling the identity of rice genes expressing in temporal and spatial manner and should help target candidate genes useful for improving traits of agronomic importance. Simultaneously, functional analysis of rice genome has been initiated by marker-based characterization of useful genes and employing functional knock-outs created by mutation or gene tagging. Integration of this enormous information is expected to catalyze tremendous activity on basic and applied aspects of rice genomics.

  19. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Dang Thanh Hai; Nguyen Dai Thanh; Pham Thi Minh Trang; Le Si Quang; Phan Thi Thu Hang; Dang Cao Cuong; Hoang Kim Phuc; Nguyen Huu Duc; Do Duc Dong; Bui Quang Minh; Pham Bao Son; Le Sy Vinh

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length ≥ 300 bp. There were 235 contigs from the child genome of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.
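    Filtering trio variants by "the Mendelian inheritance law" amounts to checking that the child's two alleles can be explained by one allele from each parent. A minimal sketch for autosomal, diploid genotypes follows; the tuple representation and function name are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
def is_mendelian_consistent(child, father, mother):
    """True if one child allele can come from each parent.

    Genotypes are 2-tuples of allele labels, e.g. ("A", "G").
    Assumes an autosomal diploid site; de novo mutations, sex
    chromosomes and missing calls are not handled.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

# Heterozygous child of two opposite homozygotes: consistent.
ok = is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G"))
# Child G/G with an A/A father: no paternal G is available.
bad = is_mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G"))
```

    Sites that fail this check in a trio are often treated as likely genotyping errors, which is why the rule is useful as a quality filter.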

  20. Pathway and network analysis of cancer genomes

    DEFF Research Database (Denmark)

    Creixell, Pau; Reimand, Jueri; Haider, Syed

    2015-01-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...... large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations....

  1. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total number of 4,861 individuals (= 1,417+940+2,504 individuals). In fact, we successfully integrated these three data sets by use of information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, which was not possible by an analysis of each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.

  2. Design of DNA pooling to allow incorporation of covariates in rare variants analysis.

    Directory of Open Access Journals (Sweden)

    Weihua Guan

    Rapid advances in next-generation sequencing technologies facilitate genetic association studies of an increasingly wide array of rare variants. To capture the rare or less common variants, a large number of individuals will be needed. However, the cost of a large scale study using whole genome or exome sequencing is still high. DNA pooling can serve as a cost-effective approach, but with a potential limitation that the identity of individual genomes would be lost and therefore individual characteristics and environmental factors could not be adjusted in association analysis, which may result in power loss and a biased estimate of genetic effect. For case-control studies, we propose a design strategy for pool creation and an analysis strategy that allows covariate adjustment, using multiple imputation technique. Simulations show that our approach can obtain reasonable estimate for genotypic effect with only slight loss of power compared to the much more expensive approach of sequencing individual genomes. Our design and analysis strategies enable more powerful and cost-effective sequencing studies of complex diseases, while allowing incorporation of covariate adjustment.

  3. Mathematical Analysis of Genomic Evolution

    Directory of Open Access Journals (Sweden)

    Cedric Green

    2011-01-01

    Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the number of advantageous mutations it takes to get human (Homo sapiens) DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the number of mutations it takes to transform yeast (Saccharomyces cerevisiae) DNA to the DNA of a human. We calculate the typical number of mutations occurring annually through the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in the chromosomal structure and phenomena like horizontal gene transfer.

  4. Incorporating group correlations in genome-wide association studies using smoothed group Lasso.

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Ma, Shuangge; Wang, Kai

    2013-04-01

    In genome-wide association studies, penalization is an important approach for identifying genetic markers associated with disease. Motivated by the fact that there exists natural grouping structure in single nucleotide polymorphisms and, more importantly, such groups are correlated, we propose a new penalization method for group variable selection which can properly accommodate the correlation between adjacent groups. This method is based on a combination of the group Lasso penalty and a quadratic penalty on the difference of regression coefficients of adjacent groups. The new method is referred to as smoothed group Lasso (SGL). It encourages group sparsity and smooths regression coefficients for adjacent groups. Canonical correlations are applied to the weights between groups in the quadratic difference penalty. We first derive a GCD algorithm for computing the solution path under a linear regression model. The SGL method is further extended to logistic regression for binary response. With the assistance of the majorize-minimization algorithm, the SGL penalized logistic regression turns out to be an iteratively penalized least-squares problem. We also suggest conducting principal component analysis to reduce the dimensionality within groups. Simulation studies are used to evaluate the finite sample performance. Comparison with group Lasso shows that SGL is more effective in selecting true positives. Two datasets are analyzed using the SGL method.
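
    The penalty described in this abstract combines a group Lasso term with a quadratic term on differences between adjacent groups. The sketch below shows one plausible form of such a penalty. The function names, the square-root-of-group-size scaling, and the factor of one half are assumptions made for illustration; the abstract does not give the exact formula, and the weights here simply stand in for the canonical-correlation-based weights it mentions.

```python
import math

def group_norm(beta_group):
    """Euclidean norm of the coefficients in one group."""
    return math.sqrt(sum(b * b for b in beta_group))

def sgl_penalty(groups, weights, lam1, lam2):
    """Illustrative smoothed-group-Lasso-style penalty (assumed form).

    groups  : list of non-empty coefficient lists, one per SNP group, in genomic order
    weights : weights[j] couples group j with group j+1 (hypothetical stand-in
              for a weight derived from a canonical correlation)
    lam1    : tuning parameter for the group Lasso term
    lam2    : tuning parameter for the quadratic smoothing term
    """
    if len(weights) != len(groups) - 1:
        raise ValueError("need exactly one weight per pair of adjacent groups")
    # Group Lasso term: encourages whole groups to be zero.
    lasso = sum(math.sqrt(len(g)) * group_norm(g) for g in groups)
    # Size-scaled norms, so groups of different size are comparable (assumption).
    scaled = [group_norm(g) / math.sqrt(len(g)) for g in groups]
    # Quadratic term: penalizes differences between adjacent groups.
    smooth = sum(w * (a - b) ** 2 for w, a, b in zip(weights, scaled, scaled[1:]))
    return lam1 * lasso + 0.5 * lam2 * smooth

# Toy example: two groups of two SNPs each, coupled with weight 1.0.
print(sgl_penalty([[3.0, 4.0], [0.0, 0.0]], [1.0], lam1=1.0, lam2=2.0))
```

    Setting lam2 to zero reduces this sketch to a plain group Lasso penalty, which is the comparison method named in the abstract.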

  5. A Distance Measure for Genome Phylogenetic Analysis

    Science.gov (United States)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
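
    The general idea of a compression-based distance can be illustrated with the normalized compression distance (NCD). The sketch below swaps in the general-purpose zlib compressor for the expert model named in the abstract, so it is not the authors' measure; the sequences are randomly generated toy data and every name is hypothetical.

```python
import random
import zlib

def compressed_size(seq: str) -> int:
    """Size in bytes of the zlib-compressed sequence (a crude information estimate)."""
    return len(zlib.compress(seq.encode("ascii"), 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy data (hypothetical): a random "genome", a lightly mutated copy,
# and an unrelated random sequence.
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
mutated = list(genome_a)
for i in range(0, len(mutated), 100):
    # One substitution every 100 bases.
    mutated[i] = "A" if mutated[i] != "A" else "C"
genome_b = "".join(mutated)
genome_c = "".join(rng.choice("ACGT") for _ in range(2000))

# The mutated copy should come out closer to genome_a than the unrelated sequence.
print(ncd(genome_a, genome_b), ncd(genome_a, genome_c))
```

    A matrix of such pairwise distances is the kind of input a distance-based tree construction method requires. zlib only finds repeats inside a 32 KB window, so this stand-in would not scale to whole genomes.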

  6. Incorporating Protein Biosynthesis into the Saccharomyces cerevisiae Genome-scale Metabolic Model

    DEFF Research Database (Denmark)

    Olivares Hernandez, Roberto

    Based on stoichiometric biochemical equations that occur in the cell, the genome-scale metabolic models can quantify the metabolic fluxes, which are regarded as the final representation of the physiological state of the cell. For Saccharomyces cerevisiae the genome-scale model has been......, translation initiation, translation elongation, translation termination, translation elongation, and mRNA decay. Considering this information from the mechanisms of transcription and translation, we will include these stoichiometric reactions in the genome-scale model for S. cerevisiae to obtain the first...

  7. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

    Directory of Open Access Journals (Sweden)

    Zhang Liqing

    2010-01-01

    Background: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model). Results: In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments

  8. User-level sentiment analysis incorporating social networks

    CERN Document Server

    Tan, Chenhao; Tang, Jie; Jiang, Long; Zhou, Ming; Li, Ping

    2011-01-01

    We show that information about social relationships can be used to improve user-level sentiment analysis. The main motivation behind our approach is that users that are somehow "connected" may be more likely to hold similar opinions; therefore, relationship information can complement what we can extract about a user's viewpoints from their utterances. Employing Twitter as a source for our experimental data, and working within a semi-supervised framework, we propose models that are induced either from the Twitter follower/followee network or from the network in Twitter formed by users referring to each other using "@" mentions. Our transductive learning results reveal that incorporating social-network information can indeed lead to statistically significant sentiment-classification improvements over the performance of an approach based on Support Vector Machines having access only to textual features.

  9. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    Science.gov (United States)

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.
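
    A bipartite network of the kind described here links genomic units to the protein clusters they carry. The sketch below is a minimal illustration, assuming protein clusters have already been computed and treating any cluster missing from at least one genome as accessory. The input format, function name, and that accessory rule are all hypothetical; AcCNET's own clustering and edge weighting are not described in the abstract.

```python
from collections import defaultdict

def build_bipartite(genome_to_clusters):
    """Build genome-to-cluster edges restricted to the accessory genome.

    genome_to_clusters: dict mapping a genome name to the set of protein
    cluster identifiers found in it (hypothetical input format).
    Returns (edges, accessory), where edges is a sorted list of
    (genome, cluster) pairs and accessory is the set of clusters that are
    absent from at least one genome.
    """
    n_genomes = len(genome_to_clusters)
    cluster_to_genomes = defaultdict(set)
    for genome, clusters in genome_to_clusters.items():
        for cluster in clusters:
            cluster_to_genomes[cluster].add(genome)
    # Clusters present in every genome form the core and are left out.
    accessory = {c for c, gs in cluster_to_genomes.items() if len(gs) < n_genomes}
    edges = sorted((g, c) for c in accessory for g in cluster_to_genomes[c])
    return edges, accessory

# Toy example: cluster c1 is shared by all three genomes (core); c2 and c3 are accessory.
genomes = {"A": {"c1", "c2"}, "B": {"c1", "c3"}, "C": {"c1", "c2"}}
edges, accessory = build_bipartite(genomes)
print(edges)
```

    An edge list like this can be loaded into general-purpose network analysis tools, with genomes and clusters as the two node types.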

  10. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  11. Incorporating Handling Qualities Analysis into Rotorcraft Conceptual Design

    Science.gov (United States)

    Lawrence, Ben

    2014-01-01

    This paper describes the initial development of a framework to incorporate handling qualities analyses into a rotorcraft conceptual design process. In particular, the paper describes how rotorcraft conceptual design level data can be used to generate flight dynamics models for handling qualities analyses. Also, methods are described that couple a basic stability augmentation system to the rotorcraft flight dynamics model to extend analysis beyond that of the bare airframe. A methodology for calculating the handling qualities characteristics of the flight dynamics models and for comparing the results to ADS-33E criteria is described. Preliminary results from the application of the handling qualities analysis for variations in key rotorcraft design parameters of main rotor radius, blade chord, hub stiffness and flap moment of inertia are shown. Varying relationships, with counteracting trends for different handling qualities criteria and different flight speeds, are exhibited, with the action of the control system playing a complex part in the outcomes. Overall, the paper demonstrates how a broad array of technical issues across flight dynamics stability and control, simulation and modeling, control law design and handling qualities testing and evaluation had to be confronted to implement even a moderately comprehensive handling qualities analysis of relatively low fidelity models. A key outstanding issue is how to 'close the loop' with an overall design process, and options for exploring how to feed handling qualities results back to a conceptual design process are proposed for future work.

  12. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Giltae Song

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  13. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Science.gov (United States)

    Song, Giltae; Dickins, Benjamin J A; Demeter, Janos; Engel, Stacia; Gallagher, Jennifer; Choe, Kisurb; Dunn, Barbara; Snyder, Michael; Cherry, J Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  14. Strategy for incorporating newly discovered causative genetic variants into genomic evaluations

    Science.gov (United States)

    With sequence data available for an increasing number of dairy cattle, discovery of causative genetic variants is expected to be frequent. Current genomic evaluation systems require genotypes for all markers that contribute to an evaluation. A minimum number of animals with an observation for a new ...

  15. Comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48 percent of basidiomycete proteins are unique to the phylum with nearly half of those (22 percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  16. Integrative bayesian network analysis of genomic data.

    Science.gov (United States)

    Ni, Yang; Stingo, Francesco C; Baladandayuthapani, Veerabhadran

    2014-01-01

    Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.

  17. e-Fungi: a data resource for comparative analysis of fungal genomes

    Directory of Open Access Journals (Sweden)

    Hubbard Simon J

    2007-11-01

    Background: The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed in various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. Description: To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. Conclusion: The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the

  18. Evidence Supporting the Uptake and Genomic Incorporation of Environmental DNA in the “Ancient Asexual” Bdelloid Rotifer Philodina roseola

    Directory of Open Access Journals (Sweden)

    Olaf R. P. Bininda-Emonds

    2016-09-01

    Increasing evidence suggests that bdelloid rotifers regularly undergo horizontal gene transfer, apparently as a surrogate mechanism of genetic exchange in the absence of true sexual reproduction, in part because of their ability to withstand desiccation. We provide empirical support for this latter hypothesis using the bdelloid Philodina roseola, which we demonstrate to readily internalize environmental DNA in contrast to a representative monogonont rotifer (Brachionus rubens), which, like other monogononts, is facultatively sexual and cannot withstand desiccation. In addition, environmental DNA that was more similar to the host DNA was retained more often and for a longer period of time. Indirect evidence (increased variance in the reproductive output of the untreated F1 generation) suggests that environmental DNA can be incorporated into the genome during desiccation and is thus heritable. Our observed fitness effects agree with sexual theory and also occurred when the animals were desiccated in groups (thereby acting as DNA donors), but not individually, indicating the mechanism could occur in nature. Thus, although DNA uptake and its genomic incorporation appear proximally related to anhydrobiosis in bdelloids, they might also facilitate accidental genetic exchange with closely related taxa, thereby maintaining higher levels of genetic diversity than is otherwise expected for this group of “ancient asexuals”.

  19. Bayesian genomic models for the incorporation of pathway topology knowledge into association studies.

    Science.gov (United States)

    Brisbin, Abra; Fridley, Brooke L

    2013-08-01

    Pathway topology and relationships between genes have the potential to provide information for modeling effects of mRNA gene expression on complex traits. For example, researchers may wish to incorporate the prior belief that "hub" genes (genes with many neighbors) are more likely to influence the trait. In this paper, we propose and compare six Bayesian pathway-based prior models to incorporate pathway topology information into association analyses. Including prior information regarding the relationships among genes in a pathway was effective in somewhat improving detection rates for genes associated with complex traits. Through an extensive set of simulations, we found that when hub (central) effects are expected, the diagonal degree model is preferred; when spoke (edge) effects are expected, the spatial power model is preferred. When there is no prior knowledge about the location of the effect genes in the pathway (e.g., hub versus spoke model), it is worthwhile to apply multiple models, as the model with the best DIC is not always the one with the best detection rate. We also applied the models to pharmacogenomic studies for the drugs gemcitabine and 6-mercaptopurine and found that the diagonal degree model identified an association between 6-mercaptopurine response and expression of the gene SLC28A3, which was not detectable using the model including no pathway information. These results demonstrate the value of incorporating pathway information into association analyses.

  20. Genomic risk profiling of ischemic stroke: results of an international genome-wide association meta-analysis.

    Directory of Open Access Journals (Sweden)

    James F Meschia

    INTRODUCTION: Familial aggregation of ischemic stroke derives from shared genetic and environmental factors. We present a meta-analysis of genome-wide association scans (GWAS) from 3 cohorts to identify the contribution of common variants to ischemic stroke risk. METHODS: This study involved 1464 ischemic stroke cases and 1932 controls. Cases were genotyped using the Illumina 610 or 660 genotyping arrays; controls, with Illumina HumanHap 550Kv1 or 550Kv3 genotyping arrays. Imputation was performed with the 1000 Genomes European ancestry haplotypes (August 2010 release) as a reference. A total of 5,156,597 single-nucleotide polymorphisms (SNPs) were incorporated into the fixed effects meta-analysis. All SNPs associated with ischemic stroke (P < 1×10^-5) were incorporated into a multivariate risk profile model. RESULTS: No SNP reached genome-wide significance for ischemic stroke (P < 5×10^-8). Secondary analysis identified a significant cumulative effect for age at onset of stroke (first versus fifth quintile of cumulative profiles based on SNPs associated with late onset, β = 14.77 [10.85, 18.68], P = 5.5×10^-12), as well as a strong effect showing increased risk across samples with a high propensity for stroke among samples with enriched counts of suggestive risk alleles (P < 5×10^-6). Risk profile scores based only on genomic information offered little incremental prediction. DISCUSSION: There is little evidence of a common genetic variant contributing to moderate risk of ischemic stroke. Quintiles based on genetic loading of alleles associated with a younger age at onset of ischemic stroke revealed a significant difference in age at onset between those in the upper and lower quintiles. Using common variants from GWAS and imputation, genomic profiling remains inferior to family history of stroke for defining risk. Inclusion of genomic (rare variant) information may be required to improve clinical risk profiling.
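
    A fixed effects meta-analysis of one SNP across cohorts is commonly computed by inverse-variance weighting of the per-cohort effect estimates. The abstract does not state which weighting was used, so the sketch below is an assumption about the general technique rather than a description of this study's pipeline; the function name and the example numbers are hypothetical.

```python
import math

def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed effects estimate for one SNP.

    betas           : per-cohort effect estimates (e.g. log odds ratios)
    standard_errors : per-cohort standard errors of those estimates
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical example: two cohorts with equal precision.
beta, se, z, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, z, p)
```

    With equal standard errors the pooled estimate is the plain average of the cohort estimates, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.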

  1. Incorporating Endmember Variability into Spectral Mixture Analysis Through Endmember Bundles

    Science.gov (United States)

    Bateson, C. Ann; Asner, Gregory P.; Wessman, Carol A.

    1998-01-01

    Variation in canopy structure and biochemistry induces a concomitant variation in the top-of-canopy spectral reflectance of a vegetation type. Hence, the use of a single endmember spectrum to track the fractional abundance of a given vegetation cover in a hyperspectral image may result in fractions with considerable error. One solution to the problem of endmember variability is to increase the number of endmembers used in a spectral mixture analysis of the image. For example, there could be several tree endmembers in the analysis because of differences in leaf area index (LAI) and multiple scatterings between leaves and stems. However, it is often difficult in terms of computer or human interaction time to select more than six or seven endmembers and any non-removable noise, as well as the number of uncorrelated bands in the image, limits the number of endmembers that can be discriminated. Moreover, as endmembers proliferate, their interpretation becomes increasingly difficult and often applications simply need the areal fractions of a few land cover components which comprise most of the scene. In order to incorporate endmember variability into spectral mixture analysis, we propose representing a landscape component type not with one endmember spectrum but with a set or bundle of spectra, each of which is feasible as the spectrum of an instance of the component (e.g., in the case of a tree component, each spectrum could reasonably be the spectral reflectance of a tree canopy). These endmember bundles can be used with nonlinear optimization algorithms to find upper and lower bounds on endmember fractions. This approach to endmember variability naturally evolved from previous work in deriving endmembers from the data itself by fitting a triangle, tetrahedron or, more generally, a simplex to the data cloud reduced in dimension by a principal component analysis. Conceptually, endmember variability could make it difficult to find a simplex that both surrounds the data

  2. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  3. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  4. Human and mouse genome analysis using array comparative genomic hybridization

    NARCIS (Netherlands)

    Snijders, Antoine Maria

    2004-01-01

    Almost all human cancers as well as developmental abnormalities are characterized by the presence of genetic alterations, most of which target a gene or a particular genomic locus resulting in altered gene expression and ultimately an altered phenotype. Different types of genetic alterations include

  5. Genome-wide analysis correlates Ayurveda Prakriti.

    Science.gov (United States)

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K; Prasanna, B V; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S; Dedge, Amrish P; Bharadwaj, Ramachandra; Gangadharan, G G; Nair, Sreekumaran; Gopinath, Puthiya M; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-10-29

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, demonstrating their power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine.
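
    The abstract above reports significance "after 10^6 permutations" without giving the procedure. As a rough illustration only, a label-permutation test for one SNP can be sketched as follows. The two-group setup, the genotype coding (0/1/2 copies of an allele), the difference-in-means statistic, and the function name are all assumptions made for this sketch, not the authors' method.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    group_a, group_b: genotype dosages (hypothetical 0/1/2 coding) for one SNP.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, which is equivalent to shuffling labels.
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if permuted >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

    With the add-one correction, the smallest p-value reachable with N permutations is 1/(N + 1), so the number of permutations bounds the significance level that can be resolved.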

  6. Network Based Prediction Model for Genomics Data Analysis*

    OpenAIRE

    Huang, Ying; Wang, Pei

    2012-01-01

    Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptible genes. ...

  7. Incorporating geometric and radiative effects into infrared scanning computer analysis

    Science.gov (United States)

    Myrick, D. L.; Kantsios, A. G.

    1983-01-01

    A NASA program, the SILTS experiment (Shuttle Infrared Leeside Temperature Sensing) will utilize an infrared scanning system mounted at the tip of the vertical stabilizer to remotely measure the surface temperature of the leeside of the Space Shuttle during entry from orbit. Scans of the fuselage and one wing will be made alternately. The experiment will correlate real full scale data to ground-based information. In order to quantitatively assess the temperature profile of the surface, an algorithm is required which incorporates the Space Shuttle shape, location of specific materials on the surface, and the measurement geometry between the camera and the surface. This paper will discuss the algorithm.

  8. Enhancing genomics information retrieval through dimensional analysis.

    Science.gov (United States)

    Hu, Qinmin; Huang, Jimmy Xiangji

    2013-06-01

    We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.

  9. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  10. 3-D acquisition geometry analysis: Incorporating information from multiples

    NARCIS (Netherlands)

    Kumar, A.; Blacquiere, G.; Verschuur, D.J.

    2014-01-01

    Recent advances in survey design have led to conventional common-midpoint-based analysis being replaced by the subsurface-based seismic acquisition analysis and design, with the emphasis on advanced techniques of illumination analysis. Amongst them are wave-equation-based seismic illumination analyse

  11. Pedigree analysis: One teaching strategy to incorporate genetics into a full FNP program.

    Science.gov (United States)

    Schumacher, Gretchen; Conway, Alice E; Sparlin, Judith A

    2006-05-01

    The successful completion of the genome project in April 2003 and explosion of genetic knowledge is impacting healthcare at a dramatic rate. All healthcare providers need to update themselves on genetics in order to provide comprehensive care. This article describes a national grant obtained to educate faculty regarding incorporating genetics into courses. It also presents an innovative method for incorporating genetics into a full Family Nurse Practitioner (FNP) curriculum. Student responses and guidelines for one assignment are included. Utilizing this type of assignment in FNP courses is beneficial to both students and faculty. With more FNPs assessing patterns for illness in families, primary prevention and earlier intervention in primary care can be achieved.

  12. Genome Data Exploration Using Correspondence Analysis.

    Science.gov (United States)

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results
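
    To make the description of CA above concrete, the sketch below computes the total inertia of a two-way table and its leading factor. It follows the standard textbook formulation (standardized residuals of the table, then their leading singular vector), not code from the reviewed article. The function name is hypothetical, power iteration stands in for a full singular value decomposition to keep the example dependency-free, and all row and column totals are assumed to be positive.

```python
import math


def ca_first_axis(table, iterations=500):
    """Total inertia, leading eigenvalue and row coordinates on the first CA axis.

    table: list of rows of non-negative counts with positive margins.
    """
    total = sum(sum(row) for row in table)
    p = [[value / total for value in row] for row in table]
    row_mass = [sum(row) for row in p]
    col_mass = [sum(col) for col in zip(*p)]
    n_rows, n_cols = len(row_mass), len(col_mass)

    # Standardized residuals: s_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j).
    s = [[(p[i][j] - row_mass[i] * col_mass[j])
          / math.sqrt(row_mass[i] * col_mass[j])
          for j in range(n_cols)] for i in range(n_rows)]
    total_inertia = sum(x * x for row in s for x in row)

    def times_s(v):
        return [sum(s[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]

    def times_s_transposed(u):
        return [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]

    # Power iteration on S^T S for the leading right singular vector.
    v = [1.0 + j for j in range(n_cols)]  # arbitrary non-constant start
    for _ in range(iterations):
        w = times_s_transposed(times_s(v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the table shows no association at all
        v = [x / norm for x in w]

    u = times_s(v)  # equals sigma times the unit left singular vector
    eigenvalue = sum(x * x for x in u)  # sigma squared
    row_coords = [u[i] / math.sqrt(row_mass[i]) for i in range(n_rows)]
    return total_inertia, eigenvalue, row_coords
```

    The ratio of the returned eigenvalue to the total inertia is the share of variability carried by the first factor, which is the quantity usually inspected when deciding how many factors to retain.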

  13. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results impossible to derive from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.
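
    The abstract does not state the model explicitly. One plausible reading of a two-variable logistic model with an interaction term is written out below; the notation is an assumption introduced here for illustration (g indexes genes, M is a gene module, and x_g and y_g are two genome-scale measurements for gene g), not the authors' own formulation.

```latex
\operatorname{logit} P(g \in M) \;=\; \beta_0 + \beta_1 x_g + \beta_2 y_g + \beta_{12}\, x_g y_g
```

    Under this reading, a one-dimensional gene set analysis corresponds to fitting only one of the main-effect terms, while a non-zero interaction coefficient would indicate a module whose membership depends on the two measurements jointly.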

  14. BambooGDB: a bamboo genome database with functional annotation and an analysis platform.

    Science.gov (United States)

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success on the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights on bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation for the bamboo genome was made. Besides, an analytical platform composed of comparative genomic analysis, protein-protein interactions network, pathway analysis and visualization of genomic data was also constructed. As discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for helping and designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org.

  15. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  16. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci.

    Science.gov (United States)

    Rasmussen, L H; Dargis, R; Højholt, K; Christensen, J J; Skovgaard, O; Justesen, U S; Rosenvinge, F S; Moser, C; Lukjancenko, O; Rasmussen, S; Nielsen, X C

    2016-10-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were observed in single gene analyses. Species identification based on single gene analysis showed their limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed the most distinct clustering.

  17. Influence of incorporated wild Solanum genomes on potato properties in terms of starch nanostructure and glycoalkaloid content.

    Science.gov (United States)

    Väänänen, Tiina; Ikonen, Teemu; Rokka, Veli-Matti; Kuronen, Pirjo; Serimaa, Ritva; Ollilainen, Velimatti

    2005-06-29

    Interspecific somatic hybrids produced by protoplast fusion between two wild Solanum species (S. acaule, acl; S. brevidens, brd) and cultivated potato Solanum tuberosum (tbr) were analyzed in terms of the starch nanometer-range structure and glycoalkaloid (GA) contents. The crystallinity of starch granules, the average size of starch crystallites, and the lamellar distances were obtained from tuber samples using wide-angle and small-angle X-ray scattering methods. These measurements showed that incorporation of wild genomes from either nontuberous (brd) or tuberous (acl) Solanum species caused no significant modifications of the nanostructure of potato starch. In contrast, the GA profiles of the hybrids, which were analyzed by LC-ESI-MS in both tuber and foliage samples, differed considerably from those of cultivated potato. Regardless of the low total tuber GA concentrations (approximately 9 mg/100 g of fresh weight), the somatic hybrids contained GAs not detected in the parental species. A high proportion of spirotype GAs consisting of 5,6-dihydrogenated aglycons, for example, alpha-tomatine and tomatidine bound with solatriose, and chacotriose were found in the hybrids. In conclusion, the foliage of interspecific hybrids contained a higher variation in the structures of GAs than did the tubers.

  18. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    BACKGROUND: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. METHODOLOGY/PRINCIPAL FINDINGS: We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length, and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. CONCLUSION: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  19. Analysis of brushless DC generator incorporating an axial field coil

    Energy Technology Data Exchange (ETDEWEB)

    Moradi, Hassan, E-mail: H_moradi@sbu.ac.i [Department of Electrical and Computer Engineering, Shahid Beheshti University, GC, Tehran (Iran, Islamic Republic of); Afjei, E. [Department of Electrical and Computer Engineering, Shahid Beheshti University, GC, Tehran (Iran, Islamic Republic of)

    2011-07-15

    Highlights: Magnetic analysis and experiment of a three-phase field assisted BLDC generator; confirm the accuracy of the predicted flux-linkage by 2-D FE analysis; confirm the accuracy of the FE analysis results by coupling the FE and BE method; control the output voltage to a desired level by controlling the amplitude of I_f; compatible with any application that requires variable speed operation. Abstract: This paper describes the magnetic analysis and experiment of a three-phase field assisted brushless DC (BLDC) generator. Unlike conventional BLDC generators, the permanent magnet is replaced with an assisted field winding. The stator and rotor are constructed with two magnetically dependent sets, in which each stator set includes nine salient poles with coil windings, and the rotor comprises six salient poles. Other pole combinations are also possible. This construction is similar to a homopolar inductor alternator. The DC current in the assisted field winding produces axial flux which makes the rotor magnetically polarized at its ends. The magnetic field flows axially through the rotor shaft and closes through the stator teeth and the machine housing. To evaluate the generator performance, two types of analysis, namely the numerical technique and the experimental study, have been utilized. In the numerical analysis, 2-D finite element (FE) analysis has been carried out using a MagNet CAD package (Infolytica Corporation Ltd.), to confirm the accuracy of the predicted flux-linkage characteristics, whereas in the experimental study, a prototype BLDC generator was constructed for verifying the actual performance. Furthermore, the evaluation method based on a hybrid numerical method coupling the finite element (FE) analysis and boundary element (BE) method has been carried out to confirm the accuracy of the 2-D FE analysis simulation results. It provides not only confirmations of the investigation in results

  20. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

    Science.gov (United States)

    Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

    2016-01-01

    Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.

  1. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    2016-03-01

    Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.

  2. Incorporating Semantics into Data Driven Workflows for Content Based Analysis

    Science.gov (United States)

    Argüello, M.; Fernandez-Prieto, M. J.

    Finding meaningful associations between text elements and knowledge structures within clinical narratives in a highly verbal domain, such as psychiatry, is a challenging goal. The research presented here uses a small corpus of case histories and brings into play pre-existing knowledge, and therefore complements other approaches that use large corpora (millions of words) and no pre-existing knowledge. The paper describes a variety of experiments for content-based analysis: Linguistic Analysis using NLP-oriented approaches, Sentiment Analysis, and Semantically Meaningful Analysis. Although it is not standard practice, the paper advocates providing automatic support to annotate the functionality as well as the data for each experiment by performing semantic annotation that uses OWL and OWL-S. Lessons learnt can be transmitted to legacy clinical databases facing the conversion of clinical narratives according to prominent Electronic Health Records standards.

  3. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Background: The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results: We have created a user-friendly java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion: In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  4. The JPL Cost Risk Analysis Approach that Incorporates Engineering Realism

    Science.gov (United States)

    Harmon, Corey C.; Warfield, Keith R.; Rosenberg, Leigh S.

    2006-01-01

    This paper discusses the JPL Cost Engineering Group (CEG) cost risk analysis approach that accounts for all three types of cost risk. It will also describe the evaluation of historical cost data upon which this method is based. This investigation is essential in developing a method that is rooted in engineering realism and produces credible, dependable results to aid decision makers.

  5. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    Science.gov (United States)

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  6. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    PGDBj Registered... Data items: Year of genome analysis; Sequencing method; Read counts; Covered genome region; ...otation method; Number of predicted genes; Genome database; Genome database informati...

  7. Identification of probable genomic packaging signal sequence from SARS-CoV genome by bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    QIN Lei; XIONG Bin; LUO Cheng; GUO Zong-Ming; HAO Pei; SU Jiong; NAN Peng; FENG Ying; SHI Yi-Xiang; YU Xiao-Jing; LUO Xiao-Min; CHEN Kai-Xian; SHEN Xu; SHEN Jian-Hua; ZOU Jian-Ping; ZHAO Guo-Ping; SHI Tie-Liu; HE Wei-Zhong; ZHONG Yang; JIANG Hua-Liang; LI Yi-Xue

    2003-01-01

    AIM: To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interference (RNAi) drugs for treating SARS. METHODS: Based on studies of the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS-CoV was analyzed using bioinformatic tools. Multiple alignment of the genomic sequences was performed among SARS-CoV, MHV, BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted for the identification of the possible genomic packaging signals. Meanwhile, the N and M proteins of all five viruses were analyzed to study the evolutionary relationship with genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV is located at the 3′ end of ORF1b near that of MHV and BCoV, which is the most variable region of this gene. The RNA secondary structure of the SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained in studying the genomic packaging signals of PEDV and HCoV 229E. Furthermore, the genomic sequence multiple alignment indicated that the locations of the packaging signals of SARS-CoV, PEDV, and HCoV overlapped each other. It seems that the mutation rate of packaging signal sequences is much higher than that of the N protein, while the M protein shows only subtle variations. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding secondary RNA structure located in a similar region of ORF1b. The positions where genomic packaging signals exist have undergone rounds of mutations, which may consequently influence the primary structures of the N and M proteins.

  8. World-Systems Analysis, Globalization, and Incorporated Comparison

    Directory of Open Access Journals (Sweden)

    Phillip McMichael

    2015-08-01

    When Immanuel Wallerstein (1974) subverted the mid-1970s social science scene with his concept of the ‘world-system,’ development, the ‘master’ concept of social theory, suffered a fatal blow. Wallerstein’s critique of development emphasized its misapplication as a national strategy in a hierarchical world where only some states can ‘succeed.’ Wallerstein’s path-breaking epistemological challenge to the modernization paradigm reformulated the unit of analysis of development from the nation-state to the ‘world-system.’ To be sure, the past three decades have seen reformulations, coined to address the failures of the development enterprise: from basic needs, through participation in the world market, globalization, to local sustainability. But development, the organizing myth of our age, has never recovered.

  9. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  10. Initial sequencing and analysis of the human genome.

    Science.gov (United States)

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  11. Genomic analysis of plant chromosomes based on meiotic pairing

    Directory of Open Access Journals (Sweden)

    Lisete Chamma Davide

    2007-12-01

    Full Text Available This review presents the principles and applications of classical genomic analysis, with emphasis on plant breeding. The main mathematical models used to estimate the preferential chromosome pairing in diploid or polyploid, interspecific or intergenera hybrids are presented and discussed, with special reference to the applications and studies for the definition of genome relationships among species of the Poaceae family.

  12. Analysis of intra-genomic GC content homogeneity within prokaryotes

    DEFF Research Database (Denmark)

    Bohlin, J; Snipen, L; Hardy, S.P.;

    2010-01-01

    Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content... both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content...

  13. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs), or microsatellites, are genetic markers that are ubiquitous in the genomes of various organisms. The analysis of SSRs in rhizobial genomes provides useful information for a variety of applications in the population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of the most common SSRs in the Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes sequenced in the microorganisms tandem repeats database, and compared the SSRs in the genomes of the three species with each other. The results showed that there were 1,410, 859, and 638 SSRs in the B. japonicum, M. loti, and S. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and S. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant and indicated higher mutation rates in these species. Mononucleotide repeats were the least abundant. The SSR types and distributions were similar among these species.
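
    The counting described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the minimum repeat counts in MIN_REPEATS, the function names, and the per-kilobase normalisation of "relative abundance" (SSRs per kb) and "relative density" (SSR base pairs per kb) are all assumptions made for illustration, and only perfect repeats with motifs of 1 to 6 bp are detected.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# Published SSR surveys choose their own thresholds.
MIN_REPEATS = {1: 6, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        # One motif of length k followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. 'AA' or 'ATAT', which duplicate shorter-motif hits.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def ssr_summary(seq, min_repeats=MIN_REPEATS):
    """Count SSRs and normalise by sequence length in kilobases."""
    hits = find_ssrs(seq, min_repeats)
    kb = len(seq) / 1000.0
    ssr_bp = sum(len(motif) * count for _, motif, count in hits)
    return {
        "count": len(hits),
        "ssrs_per_kb": len(hits) / kb,
        "ssr_bp_per_kb": ssr_bp / kb,
    }
```

    Calling ssr_summary on each genome sequence and comparing the three dictionaries would mirror the between-species comparison in the abstract, under the thresholds assumed here.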

  14. Analysis of the Vibrionaceae pan-genome

    OpenAIRE

    Kahlke, Tim

    2013-01-01

    Paper 2 of this thesis is not available in Munin: 2. Tim Kahlke, Alexander Goesmann and Peik Haugen: 'The Vibrionaceae pan-genome hints at gene expression as the major driving force for unequal gene distributions on Vibrionaceae chromosomes' (manuscript) In the presented work the bacterial family Vibrionaceae was used as a model to investigate bacterial diversity on a gene level and to analyze the underlying concepts of bacterial niche adaptation and evolution. For this, the genomes ...

  15. AN EXPLORATORY ANALYSIS OF INCORPORATING CUSTOMER EXPERIENCE FRAMEWORKS WITHIN AN EMBA PROGRAM

    OpenAIRE

    Francis Petit

    2009-01-01

    The purpose of this research is to determine how to effectively incorporate customer experience management frameworks within the marketing and management of Executive MBA Programs. To this end, two customer experience management frameworks were discussed in detail and then analyzed for their potential applicability to enhancing the EMBA student experience. The main findings of this study indicate that as a result of this experience economy, incorporating a targeted, c...

  16. Chromosomes in the flow to simplify genome analysis.

    Science.gov (United States)

    Doležel, Jaroslav; Vrána, Jan; Safář, Jan; Bartoš, Jan; Kubaláková, Marie; Simková, Hana

    2012-08-01

    Nuclear genomes of humans, animals, and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified using flow cytometry according to light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from other chromosomes in a karyotype. The analysis and sorting are carried out at rates of 10^2-10^4 chromosomes per second, and for complex genomes such as wheat the flow sorting technology has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate provides an attractive approach for karyotype analysis (flow karyotyping) and the purification of chromosomes in large numbers. In characterizing the chromosome complement of an organism, the high number that can be studied using flow cytometry allows for a statistically accurate analysis. Chromosome sorting plays a particularly important role in the analysis of nuclear genome structure and the analysis of particular and aberrant chromosomes. Other attractive but not well-explored features include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are targeting the analysis to a genome region of interest and a significant reduction in sample complexity. As flow sorters can also sort single copies of chromosomes, shotgun sequencing DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.

  17. Analysis of intra-genomic GC content homogeneity within prokaryotes

    Directory of Open Access Journals (Sweden)

    Bohlin Jon

    2010-08-01

    Full Text Available Abstract Background: Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content of the total genome. A low GCVAR indicates intra-genomic GC homogeneity and a high GCVAR heterogeneity. Results: The regression analyses indicated that GCVAR was significantly associated with domain (i.e. archaea or bacteria), phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found but appears to be non-linear and varies greatly among phyla. Conclusions: Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR should be used as a complement to mean GC content.
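
    The idea of a variability measure taken "with respect to the average GC content of the total genome" can be sketched as follows. The abstract does not give the exact formula for GCVAR, so this is an assumption-laden illustration: it uses the mean absolute deviation of GC content in non-overlapping windows from the genome-wide GC content, and the window size of 100 bp and all function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window=100):
    """GC fraction of each full, non-overlapping window along the sequence."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def gc_variability(seq, window=100):
    """Mean absolute deviation of windowed GC from genome-wide GC.

    An illustrative stand-in for GCVAR, not the published definition.
    """
    windows = windowed_gc(seq, window)
    if not windows:
        raise ValueError("sequence is shorter than one window")
    genome_gc = gc_fraction(seq)
    return sum(abs(gc - genome_gc) for gc in windows) / len(windows)
```

    Under this sketch a compositionally uniform sequence scores near zero, and a sequence made of an AT-rich half joined to a GC-rich half scores high, which matches the homogeneity versus heterogeneity reading given in the abstract.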

  18. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    Science.gov (United States)

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  19. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
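
    The N50 statistic mentioned above has a short standard definition: it is the length L such that contigs of length L or longer contain at least half of the total assembled bases. The sketch below computes it from a list of contig lengths; the function name and the toy lengths in the usage note are illustrative and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
```

    For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) walks through 10, 9 and 8 before reaching half of the 54 total bases. The coverage figure in the abstract follows the same kind of arithmetic: 181 Gbp at roughly 60× coverage implies a genome size of about 181 / 60, or roughly 3 Gbp.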

  20. Mycobacterial species as case-study of comparative genome analysis.

    Science.gov (United States)

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling Mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, a comparison was made based on genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). The BLAST matrix of these genomes has been plotted to give extensive information about the similarity between species in a simple scheme. As a result of the multiple genome analysis, the pan and core genomes have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculosis and non-tuberculosis Mycobacteria to understand the evolutionary events of these species.

  1. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum

    DEFF Research Database (Denmark)

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger

    2016-01-01

    to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences were complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F...

  2. Whole genome sequencing analysis of Plasmodium vivax using whole genome capture

    Directory of Open Access Journals (Sweden)

    Bright A

    2012-06-01

    Full Text Available Abstract Background: Malaria caused by Plasmodium vivax is an experimentally neglected severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole genome analysis methods on patient-derived material are thus likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, it will allow study of the evolution and population biology of the parasite, allow parasite transmission patterns to be characterized, and may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is typically needed to remove human DNA that may be 1000X more abundant than parasite DNA. These features have precluded the analysis of archived blood samples and require the presence of laboratories in close proximity to the collection of field samples for optimal pre-cryopreservation sample preparation. Results: Here we show that in-solution hybridization capture can be used to extract P. vivax DNA from human contaminating DNA in the laboratory without the need for on-site leukocyte filtration. Using a whole genome capture method, we were able to enrich P. vivax DNA from bulk genomic DNA from less than 0.5% to a median of 55% (range 20%-80%). This level of enrichment allows for efficient analysis of the samples by whole genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, which is similar to our results using leukocyte filtration (greater than 5X coverage across 96%). Conclusion: The whole genome capture technique will enable more efficient whole genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.

  3. A novel statistic for genome-wide interaction analysis.

    Science.gov (United States)

    Wu, Xuesen; Dong, Hua; Luo, Li; Zhu, Yun; Peng, Gang; Reveille, John D; Xiong, Momiao

    2010-09-23

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR... analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search significant interaction between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  4. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available Abstract Background: A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion: BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions: We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  5. Hyperstructures, genome analysis and I-cells

    DEFF Research Database (Denmark)

    Amar, P.; Ballet, P.; Barlovatz-Meimon, G.;

    2002-01-01

    New concepts may prove necessary to profit from the avalanche of sequence data on the genome, transcriptome, proteome and interactome and to relate this information to cell physiology. Here, we focus on the concept of large activity-based structures, or hyperstructures, in which a variety of type...

  6. mirTarPri: improved prioritization of microRNA targets through incorporation of functional genomics data.

    Directory of Open Access Journals (Sweden)

    Peng Wang

    Full Text Available MicroRNAs (miRNAs) are a class of small (19-25 nt) non-coding RNAs. This important class of gene regulator downregulates gene expression through sequence-specific binding to the 3' untranslated regions (3'UTRs) of target mRNAs. Several computational target prediction approaches have been developed for predicting miRNA targets. However, the predicted target lists often have high false positive rates. To construct a workable target list for subsequent experimental studies, we need novel approaches to properly rank the candidate targets from traditional methods. We performed a systematic analysis of experimentally validated miRNA targets using functional genomics data, and found significant functional associations between genes that were targeted by the same miRNA. Based on this finding, we developed a miRNA target prioritization method named mirTarPri to rank the predicted target lists from commonly used target prediction methods. Leave-one-out cross validation proved successful in identifying known targets, achieving an AUC score of up to 0.84. Validation in high-throughput data proved that mirTarPri was an unbiased method. Applying mirTarPri to prioritize results of six commonly used target prediction methods allowed us to find more positive targets at the top of the prioritized candidate list. In comparison with other methods, mirTarPri had an outstanding performance in gold standard and CLIP data. mirTarPri was a valuable method to improve the efficacy of current miRNA target prediction methods. We have also developed a web-based server for implementing the mirTarPri method, which is freely accessible at http://bioinfo.hrbmu.edu.cn/mirTarPri.
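
    The AUC score quoted above summarises how well a ranking separates known targets from other candidates. As a minimal sketch of that metric only (not of mirTarPri itself, whose scoring is not described in the abstract), the function below computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, counting ties as one half. The function name and the example scores in the usage note are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalised Mann-Whitney U statistic: each
    (positive, negative) pair contributes 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

    With made-up prioritization scores such as auc([0.9, 0.8, 0.4], [0.7, 0.3]), five of the six positive-negative pairs are ordered correctly. In a leave-one-out setting, each held-out known target would supply one positive score against the scores of the remaining candidates.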

  7. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    Science.gov (United States)

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure the availability of sufficient material for copy number variation analysis of a genome deriving from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We focus on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either array comparative genomic hybridization (aCGH) or SNP array analysis, respectively.

  8. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  9. Pan-cancer analysis of ROS1 genomic aberrations

    OpenAIRE

    Wang, Yidan; 王奕丹

    2015-01-01

    The ROS proto-oncogene 1 (ROS1) encodes the ROS1 receptor kinase. ROS1 rearrangements are known to be oncogenic in glioblastoma, non–small-cell lung carcinoma (NSCLC) and cholangiocarcinoma. The clinical relevance of ROS1 genomic aberrations in other human cancers is largely unexamined. Here, we performed a pan-cancer analysis of ROS1 genomic aberrations across 20 cancer sites by interrogating the whole-exome sequencing data of the Cancer Genome Atlas (TCGA) via the cBioportal (www.cbioportal...

  10. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    Science.gov (United States)

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  11. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    Directory of Open Access Journals (Sweden)

    Jiuzhou Song

    2004-01-01

    Full Text Available Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is strongly influenced by the software method used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We found for the first time that the level of energy spectra difference is related to the similarity of bacterial strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S. typhi CT18, S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and H. pylori J99.
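
    The purine-excess and keto-excess plots named above are cumulative walks along the sequence. The sketch below uses the usual definitions, which are assumed here rather than taken from the abstract: purine excess adds 1 for A or G and subtracts 1 for C or T, and keto excess adds 1 for G or T and subtracts 1 for A or C. Ambiguous bases contribute nothing. The function names are illustrative, and the wavelet step of the paper is not covered.

```python
from itertools import accumulate


def cumulative_excess(seq, plus_bases, minus_bases):
    """Running sum of +1 / -1 steps along the sequence (0 for other symbols)."""
    steps = (
        1 if base in plus_bases else -1 if base in minus_bases else 0
        for base in seq.upper()
    )
    return list(accumulate(steps))


def purine_excess(seq):
    """Cumulative (A + G) minus (C + T) at each position."""
    return cumulative_excess(seq, "AG", "CT")


def keto_excess(seq):
    """Cumulative (G + T) minus (A + C) at each position."""
    return cumulative_excess(seq, "GT", "AC")
```

    Plotting the returned list against position gives the excess curve. Abrupt changes of slope in such a curve are the kind of feature the abstract associates with inversions, translocations, and horizontally transferred fragments.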

  12. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    Science.gov (United States)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is strongly influenced by the software method used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We found for the first time that the level of energy spectra difference is related to the similarity of bacterial strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S. typhi CT18, S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and H. pylori J99.

  13. Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes

    Directory of Open Access Journals (Sweden)

    Qin Xiang

    2012-07-01

    Full Text Available Abstract Background: Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references. Results: In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade) and are evolutionally considerably more closely related to each other by phylogenetic and gene content similarity analyses than to isolates in the community-associated (CA) clade, with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains.

  14. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    Science.gov (United States)

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  15. Single-cell analysis in cancer genomics

    Science.gov (United States)

    Saadatpour, Assieh; Lai, Shujing; Guo, Guoji; Yuan, Guo-Cheng

    2017-01-01

    Genetic changes and environmental differences result in cellular heterogeneity among cancer cells within the same tumor, thereby complicating treatment outcomes. Recent advances in single-cell technologies have opened new avenues to characterize the intra-tumor cellular heterogeneity, identify rare cell types, measure mutation rates, and, ultimately, guide diagnosis and treatment. In this paper, we review the recent single-cell technological and computational advances at the genomic, transcriptomic, and proteomic levels, and discuss their applications in cancer research. PMID:26450340

  16. Genomic analysis of epithelial ovarian cancer

    Institute of Scientific and Technical Information of China (English)

    John Farley; Laurent L Ozbun; Michael J Birrer

    2008-01-01

    Ovarian cancer is a major health problem for women in the United States. Despite evidence of considerable heterogeneity, most cases of ovarian cancer are treated in a similar fashion. The molecular basis for the clinicopathologic characteristics of these tumors remains poorly defined. Whole genome expression profiling is a genomic tool which can identify dysregulated genes and uncover unique sub-classes of tumors. The application of this technology to ovarian cancer has provided a solid molecular basis for differences in histology and grade of ovarian tumors. Differentially expressed genes identified pathways implicated in cell proliferation, invasion, motility, chromosomal instability, and gene silencing and provided new insights into the origin and potential treatment of these cancers. The added knowledge provided by global gene expression profiling should allow for a more rational treatment of ovarian cancers. These techniques are leading to a paradigm shift from empirical treatment to an individually tailored approach. This review summarizes the new genomic data on epithelial ovarian cancers of different histology and grade and the impact it will have on our understanding and treatment of this disease.

  17. Incorporation of wind generation to the Mexican power grid: Steady state analysis

    Energy Technology Data Exchange (ETDEWEB)

    Tovar, J.H.; Guardado, J.L.; Cisneros, F. [Inst. Tecnologico de Morelia (Mexico)]; Cadenas, R.; Lopez, S. [Comision Federal de Electricidad, Morelia (Mexico)]

    1997-09-01

    This paper describes a steady state analysis of the incorporation of large amounts of wind generation into the Mexican power system. An equivalent node is used to represent the individual wind generators in a wind farm. Possible overloads, losses, voltage and reactive power profiles, and estimated severe contingencies are analyzed. Finally, the conclusions of this study are presented.

  18. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost, or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  19. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  20. A novel statistic for genome-wide interaction analysis.

    Directory of Open Access Journals (Sweden)

    Xuesen Wu

    2010-09-01

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible way to find heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001 [...] genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  1. Analysis of the core genome and pan-genome of autotrophic acetogenic bacteria

    Directory of Open Access Journals (Sweden)

    JongOh Shin

    2016-09-01

    Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2) to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2) or carbon monoxide (CO), via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications.

  2. Analysis of the Core Genome and Pan-Genome of Autotrophic Acetogenic Bacteria

    Science.gov (United States)

    Shin, Jongoh; Song, Yoseb; Jeong, Yujin; Cho, Byung-Kwan

    2016-01-01

    Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2) to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2) or carbon monoxide (CO), via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications. PMID:27733845

  3. Enhancing genomic laboratory reports: A qualitative analysis of provider review

    Science.gov (United States)

    Rahm, Alanna Kulchak; Stuckey, Heather; Green, Jamie; Feldman, Lynn; Zallen, Doris T.; Bonhag, Michele; Segal, Michael M.; Fan, Audrey L.; Williams, Marc S.

    2016-01-01

    This study reports on the responses of physicians who reviewed provider and patient versions of a genomic laboratory report designed to communicate results of whole genome sequencing. Semi‐structured interviews addressed concept communication, elements, and format of example genome reports. Analysis of the coded transcripts resulted in recognition of three constructs around communication of genome sequencing results: (1) Providers agreed that whole genomic sequencing results are complex and they welcomed a report that provided supportive interpretation information to accompany sequencing results; (2) Providers strongly endorsed a report that included active clinical guidance, such as reference to practice guidelines, if available; and (3) Providers valued the genomic report as a resource that would serve as the basis to facilitate communication of genome sequencing results with their patients and families. Providers valued both versions of the report, though they affirmed the need for a provider‐oriented report. Critical elements of the report included clear language to explain the result, as well as consolidated yet comprehensive prognostic information with clear guidance over time for the clinical care of the patient. Most importantly, it appears a report with this design has the potential not only to return results but also to serve as a communication tool to help providers and patients discuss and coordinate care over time. © 2016 The Authors. American Journal of Medical Genetics Part A published by Wiley Periodicals, Inc. PMID:26842872

  4. Incorporation of Uncertainty and Variability of Drip Shield and Waste Package Degradation in WAPDEG Analysis

    Energy Technology Data Exchange (ETDEWEB)

    J.C. Helton

    2000-04-19

    This presentation investigates the incorporation of uncertainty and variability of drip shield and waste package degradation in analyses with the Waste Package Degradation (WAPDEG) program (CRWMS M&O 1998). This plan was developed in accordance with Development Plan TDP-EBS-MD-000020 (CRWMS M&O 1999a). Topics considered include (1) the nature of uncertainty and variability (Section 6.1), (2) incorporation of variability and uncertainty into analyses involving individual patches, waste packages, groups of waste packages, and the entire repository (Section 6.2), (3) computational strategies (Section 6.3), (4) incorporation of multiple waste package layers (i.e., drip shield, Alloy 22, and stainless steel) into an analysis (Section 6.4), (5) uncertainty in the characterization of variability (Section 6.5), and (6) Gaussian variance partitioning (Section 6.6). The presentation ends with a brief concluding discussion (Section 7).

  5. Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

    Directory of Open Access Journals (Sweden)

    Villegas Andre

    2010-09-01

    Background: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Results: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset. Conclusion: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs
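
    The core/accessory split described in the abstract above can be illustrated with a minimal sketch. This is not Panseq's algorithm, which works on sequence data with user-defined thresholds; it assumes the presence/absence calls already exist and uses the strictest definition (core = present in every strain). The function name `split_pan_genome`, the strain names, and the region IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split regions into (core, accessory).

    `presence` maps a strain name to the set of region IDs detected in it.
    Core regions occur in every strain; accessory regions occur in at
    least one strain but not in all of them.
    """
    if not presence:
        return set(), set()
    region_sets = list(presence.values())
    pan = set().union(*region_sets)          # everything seen in any strain
    core = set.intersection(*region_sets)    # seen in all strains
    return core, pan - core


# Hypothetical toy data: three strains, four regions.
toy = {
    "strainA": {"r1", "r2", "r3"},
    "strainB": {"r1", "r2"},
    "strainC": {"r1", "r2", "r4"},
}
core, accessory = split_pan_genome(toy)
```

    With the toy input, `core` holds the two regions shared by all three strains and `accessory` holds the two strain-specific ones. A threshold-based definition (for example, present in at least 95% of strains) would replace the strict intersection.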

  6. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  7. Genome wide copy number analysis of single cells

    Science.gov (United States)

    Baslan, Timour; Kendall, Jude; Rodgers, Linda; Cox, Hilary; Riggs, Mike; Stepansky, Asya; Troge, Jennifer; Ravi, Kandasamy; Esposito, Diane; Lakshmi, B.; Wigler, Michael; Navin, Nicholas; Hicks, James

    2016-01-01

    Summary: Copy number variation (CNV) is increasingly recognized as an important contributor to phenotypic variation in health and disease. Most methods for determining CNV rely on admixtures of cells, where information regarding genetic heterogeneity is lost. Here, we present a protocol that allows for the genome wide copy number analysis of single nuclei isolated from mixed populations of cells. Single nucleus sequencing (SNS) combines flow sorting of single nuclei based on DNA content, whole genome amplification (WGA), and next generation sequencing to quantize genomic intervals in a genome wide manner. Multiplexing of single cells is discussed. Additionally, we outline informatic approaches that correct for biases inherent in the WGA procedure and allow for accurate determination of copy number profiles. Altogether, the protocol takes ~3 days from flow cytometry to sequence-ready DNA libraries. PMID:22555242

  8. Cytogenetic analysis from DNA by comparative genomic hybridization.

    Science.gov (United States)

    Tachdjian, G; Aboura, A; Lapierre, J M; Viguié, F

    2000-01-01

    Comparative genomic hybridization (CGH) is a modified in situ hybridization technique which allows detection and mapping of DNA sequence copy differences between two genomes in a single experiment. In CGH analysis, two differentially labelled genomic DNA samples (study and reference) are co-hybridized to normal metaphase spreads. Chromosomal locations of copy number changes in the DNA segments of the study genome are revealed by a variable fluorescence intensity ratio along each target chromosome. Since its development, CGH has been applied mostly as a research tool in the field of cancer cytogenetics to identify genetic changes in many previously unknown regions. CGH may also have a role in clinical cytogenetics for detection and identification of unbalanced chromosomal abnormalities.

  9. Differential DNA Methylation Analysis without a Reference Genome.

    Science.gov (United States)

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  10. Differential DNA Methylation Analysis without a Reference Genome

    Directory of Open Access Journals (Sweden)

    Johanna Klughammer

    2015-12-01

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  11. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    Science.gov (United States)

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  12. What’s in the genome of a filamentous fungus? Analysis of the Neurospora genome sequence

    Science.gov (United States)

    Mannhaupt, Gertrud; Montrone, Corinna; Haase, Dirk; Mewes, H. Werner; Aign, Verena; Hoheisel, Jörg D.; Fartmann, Berthold; Nyakatura, Gerald; Kempken, Frank; Maier, Josef; Schulte, Ulrich

    2003-01-01

    The German Neurospora Genome Project has assembled sequences from ordered cosmid and BAC clones of linkage groups II and V of the genome of Neurospora crassa in 13 and 12 contigs, respectively. Including additional sequences located on other linkage groups, a total of 12 Mb were subjected to a manual gene extraction and annotation process. The genome comprises a small number of repetitive elements, a low degree of segmental duplications and very few paralogous genes. The analysis of the 3218 identified open reading frames provides a first overview of the protein equipment of a filamentous fungus. Significantly, N. crassa possesses a large variety of metabolic enzymes including a substantial number of enzymes involved in the degradation of complex substrates as well as secondary metabolism. While several of these enzymes are specific for filamentous fungi, many are shared exclusively with prokaryotes. PMID:12655011

  13. Yeast as a touchstone in post-genomic research: strategies for integrative analysis in functional genomics.

    Science.gov (United States)

    Castrillo, Juan I; Oliver, Stephen G

    2004-01-31

    The new complexity arising from the genome sequencing projects requires new comprehensive post-genomic strategies: advanced studies in regulatory mechanisms, application of new high-throughput technologies at a genome-wide scale and at the different levels of cellular complexity (genome, transcriptome, proteome and metabolome), efficient analysis of the results, and application of new bioinformatic methods in an integrative or systems biology perspective. This can be accomplished in studies with model organisms under controlled conditions. In this review a perspective of the favourable characteristics of yeast as a touchstone model in post-genomic research is presented. The state of the art, latest advances in the field and bottlenecks, new strategies, new regulatory mechanisms, applications (patents) and high-throughput technologies, most of them being developed and validated in yeast, are presented. The optimal characteristics of yeast as a well-defined system for comprehensive studies under controlled conditions make it a perfect model to be used in integrative, "systems biology" studies to get new insights into the mechanisms of regulation (regulatory networks) responsible for specific phenotypes under particular environmental conditions, to be applied to more complex organisms (e.g. plants, human).

  14. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    Directory of Open Access Journals (Sweden)

    Andersson Jan O

    2010-10-01

    Background: Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results: We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions: Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.

  15. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features make it difficult to sequence Dunaliella genomic DNA. Further investigation should be made to reveal the biological significance of these unique sequence features.
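
    The two sequence statistics quoted in the abstract above, GC content and the occurrence of (AC)n microsatellites, can be computed along the following lines. This is an illustrative sketch only, not the authors' pipeline: the function names, the minimum repeat count of four units, and the toy sequence are assumptions.

```python
import re


def gc_content(seq: str) -> float:
    """Return G+C as a percentage of the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def motif_repeats(seq: str, motif: str = "AC", min_units: int = 4):
    """Yield (start, unit_count) for each run of `motif` repeated
    at least `min_units` times in tandem (case-insensitive)."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}", re.IGNORECASE)
    for match in pattern.finditer(seq):
        yield match.start(), (match.end() - match.start()) // len(motif)


# Hypothetical toy sequence, not Dunaliella data.
toy_seq = "GGCCATACACACACACTTGCGC"
toy_gc = gc_content(toy_seq)
toy_runs = list(motif_repeats(toy_seq))
```

    On a real fragment, the share of (AC)n repeats among all microsatellites would additionally require scanning the other motif types; the sketch only counts one motif at a time.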

  16. Coevolution of aah: A dps-Like Gene with the Host Bacterium Revealed by Comparative Genomic Analysis

    Directory of Open Access Journals (Sweden)

    Liyan Ping

    2012-01-01

    Full Text Available A protein named AAH was isolated from the bacterium Microbacterium arborescens SE14, a gut commensal of the lepidopteran larvae. It showed not only a high sequence similarity to Dps-like proteins (DNA-binding proteins from starved cell but also reversible hydrolase activity. A comparative genomic analysis was performed to gain more insights into its evolution. The GC profile of the aah gene indicated that it was evolved from a low GC ancestor. Its stop codon usage was also different from the general pattern of Actinobacterial genomes. The phylogeny of dps-like proteins showed strong correlation with the phylogeny of host bacteria. A conserved genomic synteny was identified in some taxonomically related Actinobacteria, suggesting that the ancestor genes had incorporated into the genome before the divergence of Micrococcineae from other families. The aah gene had evolved new function but still retained the typical dodecameric structure.

  17. Genomic compositions and phylogenetic analysis of Shigella boydii subgroup

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Comparative Genomic Hybridization (CGH) microarray analysis was used to compare the genomic compositions of all eighteen Shigella boydii serotype representative strains. The results indicated that the genomic "backbone" of this subgroup contained 2552 ORFs homologous to nonpathogenic E. coli K12. Compared with the genome of K12, 199 ORFs were found to be absent in all S. boydii serotype representatives, including mainly outer membrane protein genes and O-antigen biosynthesis genes. Yet the specific ORFs of the S. boydii subgroup consisted mainly of bacteriophage genes and function unknown (FUN) genes. Some iron metabolism, transport and type II secretion system related genes were found in most representative strains. According to the CGH phylogenetic analysis, the eighteen S. boydii serotype representatives were divided into four groups, in which the serotype C13 strain was remarkably distinguished from the other serotype strains. This grouping result corresponded to the distribution of some metabolism related genes. Furthermore, the analysis of genome backbone genes, specific genes, and the phylogenetic trees allowed us to discover the evolution laws of S. boydii and to find important clues for pathogenesis research, vaccination and therapeutic medicine development.

  18. Digital microarray analysis for digital artifact genomics

    Science.gov (United States)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to traditional dynamic analysis and static analysis that we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Mandiant. We conclude that SV based SA is a practical, fast alternative to dynamic analysis and static analysis.

  19. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Jolly Emmitt R

    2005-11-01

    Full Text Available Abstract Background A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.

  20. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we finally revealed the conserved essential genes set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacteria survival especially over long period natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  1. Stacks: an analysis tool set for population genomics.

    Science.gov (United States)

    Catchen, Julian; Hohenlohe, Paul A; Bassham, Susan; Amores, Angel; Cresko, William A

    2013-06-01

    Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
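
    The idea of analysing a per-SNP statistic "across a reference genome using a smoothed sliding window" can be sketched as a kernel-weighted average. The abstract does not describe the smoothing itself, so the Gaussian kernel, the 3-sigma cut-off and every name below are assumptions made for illustration rather than a description of the Stacks implementation.

```python
import math


def kernel_smooth(positions, values, centre, sigma):
    """Gaussian-weighted mean of values around centre.

    Sites further than 3 * sigma from centre are ignored (assumed cut-off).
    Returns None when no site falls inside the window.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pos, val in zip(positions, values):
        dist = pos - centre
        if abs(dist) > 3 * sigma:
            continue
        weight = math.exp(-(dist * dist) / (2.0 * sigma * sigma))
        weighted_sum += weight * val
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


def smooth_along_chromosome(positions, values, step, sigma):
    """Evaluate the smoothed statistic every `step` bases along one chromosome."""
    if not positions:
        return []
    centres = range(min(positions), max(positions) + 1, step)
    return [(c, kernel_smooth(positions, values, c, sigma)) for c in centres]
```

    Here `positions` are SNP coordinates on one chromosome and `values` the matching per-SNP statistic (for example a differentiation measure); a larger `sigma` gives a smoother but less localised profile.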

  2. Evaluation of a Phylogenetic Marker Based on Genomic Segment B of Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the Molecular Epidemiology Studies for this Viral Agent.

    Directory of Open Access Journals (Sweden)

    Abdulahi Alfonso-Morales

    Full Text Available Infectious bursal disease (IBD) is a highly contagious and acute viral disease, which has caused high mortality rates in birds and considerable economic losses in different parts of the world for more than two decades and it still represents a considerable threat to poultry. The current study was designed to rigorously measure the reliability of a phylogenetic marker included into segment B. This marker can facilitate molecular epidemiology studies, incorporating this segment of the viral genome, to better explain the links between emergence, spreading and maintenance of the very virulent IBD virus (vvIBDV) strains worldwide. Sequences of the segment B gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank Database; Cuban sequences were obtained in the current work. A phylogenetic marker named B-marker was assessed by different phylogenetic principles such as saturation of substitution, phylogenetic noise and high consistency. This last parameter is based on the ability of B-marker to reconstruct the same topology as the complete segment B of the viral genome. From the results obtained from B-marker, demographic history for both main lineages of IBDV regarding segment B was performed by Bayesian skyline plot analysis. Phylogenetic analysis for both segments of IBDV genome was also performed, revealing the presence of a natural reassortant strain with segment A from vvIBDV strains and segment B from non-vvIBDV strains within Cuban IBDV population. This study contributes to a better understanding of the emergence of vvIBDV strains, describing molecular epidemiology of IBDV using the state-of-the-art methodology concerning phylogenetic reconstruction. This study also revealed the presence of a novel natural reassorted strain as a possible manifestation of change in the genetic structure and stability of the vvIBDV strains.
Therefore, it highlights the need to obtain information about both genome segments of IBDV for

  3. Genome-wide Studies of Mycolic Acid Bacteria: Computational Identification and Analysis of a Minimal Genome

    KAUST Repository

    Kamanu, Frederick Kinyua

    2012-12-01

    The mycolic acid bacteria are a distinct suprageneric group of asporogenous Gram-positive, high GC-content bacteria, distinguished by the presence of mycolic acids in their cell envelope. They exhibit great diversity in their cell and morphology; although primarily non-pathogens, this group contains three major pathogens: Mycobacterium leprae, Mycobacterium tuberculosis complex, and Corynebacterium diphtheriae. Although the mycolic acid bacteria are a clearly defined group of bacteria, the taxonomic relationships between its constituent genera and species are less well defined. Two approaches were tested for their suitability in describing the taxonomy of the group. First, a Multilocus Sequence Typing (MLST) experiment was assessed and found to be superior to monophyletic (16S small ribosomal subunit) in delineating a total of 52 mycolic acid bacterial species. Phylogenetic inference was performed using the neighbor-joining method. To further refine phylogenetic analysis and to take advantage of the widespread availability of bacterial genome data, a computational framework that simulates DNA-DNA hybridisation was developed and validated using multiscale bootstrap resampling. The tool classifies microbial genomes based on whole genome DNA, and was deployed as a web-application using PHP and Javascript. It is accessible online at http://cbrc.kaust.edu.sa/dna_hybridization/ A third study applied computational and statistical methods to the identification and analysis of a putative minimal mycolic acid bacterial genome so as to better understand (1) the genomic requirements to encode a mycolic acid bacterial cell and (2) the role and type of genes and genetic elements that lead to the massive increase in genome size in environmental mycolic acid bacteria. Using a reciprocal comparison approach, a total of 690 orthologous gene clusters forming a putative minimal genome were identified across 24 mycolic acid bacterial species. In order to identify new potential drug

  4. Development and phytochemical content analysis of bun incorporated with Kappaphycus Alvarezii seaweed powder

    Science.gov (United States)

    Sasue, Anita; Kasim, Zalifah Mohd

    2016-11-01

    Consumer awareness of the importance of functional foods has grown greatly in the past years. Functional foods with elevated levels of antioxidants are in high demand because of their associated health benefits. As bread is a common component of our daily diet, it may be a convenient food to deliver antioxidants at a high concentration. The main approach of this study is to incorporate Kappaphycus alvarezii seaweed powder (SWP) and white flour in the bun formulation in order to develop a seaweed bun with a higher level of phytochemicals. The fresh Kappaphycus alvarezii seaweeds were washed, soaked in distilled water overnight, dried in a cabinet dryer at 40°C for 24 hours and ground into fine powder using a universal miller. Five different percentages of SWP were incorporated into the bun: formulation A - control (0% SWP), B (3% SWP), C (6% SWP), D (9% SWP) and E (12% SWP). All samples underwent texture, total phenolic content (TPC) and DPPH analysis. Seaweed concentration had the most significant effect on the phytochemical constituents of the bun, with TPC (35.07 GAE, mg/100g) and DPPH activity (49.02%) maximized when 12% SWP was incorporated into the flour (P<0.05). The incorporation of SWP also had a significant effect on the texture of the bun, which became harder and denser compared with the control.

  5. Sequencing and analysis of the giant panda genome

    Institute of Scientific and Technical Information of China (English)

    YANG HuanMing

    2010-01-01

    The giant panda (Ailuropoda melanoleuca) is loved all over the world and is considered a symbol of China, as illustrated by its being one of the mascots for the Beijing 2008 Olympic Games. It is also one of the world's most endangered animals and a flagship species for conservation. Using next-generation sequencing technology (Illumina Genome Analyzer) and our in-house assembly software, we have generated the first map of the giant panda genome sequence. This map will provide an unparalleled amount of information to aid in understanding the genetic and biological nature of this unique species and will contribute significantly to disease control and conservation efforts for this endangered species. In March 2008, the giant panda genome sequencing and analysis project was started at the Beijing Genomics Institute (BGI) in Shenzhen with collaborators from the Kunming Institute of Zoology and the Chengdu Research Base of Giant Panda Breeding. On 21 Jan. 2010, this collaboration resulted in the publication, as a cover story in the journal Nature, of the sequencing and analysis of the giant panda genome.

  6. Primer to analysis of genomic data using R

    CERN Document Server

    Gondro, Cedric

    2015-01-01

    Through this book, researchers and students will learn to use R for analysis of large-scale genomic data and how to create routines to automate analytical steps. The philosophy behind the book is to start with real world raw datasets and perform all the analytical steps needed to reach final results. Though theory plays an important role, this is a practical book for advanced undergraduate and graduate classes in bioinformatics, genomics and statistical genetics or for use in lab sessions. This book is also designed to be used by students in computer science and statistics who want to learn the practical aspects of genomic analysis without delving into algorithmic details. The datasets used throughout the book may be downloaded from the publisher’s website.  Chapters show how to handle and manage high-throughput genomic data, create automated workflows and speed up analyses in R. A wide range of R packages useful for working with genomic data are illustrated with practical examples. In recent years R has b...

  7. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Directory of Open Access Journals (Sweden)

    Maximo Rivarola

    Full Text Available Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  8. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Science.gov (United States)

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  9. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    Science.gov (United States)

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  10. Dyneins across eukaryotes: a comparative genomic analysis.

    Science.gov (United States)

    Wickstead, Bill; Gull, Keith

    2007-12-01

    Dyneins are large minus-end-directed microtubule motors. Each dynein contains at least one dynein heavy chain (DHC) and a variable number of intermediate chains (IC), light intermediate chains (LIC) and light chains (LC). Here, we used genome sequence data from 24 diverse eukaryotes to assess the distribution of DHCs, ICs, LICs and LCs across Eukaryota. Phylogenetic inference identified nine DHC families (two cytoplasmic and seven axonemal) and six IC families (one cytoplasmic). We confirm that dyneins have been lost from higher plants and show that this is most likely because of a single loss of cytoplasmic dynein 1 from the ancestor of Rhodophyta and Viridiplantae, followed by lineage-specific losses of other families. Independent losses in Entamoeba mean that at least three extant eukaryotic lineages are entirely devoid of dyneins. Cytoplasmic dynein 2 is associated with intraflagellar transport (IFT), but in two chromalveolate organisms, we find an IFT footprint without the retrograde motor. The distribution of one family of outer-arm dyneins accounts for 2-headed or 3-headed outer-arm ultrastructures observed in different organisms. One diatom species builds motile axonemes without any inner-arm dyneins (IAD), and the unexpected conservation of IAD I1 in non-flagellate algae and LC8 (DYNLL1/2) in all lineages reveals a surprising fluidity to dynein function.

  11. Functional genomic analysis of C. elegans molting.

    Directory of Open Access Journals (Sweden)

    Alison R Frand

    2005-10-01

    Full Text Available Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made in the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development.

  12. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
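
    For orientation, the conventional baseline that this abstract contrasts LEGO with, over-representation analysis by Fisher's exact test, reduces to a hypergeometric upper-tail probability. The sketch below shows only that baseline, not LEGO's network-based weighting; the function and parameter names are illustrative assumptions.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    n_universe: number of genes in the background
    n_in_set:   number of background genes annotated to the gene set
    n_selected: number of genes in the query list (e.g. differentially expressed)
    n_overlap:  number of query genes that fall in the gene set
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

    Every gene counts equally in this calculation, which is the simplification the abstract criticises: the test sees a pathway only as an unordered list of members.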

  13. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScanning), comparative sequencing of several known virulence factor genes and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains and MLST results showed that the four isolates studied and reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  14. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScanning), comparative sequencing of several known virulence factor genes and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains and MLST results showed that the four isolates studied and reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  15. Integrated translational genomics for analysis of complex traits in sorghum

    Science.gov (United States)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  16. Genome-Wide Association Analysis in Primary Sclerosing Cholangitis

    NARCIS (Netherlands)

    T.H. Karlsen; A. Franke; E. Melum; A. Kaser; J.R. Hov; T. Balschun; B.A. Lie; A. Bergquist; C. Schramm; T.J. Weismüller; D. Gotthardt; C. Rust; E.E.R. Philipp; T. Fritz; L. Henckaerts; R. Weersma; P. Stokkers; C.Y. Ponsioen; C. Wijmenga; M. Sterneck; M. Nothnagel; J. Hampe; A. Teufel; H. Runz; P. Rosenstiel; A. Stiehl; S. Vermeire; U. Beuers; M. Manns; E. Schrumpf; K.M. Boberg; S. Schreiber

    2010-01-01

    BACKGROUND & AIMS: We aimed to characterize the genetic susceptibility to primary sclerosing cholangitis (PSC) by means of a genome-wide association analysis of single nucleotide polymorphism (SNP) markers. METHODS: A total of 443,816 SNPs on the Affymetrix SNP Array 5.0 (Affymetrix, Santa Clara, CA

  17. Genome-scale metabolic network validation of Shewanella oneidensis using transposon insertion frequency analysis.

    Directory of Open Access Journals (Sweden)

    Hong Yang

    2014-09-01

    Full Text Available Transposon mutagenesis, in combination with parallel sequencing, is becoming a powerful tool for en-masse mutant analysis. A probability generating function was used to explain observed miniHimar transposon insertion patterns, and gene essentiality calls were made by transposon insertion frequency analysis (TIFA). TIFA incorporated the observed genome and sequence motif bias of the miniHimar transposon. The gene essentiality calls were compared to: (1) previous genome-wide direct gene-essentiality assignments; and (2) flux balance analysis (FBA) predictions from an existing genome-scale metabolic model of Shewanella oneidensis MR-1. A three-way comparison between FBA, TIFA, and the direct essentiality calls was made to validate the TIFA approach. The refinement in the interpretation of observed transposon insertions demonstrated that genes without insertions are not necessarily essential, and that genes that contain insertions are not always nonessential. The TIFA calls were in reasonable agreement with direct essentiality calls for S. oneidensis, but agreed more closely with E. coli essentiality calls for orthologs. The TIFA gene essentiality calls were in good agreement with the MR-1 FBA essentiality predictions, and the agreement between TIFA and FBA predictions was substantially better than between the FBA and the direct gene essentiality predictions.

  18. Sequencing and annotated analysis of an Estonian human genome.

    Science.gov (United States)

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In the present study we describe the sequencing and annotated analysis of the individual genome of an Estonian. Using SOLID technology we generated 2,449,441,916 50-bp reads. Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope also enables the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5% with total coverage of 107.72 Gb, resulting in a mean fold coverage of 34.6. We found 3,482,975 SNPs, of which 352,492 were novel. 21,222 SNPs were in coding regions: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small insertion/deletion polymorphisms, of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found a region in chromosome 6 to be enriched with coding SNPs and CNVs. This study confirms previous findings that our genomes are more complex and variable than previously thought. Therefore, sequencing of personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understanding of the functions of the genome.

  19. Genome bioinformatic analysis of nonsynonymous SNPs

    Directory of Open Access Journals (Sweden)

    Todd John A

    2007-08-01

    Background: Genome-wide association studies of common diseases for common, low penetrance causal variants are underway. A proportion of these will alter protein sequences, the most common of which is the non-synonymous single nucleotide polymorphism (nsSNP). It would be an advantage if the functional effects of an nsSNP on protein structure and function could be predicted, both for the final identification process of a causal variant in a disease-associated chromosome region, and in further functional analyses of the nsSNP and its disease-associated protein. Results: In the present report we have compared and contrasted structure- and sequence-based methods of prediction to over 5500 genes carrying nearly 24,000 nsSNPs, by employing an automatic comparative modelling procedure to build models for the genes. The nsSNP information came from two sources, the OMIM database which are rare (minor allele frequency, MAF, 0.05, have no known link to a disease. For over 40% of the nsSNPs, structure-based methods predicted which of these sequence changes are likely to either disrupt the structure of the protein or interfere with the function or interactions of the protein. For the remaining 60%, we generated sequence-based predictions. Conclusion: We show that, in general, the prediction tools are able to distinguish disease-causing mutations from those mutations which are thought to have a neutral effect. We give examples of mutations in genes that are predicted to be deleterious and may have a role in disease. Contrary to previous reports, we also show that rare mutations are consistently predicted to be deleterious as often as commonly occurring nsSNPs.

  20. Pre-Steady-State Kinetic Analysis of Single-Nucleotide Incorporation by DNA Polymerases.

    Science.gov (United States)

    Su, Yan; Peter Guengerich, F

    2016-06-01

    Pre-steady-state kinetic analysis is a powerful and widely used method to obtain multiple kinetic parameters. This protocol provides a step-by-step procedure for pre-steady-state kinetic analysis of single-nucleotide incorporation by a DNA polymerase. It describes the experimental details of DNA substrate annealing, reaction mixture preparation, handling of the RQF-3 rapid quench-flow instrument, denaturing polyacrylamide DNA gel preparation, electrophoresis, quantitation, and data analysis. The core and unique part of this protocol is the rationale for preparation of the reaction mixture (the ratio of the polymerase to the DNA substrate) and methods for conducting pre-steady-state assays on an RQF-3 rapid quench-flow instrument, as well as data interpretation after analysis. In addition, the methods for the DNA substrate annealing and DNA polyacrylamide gel preparation, electrophoresis, quantitation and analysis are suitable for use in other studies. © 2016 by John Wiley & Sons, Inc.

  1. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  2. Large-scale genomic analysis of ovarian carcinomas.

    Science.gov (United States)

    Gorringe, Kylie L; Campbell, Ian G

    2009-04-01

    Epithelial ovarian cancers are typified by frequent genomic aberrations that have been difficult to unravel. Recently, high-resolution array technologies have provided the first glimpse of the remarkable complexity of these aberrations with some ovarian cancers containing hundreds of copy number breakpoints, micro-deletions and amplifications. Many of these alterations contain cancer-related genes suggesting that the majority is disease-associated and not just the product of random genomic instability. Future developments such as next-generation sequencing and integrated analysis of data from multiple array platforms on large numbers of samples are poised to revolutionize our understanding of this complex disease.

  3. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  4. Genome Assembly and Computational Analysis Pipelines for Bacterial Pathogens

    KAUST Repository

    Rangkuti, Farania Gama Ardhina

    2011-06-01

    Pathogens lie behind the deadliest pandemics in history. To date, the AIDS pandemic has resulted in more than 25 million fatal cases, while tuberculosis and malaria annually claim more than 2 million lives. Comparative genomic analyses are needed to gain insights into the molecular mechanisms of pathogens, but the abundance of biological data dictates that such studies cannot be performed without the assistance of computational approaches. This explains the significant need for computational pipelines for genome assembly and analysis. The aim of this research is to develop such pipelines. This work utilizes various bioinformatics approaches to analyze the high-throughput genomic sequence data that have been obtained from several strains of bacterial pathogens. A pipeline has been compiled for quality control of sequencing and assembly, and several protocols have been developed to detect contamination. Genomic data have been visualized in various formats, in addition to alignment, homology detection and sequence variant detection. We have also implemented a metaheuristic algorithm that significantly improves bacterial genome assemblies compared to other known methods. Experiments on Mycobacterium tuberculosis H37Rv data showed that our method resulted in improvement of N50 value of up to 9697% while consistently maintaining high accuracy, covering around 98% of the published reference genome. Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases. Our result expedites the genomic analysis of virulent genes up to single base pair resolution. It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-associated diseases.

  5. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

    Science.gov (United States)

    McKenna, Aaron; Hanna, Matthew; Banks, Eric; Sivachenko, Andrey; Cibulskis, Kristian; Kernytsky, Andrew; Garimella, Kiran; Altshuler, David; Gabriel, Stacey; Daly, Mark; DePristo, Mark A

    2010-09-01

    Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
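
    The MapReduce idea behind a coverage calculator can be sketched in a few lines. This is an illustration of the programming pattern only, written in Python; it does not use or reproduce the GATK API, and the read tuple layout and function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Tuple

# Hypothetical read representation: (contig, 0-based start, aligned length).
Read = Tuple[str, int, int]


def map_read(read: Read) -> Counter:
    """Map step: emit a count of 1 for every reference position a read covers."""
    contig, start, length = read
    return Counter({(contig, pos): 1 for pos in range(start, start + length)})


def reduce_counts(acc: Counter, part: Counter) -> Counter:
    """Reduce step: merge per-read counts into the running per-position depth."""
    acc.update(part)  # Counter.update adds counts rather than replacing them
    return acc


def coverage(reads: Iterable[Read]) -> Counter:
    """Depth of coverage per (contig, position), built as map followed by reduce."""
    return reduce(reduce_counts, map(map_read, reads), Counter())


reads = [("chr1", 0, 4), ("chr1", 2, 4), ("chr2", 10, 2)]
depth = coverage(reads)
print(depth[("chr1", 2)], depth[("chr1", 5)], depth[("chr1", 6)])
```

    Because the map step touches one read at a time and the reduce step is an associative merge, the analysis calculation stays separate from how reads are fetched or sharded, which is the separation of concerns the abstract describes.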

  6. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  7. Finite Element Analysis of Synergy Effect on Concrete Beams Incorporated with Coated Reinforcement and Alternate Aggregates

    Directory of Open Access Journals (Sweden)

    Sakthivel Pandiaraj

    2016-01-01

    The purpose of this study is to compare the ultimate load carrying capacity of conventional reinforced concrete beams with that of an investigation specimen incorporating coated reinforcement and partial replacement with recycled aggregate and quarry dust. A novel technique of coated reinforcement delays the onset of corrosion and enhances the durability of structures. Results show that not even a film of corrosion (white rust) can be seen in the investigation specimen. There is a progressive increase in stiffness from the state of the first crack to the ultimate stage and a negligible difference in ultimate load carrying capacity of the investigation specimen when compared with the control specimen. Incorporation of galvanization, recycled aggregate, and quarry dust seemed to be compatible with the existing conservative concreting procedures. Experimental results are compared with numerical solutions obtained by finite element analysis (FEA) using ABAQUS.

  8. EdU Incorporation for FACS and Microscopy Analysis of DNA Replication in Budding Yeast.

    Science.gov (United States)

    Talarek, Nicolas; Petit, Julie; Gueydon, Elisabeth; Schwob, Etienne

    2015-01-01

    DNA replication is a key determinant of chromosome segregation and stability in eukaryotes. The yeast Saccharomyces cerevisiae has been extensively used for cell cycle studies, yet simple but key parameters such as the fraction of cells in S phase in a population or the subnuclear localization of DNA synthesis have been difficult to gather for this organism. 5-ethynyl-2'-deoxyuridine (EdU) is a thymidine analogue that can be incorporated in vivo and later detected using copper-catalyzed azide alkyne cycloaddition (Click reaction) without prior DNA denaturation. This chapter describes a budding yeast strain and conditions that allow rapid EdU incorporation at moderate extracellular concentrations, followed by its efficient detection for the analysis of DNA replication in single cells by flow cytometry and fluorescence microscopy.

  9. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
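
    N50, quoted here for contigs and scaffolds, is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The sketch below computes it for a toy set of lengths (the toy values are invented for illustration) and reproduces the 91% assembly fraction from the two figures given in the abstract.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assembly: total 300 bp, so N50 is reached once 150 bp are accumulated.
toy_lengths = [100, 80, 50, 30, 20, 10, 10]
toy_n50 = n50(toy_lengths)

# Fraction of the k-mer estimated genome covered by the non-redundant sequences.
assembled_fraction = 568_887_315 / 622_000_000

print(f"toy N50: {toy_n50} bp")
print(f"assembled fraction: {assembled_fraction:.1%}")
```

    Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports gene coverage separately.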

  10. A comprehensive analysis of bilaterian mitochondrial genomes and phylogeny.

    Science.gov (United States)

    Bernt, Matthias; Bleidorn, Christoph; Braband, Anke; Dambach, Johannes; Donath, Alexander; Fritzsch, Guido; Golombek, Anja; Hadrys, Heike; Jühling, Frank; Meusemann, Karen; Middendorf, Martin; Misof, Bernhard; Perseke, Marleen; Podsiadlowski, Lars; von Reumont, Björn; Schierwater, Bernd; Schlegel, Martin; Schrödl, Michael; Simon, Sabrina; Stadler, Peter F; Stöger, Isabella; Struck, Torsten H

    2013-11-01

    About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high substitution rates on nucleotides, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising as it allows for a larger taxon sampling, while presenting a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes that have been chosen to represent a profound sample of the phylogenetic as well as sequence diversity. The results are based on high quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from NCBI RefSeq database. However, the results failed to give support for many otherwise undisputed high-ranking taxa, like Mollusca, Hexapoda, Arthropoda, and suffer from extreme long branches of Nematoda, Platyhelminthes, and some other taxa. In order to identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.

  11. The Chlamydia psittaci genome: a comparative analysis of intracellular pathogens.

    Directory of Open Access Journals (Sweden)

    Anja Voigt

    BACKGROUND: Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. RESULTS: A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins. CONCLUSIONS: This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights into the genome architecture of C. psittaci and proposes a number of novel candidate genes, mostly of as yet unknown function, that may be important for pathogen-host interactions.

  12. General metabolism of Laribacter hongkongensis: a genome-wide analysis

    Directory of Open Access Journals (Sweden)

    Curreem Shirly O

    2011-04-01

    Background: Laribacter hongkongensis is associated with community-acquired gastroenteritis and traveler's diarrhea. In this study, we performed an in-depth annotation of the genes and pathways of the general metabolism of L. hongkongensis and correlated them with its phenotypic characteristics. Results: The L. hongkongensis genome possesses the pentose phosphate and gluconeogenesis pathways and tricarboxylic acid and glyoxylate cycles, but incomplete Embden-Meyerhof-Parnas and Entner-Doudoroff pathways, in agreement with its asaccharolytic phenotype. It contains enzymes for biosynthesis and β-oxidation of saturated fatty acids, biosynthesis of all 20 universal amino acids and selenocysteine, the latter not observed in Neisseria gonorrhoeae, Neisseria meningitidis and Chromobacterium violaceum. The genome contains a variety of dehydrogenases, enabling it to utilize different substrates as electron donors. It encodes three terminal cytochrome oxidases for respiration using oxygen as the electron acceptor under aerobic and microaerophilic conditions and four reductases for respiration with alternative electron acceptors under anaerobic conditions. The presence of a complete tetrathionate reductase operon may confer a survival advantage in the mammalian host in association with diarrhea. The genome contains CDSs for incorporating sulfur and nitrogen by sulfate assimilation, ammonia assimilation and nitrate reduction. The existence of both glutamate dehydrogenase and glutamine synthetase/glutamate synthase pathways suggests the importance of ammonia metabolism in the living environments that it may encounter. Conclusions: The L. hongkongensis genome possesses a variety of genes and pathways for carbohydrate, amino acid and lipid metabolism, respiratory chain and sulfur and nitrogen metabolism. These allow the bacterium to utilize various substrates for energy production and survive in different environmental niches.

  13. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq data revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes.

  14. Viral genome analysis and knowledge management.

    Science.gov (United States)

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and those that followed in its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov, http://hcv.lanl.gov and http://hfv.lanl.gov.

  15. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  16. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value-added traits useful to breeding programs and the isolation of important target genes using map-based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of
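
    The "approximately 10 times" coverage quoted in the Results can be reproduced from the other figures in the abstract. The sketch below is a back-of-the-envelope check, not the authors' calculation; it also applies the standard Clarke-Carbon relation P = 1 - (1 - I/G)^N, which assumes clones are sampled uniformly at random from the genome.

```python
# Figures quoted in the abstract.
clones = 147_456
insert_fraction = 0.95            # share of clones that contained an insert
mean_insert_bp = 120_000          # average insert size
effective_genome_bp = 1.6e9       # ~1.6 Gb effective genome size

# Fold coverage: total cloned bases divided by the effective genome size.
insert_clones = clones * insert_fraction
library_bp = insert_clones * mean_insert_bp
fold_coverage = library_bp / effective_genome_bp

# Clarke-Carbon: chance that a given locus is present in at least one clone,
# assuming uniformly random sampling (an idealization).
p_locus_present = 1.0 - (1.0 - mean_insert_bp / effective_genome_bp) ** insert_clones

print(f"fold coverage: {fold_coverage:.1f}x")
print(f"P(locus in library): {p_locus_present:.5f}")
```

    With these inputs the library works out to about 10.5 genome equivalents, consistent with the approximate tenfold coverage stated above.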

  17. Integrative Genomic Analysis of Complex traits

    DEFF Research Database (Denmark)

    Ehsani, Ali Reza

    In the last decade rapid development in biotechnologies has made it possible to extract extensive information about practically all levels of biological organization. An ever-increasing number of studies are reporting multilayered datasets on the entire DNA sequence, transcription, protein...... expression, and metabolite abundance of more and more populations in a multitude of environments. However, a solid model for including all of this complex information in one analysis, to disentangle genetic variation and the underlying genetic architecture of complex traits and diseases, has not yet been...

  18. Incorporating the nuclear vibrational energies into the -atom in molecules- analysis: An analytical study

    CERN Document Server

    Gharabaghi, Masumeh

    2016-01-01

    The orthodox quantum theory of atoms in molecules (QTAIM) is based on the clamped nucleus paradigm and working solely with the electronic wavefunctions, so unable to include nuclear vibrations in the AIM analysis. On the other hand, the recently extended version of the QTAIM, called the multi-component QTAIM (MC-QTAIM), incorporates both electrons and quantum nuclei, i.e. those nuclei treated as quantum waves instead of clamped point charges, into the AIM analysis using non-adiabatic wavefunctions. Thus, the MC-QTAIM is the natural framework to incorporate the role of nuclear vibrations into the AIM analysis. In this study, within the context of the MC-QTAIM, the formalism of including nuclear vibrational energy in the atomic basin energy is developed in detail and its contribution is derived analytically using the recently proposed non-adiabatic Hartree product nuclear wavefunction. It is demonstrated that within the context of this wavefunction the quantum nuclei may be conceived pseudo-adiabatically as qua...

  19. Comparative Genome Analysis of Basidiomycete Fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  20. Genomic analysis of mouse retinal development.

    Directory of Open Access Journals (Sweden)

    Seth Blackshaw

    2004-09-01

    The vertebrate retina is composed of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE). The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each developing and mature major retinal cell type, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might serve to produce multiple retinal cell types under the right conditions. In addition, multiple evolutionarily conserved transcripts that did not appear to encode open reading frames of more than 100 amino acids in length ("noncoding RNAs") were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.

  1. Quantifying element incorporation in multispecies biofilms using nanoscale secondary ion mass spectrometry image analysis

    Energy Technology Data Exchange (ETDEWEB)

    Renslow, Ryan S. [Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354]; Lindemann, Stephen R. [Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354]; Cole, Jessica K. [Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354]; Zhu, Zihua [Environmental Molecular Sciences Laboratory, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354]; Anderton, Christopher R. [Environmental Molecular Sciences Laboratory, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354]

    2016-02-12

    Elucidating nutrient exchange in microbial communities is an important step in understanding the relationships between microbial systems and global biogeochemical cycles, but these communities are complex and the interspecies interactions that occur within them are not well understood. Phototrophic consortia are useful and relevant experimental systems to investigate such interactions as they are not only prevalent in the environment, but some are cultivable in vivo and amenable to controlled scientific experimentation. High spatial resolution secondary ion mass spectrometry (NanoSIMS) is a powerful tool capable of visualizing the metabolic activities of single cells within a biofilm, but quantitative analysis of the resulting data has typically been a manual process, resulting in a task that is both laborious and susceptible to human error. Here, we describe the creation and application of a semi-automated image-processing pipeline that can analyze NanoSIMS-generated data of phototrophic biofilms. The tool employs an image analysis process, which includes both elemental and morphological segmentation, producing a final segmented image that allows for discrimination between autotrophic and heterotrophic biomass, the detection of individual cyanobacterial filaments and heterotrophic cells, the quantification of isotopic incorporation of individual heterotrophic cells, and calculation of relevant population statistics. We demonstrate the functionality of the tool by using it to analyze the uptake of 15N provided as either nitrate or ammonium through the unicyanobacterial consortium UCC-O and imaged via NanoSIMS. We found that the degree of 15N incorporation by individual cells was highly variable when labeled with 15NH4+, but much more even when biofilms were labeled with 15NO3-. In the 15NH4+-amended biofilms, the heterotrophic distribution of 15N incorporation was highly skewed, with a large population showing moderate 15N incorporation and a small number of

  2. SU-E-T-615: Plan Comparison Between Photon IMRT and Proton Plans Incorporating Uncertainty Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cheng, C; Wessels, B; Jesseph, F; Mattson, D; Mansur, D [Dept of Radiation Oncology, University Hospitals Case Medical Center, Cleveland, OH (United States)

    2015-06-15

    Purpose: In this study, we investigate the effect of setup uncertainty on DVH calculations which may impact plan comparison. Methods: Treatment plans (6 MV VMAT calculated on Pinnacle TPS) were chosen for different disease sites: brain, prostate, H&N and spine in this retrospective study. A proton plan (PP) using double scattering beams was generated for each selected VMAT plan subject to the same set of dose-volume constraints as in VMAT. An uncertainty analysis was incorporated on the DVH calculations in which isocenter shifts from 1 to 5 mm in each of the ±x, ±y and ±z directions were used to simulate the setup uncertainty and residual positioning errors. A total of 40 different combinations of isocenter shifts were used in the re-calculation of DVH of the PTV and the various OARs for both the VMAT and the corresponding PT. Results: For the brain case, both VMAT and PP are comparable in PTV coverage and OAR sparing, and VMAT is a clear choice for treatment due to its ease of delivery. However, when incorporating isoshifts in DVH calculations, a significant change in dose-volume relationship emerges. For example, both VMAT and PT provide adequate coverage, even with ±3 mm isoshift. However, a +3 mm isoshift results in an increase of V40(Lcochlea, VMAT) from 7.2% in the original plan to 45% and V40(Rcochlea, VMAT) from 75% to 92%. For protons, V40(Lcochlea, PT) increases from 62% in the initial plan to 75%, while V40(Rcochlea, PT) increases from 7% to 26%. Conclusion: DVH alone may not be sufficient to allow an unequivocal decision in plan comparison, especially when two rival plans are very similar in both PTV coverage and OAR sparing. It is good practice to incorporate uncertainty analysis in photon and proton plan comparison studies to test plan robustness during plan evaluation.

  3. Survey Sequencing and Comparative Analysis of the Elephant Shark (Callorhinchus milii) Genome

    Science.gov (United States)

    Venkatesh, Byrappa; Kirkness, Ewen F; Loh, Yong-Hwee; Halpern, Aaron L; Lee, Alison P; Johnson, Justin; Dandona, Nidhi; Viswanathan, Lakshmi D; Tay, Alice; Venter, J. Craig; Strausberg, Robert L; Brenner, Sydney

    2007-01-01

    Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4× coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element–like and long interspersed element–like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and of conserved sequence between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark genome contains four putative Hox clusters, indicating that, unlike teleost fish genomes, it has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes. PMID:17407382

  4. BioMet Toolbox: genome-wide analysis of metabolism

    OpenAIRE

    Cvijovic, M.; R. Olivares-Hernandez; Agren, R.; Dahr, N.; Vongsangnak, W.; Nookaew, I.; K. R. Patil; Nielsen, J.

    2010-01-01

    The rapid progress of molecular biology tools for directed genetic modifications, accurate quantitative experimental approaches, high-throughput measurements, together with development of genome sequencing has made the foundation for a new area of metabolic engineering that is driven by metabolic models. Systematic analysis of biological processes by means of modelling and simulations has made the identification of metabolic networks and prediction of metabolic capabilities under different co...

  5. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Full Text Available Abstract Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  6. Ensemble analysis of adaptive compressed genome sequencing strategies

    Science.gov (United States)

    2014-01-01

    Background Acquiring genomes at single-cell resolution has many applications such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process, which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. Results In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows in proportion to the log of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample.
The modified resource

  7. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E. coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  8. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E. coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  9. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  10. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses

    Directory of Open Access Journals (Sweden)

    Felipe L. Assis

    2015-06-01

    Full Text Available Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, using Acanthamoeba sp., we isolated three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genome of these new Brazilian mimiviruses (mimi-BR) and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%–99% similar, whereas Kroon virus had a low similarity (90%–91%) with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including if and why among amoebal mimiviruses those of lineage A predominate in the Brazilian environment.

  11. Analysis of the core genome and pangenome of Pseudomonas putida.

    Science.gov (United States)

    Udaondo, Zulema; Molina, Lázaro; Segura, Ana; Duque, Estrella; Ramos, Juan L

    2016-10-01

    Pseudomonas putida are strict aerobes that proliferate in a range of temperate niches and are of interest for environmental applications due to their capacity to degrade pollutants and ability to promote plant growth. Furthermore, solvent-tolerant strains are useful for biosynthesis of added-value chemicals. We present a comprehensive comparative analysis of nine strains and the first characterization of the Pseudomonas putida pangenome. The core genome of P. putida comprises approximately 3386 genes. The most abundant genes within the core genome are those that encode nutrient transporters. Other conserved genes include those for central carbon metabolism through the Entner-Doudoroff pathway, the pentose phosphate cycle, arginine and proline metabolism, and pathways for degradation of aromatic chemicals. Genes that encode transporters, enzymes and regulators for amino acid metabolism (synthesis and degradation) are all part of the core genome, as well as various electron transporters, which enable aerobic metabolism under different oxygen regimes. Within the core genome are 30 genes for flagella biosynthesis and 12 key genes for biofilm formation. Pseudomonas putida strains share 85% of the coding regions with Pseudomonas aeruginosa; however, in P. putida, virulence factors such as exotoxins and type III secretion systems are absent.

  12. A GeneTrek analysis of the maize genome.

    Science.gov (United States)

    Liu, Renyi; Vitte, Clémentine; Ma, Jianxin; Mahama, A Assibi; Dhliwayo, Thanda; Lee, Michael; Bennetzen, Jeffrey L

    2007-07-10

    Analysis of the sequences of 74 randomly selected BACs demonstrated that the maize nuclear genome contains approximately 37,000 candidate genes with homologues in other plant species. An additional approximately 5,500 predicted genes are severely truncated and probably pseudogenes. The distribution of genes is uneven, with approximately 30% of BACs containing no genes. BAC gene density varies from 0 to 7.9 per 100 kb, whereas most gene islands contain only one gene. The average number of genes per gene island is 1.7. Only 72% of these genes show collinearity with the rice genome. Particular LTR retrotransposon families (e.g., Gyma) are enriched on gene-free BACs, most of which do not come from pericentromeres or other large heterochromatic regions. Gene-containing BACs are relatively enriched in different families of LTR retrotransposons (e.g., Ji). Two major bursts of LTR retrotransposon activity in the last 2 million years are responsible for the large size of the maize genome, but only the more recent of these is well represented in gene-containing BACs, suggesting that LTR retrotransposons are more efficiently removed in these domains. The results demonstrate that sample sequencing and careful annotation of a few randomly selected BACs can provide a robust description of a complex plant genome.

  13. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  14. A Scale-Driven Change Detection Method Incorporating Uncertainty Analysis for Remote Sensing Images

    Directory of Open Access Journals (Sweden)

    Ming Hao

    2016-09-01

    Full Text Available Change detection (CD) based on remote sensing images plays an important role in Earth observation. However, the CD accuracy is usually affected by sunlight and atmospheric conditions and sensor calibration. In this study, a scale-driven CD method incorporating uncertainty analysis is proposed to increase CD accuracy. First, two temporal images are stacked and segmented into multiscale segmentation maps. Then, a pixel-based change map with memberships belonging to changed and unchanged parts is obtained by fuzzy c-means clustering. Finally, based on the Dempster-Shafer evidence theory, the proposed scale-driven CD method incorporating uncertainty analysis is performed on the multiscale segmentation maps and the pixel-based change map. Two experiments were carried out on Landsat-7 Enhanced Thematic Mapper Plus (ETM+) and SPOT 5 data sets. The ratio of total errors can be reduced to 4.0% and 7.5% for the ETM+ and SPOT 5 data sets in this study, respectively. Moreover, the proposed approach outperforms some state-of-the-art CD methods and provides an effective solution for CD.
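
    The evidence-fusion step named in this abstract can be illustrated with Dempster's rule of combination on the two-class frame {changed, unchanged}. The sketch below is not the authors' implementation: the dictionary keys, the function name `combine_dempster`, and the example masses are all assumptions made for illustration, and the abstract does not say how masses are derived from the fuzzy memberships.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions over the frame {changed, unchanged}.

    Each mass function is a dict with keys 'changed', 'unchanged' and
    'uncertain' (mass assigned to the whole frame); values should sum to 1.
    The key names are hypothetical, chosen only for this sketch.
    """
    # Conflict: one source supports 'changed' while the other supports 'unchanged'.
    conflict = m1["changed"] * m2["unchanged"] + m1["unchanged"] * m2["changed"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict

    def fused(label):
        # Intersections that yield {label}: both agree, or one is uncertain.
        return (
            m1[label] * m2[label]
            + m1[label] * m2["uncertain"]
            + m1["uncertain"] * m2[label]
        ) / norm

    return {
        "changed": fused("changed"),
        "unchanged": fused("unchanged"),
        "uncertain": m1["uncertain"] * m2["uncertain"] / norm,
    }


# Hypothetical evidence for one pixel from two segmentation scales.
scale_a = {"changed": 0.6, "unchanged": 0.1, "uncertain": 0.3}
scale_b = {"changed": 0.5, "unchanged": 0.2, "uncertain": 0.3}
fused_mass = combine_dempster(scale_a, scale_b)
```

    When both sources lean the same way, the fused mass on that class exceeds either input and the residual uncertainty shrinks, which is the behaviour an uncertainty-aware fusion of multiscale maps relies on.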

  15. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    Directory of Open Access Journals (Sweden)

    Brunham Robert C

    2004-07-01

    Full Text Available Abstract Background We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics.
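
    For readers unfamiliar with the reliability-theory form, the sketch below evaluates the standard two-parameter Weibull reliability (survival) function R(x) = exp(-(x/η)^β). It assumes that the β in the abstract is the usual shape parameter; the scale parameter η and the function names are introduced here for illustration only, and no fitted values from the paper are reproduced.

```python
import math


def weibull_reliability(x, beta, eta):
    """Return R(x) = exp(-(x / eta) ** beta) for x >= 0.

    beta is the shape parameter and eta the scale parameter (both > 0).
    """
    if x < 0 or beta <= 0 or eta <= 0:
        raise ValueError("require x >= 0, beta > 0 and eta > 0")
    return math.exp(-((x / eta) ** beta))


def weibull_cdf(x, beta, eta):
    """Return the cumulative distribution F(x) = 1 - R(x)."""
    return 1.0 - weibull_reliability(x, beta, eta)


# With beta = 1 the Weibull form reduces to an exponential distribution;
# at x = eta the reliability equals exp(-1) regardless of beta.
r_at_scale = weibull_reliability(10.0, 2.5, 10.0)
```

    Because β controls how quickly R(x) falls off, differences in β between genomes change the shape of the fold-occurrence curve rather than merely rescaling it.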

  16. Clinical pertinence metric enables hypothesis-independent genome-phenome analysis for neurologic diagnosis.

    Science.gov (United States)

    Segal, Michael M; Abdellateef, Mostafa; El-Hattab, Ayman W; Hilbush, Brian S; De La Vega, Francisco M; Tromp, Gerard; Williams, Marc S; Betensky, Rebecca A; Gleeson, Joseph

    2015-06-01

    We describe an "integrated genome-phenome analysis" that combines both genomic sequence data and clinical information for genomic diagnosis. It is novel in that it uses robust diagnostic decision support and combines the clinical differential diagnosis and the genomic variants using a "pertinence" metric. This allows the analysis to be hypothesis-independent, not requiring assumptions about mode of inheritance, number of genes involved, or which clinical findings are most relevant. Using 20 genomic trios with neurologic disease, we find that pertinence scores averaging 99.9% identify the causative variant under conditions in which a genomic trio is analyzed and family-aware variant calling is done. The analysis takes seconds, and pertinence scores can be improved by clinicians adding more findings. The core conclusion is that automated genome-phenome analysis can be accurate, rapid, and efficient. We also conclude that an automated process offers a methodology for quality improvement of many components of genomic analysis.

  17. CVP ANALYSIS INCORPORATING THE COST OF CAPITAL ON R&D INVESTMENT

    Directory of Open Access Journals (Sweden)

    DIAN PRIHADYANTI

    2011-04-01

    Full Text Available Cost-volume-profit (CVP) analysis is a widely used tool for managerial planning. The failure of CVP analysis to incorporate the cost of capital into a product's cost function can lead to underestimating a product's cost, while overstating its profitability. This paper proposes another variation of the CVP analytical model to include cost of capital on R&D investment and its risk level on strategic decisions. The modified CVP model provides more useful information to management because it focuses on a more specific type of investment which has particular characteristics. The CVP model developed is more complex, because it includes risk and uncertainty for the expected revenue, and specifies the R&D expense as a percentage of total sales. However, the model still needs further development.
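
    To make the underestimation argument concrete, the sketch below compares a textbook break-even volume with one that also charges a capital cost on the investment and treats R&D expense as a share of sales. This is a simplified deterministic illustration, not the paper's model (which also includes risk and uncertainty); the function name, parameter names and all numbers are assumptions.

```python
def breakeven_units(price, variable_cost, fixed_costs,
                    rd_share_of_sales=0.0, capital_invested=0.0,
                    cost_of_capital=0.0):
    """Units at which profit is zero.

    Assumed profit model for this sketch:
      profit = (price - variable_cost) * Q
               - rd_share_of_sales * price * Q      # R&D as a share of sales
               - fixed_costs
               - cost_of_capital * capital_invested  # capital charge
    """
    margin = price - variable_cost - rd_share_of_sales * price
    if margin <= 0:
        raise ValueError("contribution margin must be positive")
    return (fixed_costs + cost_of_capital * capital_invested) / margin


# Hypothetical figures: conventional CVP versus CVP with a capital charge.
conventional = breakeven_units(100.0, 60.0, 20000.0)
with_capital = breakeven_units(100.0, 60.0, 20000.0,
                               rd_share_of_sales=0.05,
                               capital_invested=50000.0,
                               cost_of_capital=0.10)
```

    Under these assumed numbers the break-even volume rises once the capital charge and the R&D share are included, which is the sense in which ignoring the cost of capital overstates profitability.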

  18. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  19. Comparative analysis of Acinetobacters: three genomes for three lifestyles.

    Directory of Open Access Journals (Sweden)

    David Vallenet

    Full Text Available Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that (i) whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); (ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; (iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment), louse, soil.

  20. Genome sequencing and analysis of BCG vaccine strains.

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    Full Text Available BACKGROUND: Although the Bacillus Calmette-Guérin (BCG) vaccine against tuberculosis (TB) has been available for more than 75 years, one third of the world's population is still infected with Mycobacterium tuberculosis and approximately 2 million people die of TB every year. To reduce this immense TB burden, a clearer understanding of the functional genes underlying the action of BCG and the development of new vaccines are urgently needed. METHODS AND FINDINGS: Comparative genomic analysis of 19 M. tuberculosis complex strains showed that BCG strains underwent repeated human manipulation, had higher region of deletion rates than those of natural M. tuberculosis strains, and lost several essential components such as T-cell epitopes. A total of 188 BCG strain T-cell epitopes were lost to various degrees. The non-virulent BCG Tokyo strain, which has the largest number of T-cell epitopes (359), lost 124. Here we propose that BCG strain protection variability results from different epitopes. This study is the first to present BCG as a model organism for genetics research. BCG strains have a very well-documented history and now detailed genome information. Genome comparison revealed the selection process of BCG strains under human manipulation (1908-1966). CONCLUSIONS: Our results revealed the cause of BCG vaccine strain protection variability at the genome level and supported the hypothesis that the restoration of lost BCG Tokyo epitopes is a useful future vaccine development strategy. Furthermore, these detailed BCG vaccine genome investigation results will be useful in microbial genetics, microbial engineering and other research fields.

  1. SmashCell: A software framework for the analysis of single-cell amplified genome sequences

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Arumugam, Manimozhiyan; Raes, Jeroen;

    2010-01-01

    SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes however is far more complicated than the analysis of those...

  2. Bayesian hierarchical models for network meta-analysis incorporating nonignorable missingness.

    Science.gov (United States)

    Zhang, Jing; Chu, Haitao; Hong, Hwanhee; Virnig, Beth A; Carlin, Bradley P

    2015-07-28

    Network meta-analysis expands the scope of a conventional pairwise meta-analysis to simultaneously compare multiple treatments, synthesizing both direct and indirect information and thus strengthening inference. Since most trials only compare two treatments, a typical data set in a network meta-analysis managed as a trial-by-treatment matrix is extremely sparse, like an incomplete block structure with significant missing data. Zhang et al. proposed an arm-based method accounting for correlations among different treatments within the same trial and assuming that absent arms are missing at random. However, in randomized controlled trials, nonignorable missingness or missingness not at random may occur due to deliberate choices of treatments at the design stage. In addition, those undertaking a network meta-analysis may selectively choose treatments to include in the analysis, which may also lead to missingness not at random. In this paper, we extend our previous work to incorporate missingness not at random using selection models. The proposed method is then applied to two network meta-analyses and evaluated through extensive simulation studies. We also provide comprehensive comparisons of a commonly used contrast-based method and the arm-based method via simulations in a technical appendix under missing completely at random and missing at random.

  3. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum

    DEFF Research Database (Denmark)

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger;

    2016-01-01

    . psychrophilum could hold at least 3373 genes, while the core genome contained 1743 genes. On average, 67 new genes were detected for every new genome added to the analysis, indicating that F. psychrophilum possesses an open pan genome. The putative virulence factors were equally distributed among isolates......, independent of geographic location, year of isolation and source of isolates. Only one prophage-related sequence was found which corresponded to the previously described prophage 6H, and appeared in 5 out of 11 isolates. CRISPR array analysis revealed two different loci with dissimilar spacer content, which...... to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences were complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F...

  4. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Full Text Available Abstract Background Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used, primarily, for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphism in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  5. Genome-wide association analysis of bacterial cold water disease resistance in rainbow trout reveals the potential of a hybrid approach between genomic selection and marker assisted selection

    Science.gov (United States)

    Genomic selection (GS) simultaneously incorporates dense SNP marker genotypes with phenotypic data from related animals to predict animal-specific genomic breeding value (GEBV), which circumvents the need to measure the disease phenotype in potential breeders. Marker assisted selection (MAS) involv...

  6. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome.

    Science.gov (United States)

    Kulohoma, Benard W; Cornick, Jennifer E; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R; Gray, Katherine J; Kiran, Anmol M; Molyneux, Elizabeth; French, Neil; Parkhill, Julian; Faragher, Brian E; Everett, Dean B; Bentley, Stephen D; Heyderman, Robert S

    2015-10-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites.
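
    The core/accessory split described in this record can be illustrated with a minimal sketch. It assumes each isolate has already been reduced to a set of gene-family identifiers; the isolate names and gene labels below are hypothetical toy data, and the sketch uses a plain set intersection rather than the double-exponential decay fit reported in the abstract.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (in every genome) and accessory (the rest)."""
    if not genomes:
        return set(), set()
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)   # families shared by all isolates
    pan = set.union(*gene_sets)           # every family seen in any isolate
    return core, pan - core


# Hypothetical toy isolates; real input would come from an annotation pipeline.
isolates = {
    "blood_1": {"ply", "lytA", "pspA", "geneX"},
    "blood_2": {"ply", "lytA", "pspA"},
    "csf_1": {"ply", "lytA", "geneY"},
}
core, accessory = core_and_accessory(isolates)
```

    Comparing the accessory sets of two groups of isolates (for example, bloodstream versus meningitis) would be one simple way to check for group-specific gene content.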

  7. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    OpenAIRE

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri

    2005-01-01

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis sub spp israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-...

  8. Evolutionary insights from suffix array-based genome sequence analysis

    Indian Academy of Sciences (India)

    Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures, well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicate that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a ‘meaning’ for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv have been found to contain a repeating sequence of 300 amino acids.
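
    As a rough illustration of the two ideas in this record (suffix-array lookup and n-gram coverage of peptide space), the sketch below builds a suffix array naively by sorting suffixes and counts distinct n-grams. It is not the improved construction the authors report; the function names and the toy sequence are assumptions made for illustration.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern via two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                        # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                        # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def ngram_coverage(text, n, alphabet_size=20):
    """Return (distinct n-grams seen, fraction of the alphabet_size**n space)."""
    seen = {text[i:i + n] for i in range(len(text) - n + 1)}
    return len(seen), len(seen) / alphabet_size ** n


protein = "MKVLAAGMKVL"                   # hypothetical toy sequence
sa = suffix_array(protein)
hits = count_occurrences(protein, sa, "MKVL")
distinct_bigrams, bigram_fraction = ngram_coverage(protein, 2)
```

    Applied to a whole proteome with n = 4, the same coverage function would give the kind of tetra-peptide sampling fraction the abstract discusses, although the naive construction here would be far too slow at that scale.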

  9. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    Directory of Open Access Journals (Sweden)

    Cassidy L Klima

    Full Text Available Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2-8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design.

  10. Comparative analysis of whole-genome sequences of Streptococcus suis

    Institute of Scientific and Technical Information of China (English)

    LI Pengli; WEI Wu; LI Yixue; MA Yuanyuan; DING Guohui; LI Xiaoping; WANG Xiaojing; ZHANG Liwen; SUN Jingchun; WANG Yong; TU Kang; WANG Ningning; HAO Pei; WANG Chuan; CAO Zhiwei; SHI Tieliu

    2006-01-01

    The recent outbreak of Streptococcus suis in some districts of Sichuan Province in China has caused over 30 deaths and over 200 infections in human beings. In order to study the pathogenicity mechanism and to prevent the bacteria from spreading and infecting human beings and swine, we have annotated and analyzed the genomes of two strains, Streptococcus suis P1/7 and 89-1591. The whole length of P1/7 is 2.007 Mb, and it has 1969 ORFs. In contrast, the partial genome sequence of 89-1591 is 1.98 Mb in length and exists in 177 contigs with 1918 ORFs. Analysis shows that the average lengths of CDSs in the two genomes are very close, and that the two strains share 1306 homologous ORFs. Most of the toxicity factors of the two strains are homologous, but there are still some significant differences between the two strains. For example, among the 11 genes (cps2A-cps2K) encoding the capsule in P1/7, 4 (cps2A, 2B, 2I, 2J) are not detected in strain 89-1591. At the same time, the genes encoding EF and Haemolysin in P1/7 are also not found in strain 89-1591. Besides, the genes related to DNA replication, repair and recombination differ from each other significantly, and there also exist certain differences among the surface proteins. Those characteristics indicate that the two strains have evolved their own specific functions to adapt to different environments and that the pathogenesis of the two strains is different. We have accumulated comprehensive genomics information for future systematic studies of S. suis. Our results are helpful for disease prevention, vaccine development, as well as drug design for S. suis.

  11. St2-80 - A new FISH marker for St genome and genome analysis in Triticeae.

    Science.gov (United States)

    Wang, Long; Shi, Qinghua; Su, Handong; Wang, Yi; Lina, Sha; Fan, Xing; Houyang, Kang; Haiqin, Zhang; Zhou, Yong-Hong

    2017-03-17

    St genome is one of the most fundamental genomes in Triticeae. Repetitive sequences are widely used to distinguish different genomes or species. The primary objectives of this study are (1) to screen a new sequence that can easily distinguish the chromosomes of St and other genomes by fluorescence in situ hybridization (FISH); (2) to investigate the genome constitutions of some species that remain uncertain and controversial. We used degenerated oligonucleotide primer PCR (Dop-PCR), Dot-blot and FISH to screen a new marker for the St genome and tested the efficiency of this marker in detecting St chromosomes at different ploidy levels. Signals of the new FISH marker (denoted St2-80) were found in the entire arm of chromosomes in the St genome, except in the centromeric region; by contrast, St2-80 signals were found in the terminal region of chromosomes in E, H, P and Y genomes. No signal was detected in A and B genomes, and only very weak signals were detected in the terminal region of chromosomes in the D genome. St2-80 signals were obvious and stable in chromosomes of different genomes in either diploid or polyploid. Therefore, St2-80 is a potential and useful FISH marker that can be used to distinguish St from other genomes in Triticeae.

  12. Radiation induced genome instability: multiscale modelling and data analysis

    Science.gov (United States)

    Andreev, Sergey; Eidelman, Yuri

    2012-07-01

    Genome instability (GI) is thought to be an important step in cancer induction and progression. Radiation induced GI is usually defined as genome alterations in the progeny of irradiated cells. The aim of this report is to demonstrate an opportunity for integrative analysis of radiation induced GI on the basis of multiscale modelling. Integrative, systems level modelling is necessary to assess different pathways resulting in GI in which a variety of genetic and epigenetic processes are involved. The multilevel modelling includes the Monte Carlo based simulation of several key processes involved in GI: DNA double strand breaks (DSBs) generation in cells initially irradiated as well as in descendants of irradiated cells, damage transmission through mitosis. Taking the cell-cycle-dependent generation of DNA/chromosome breakage into account ensures an advantage in estimating the contribution of different DNA damage response pathways to GI, as to nonhomologous vs homologous recombination repair mechanisms, the role of DSBs at telomeres or interstitial chromosomal sites, etc. The preliminary estimates show that both telomeric and non-telomeric DSB interactions are involved in delayed effects of radiation although differentially for different cell types. The computational experiments provide the data on the wide spectrum of GI endpoints (dicentrics, micronuclei, nonclonal translocations, chromatid exchanges, chromosome fragments) similar to those obtained experimentally for various cell lines under various experimental conditions. The modelling based analysis of experimental data demonstrates that radiation induced GI may be viewed as processes of delayed DSB induction/interaction/transmission being a key for quantification of GI. On the other hand, this conclusion is not sufficient to understand GI as a whole because factors of DNA non-damaging origin can also induce GI. 
Additionally, new data on induced pluripotent stem cells reveal that GI is acquired in normal mature

  13. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum

    DEFF Research Database (Denmark)

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger;

    2016-01-01

    to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences were complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F......, independent of geographic location, year of isolation and source of isolates. Only one prophage-related sequence was found which corresponded to the previously described prophage 6H, and appeared in 5 out of 11 isolates. CRISPR array analysis revealed two different loci with dissimilar spacer content, which...

  14. Identification of Candidate Adherent-Invasive E. coli Signature Transcripts by Genomic/Transcriptomic Analysis.

    Directory of Open Access Journals (Sweden)

    Yuanhao Zhang

    Full Text Available Adherent-invasive Escherichia coli (AIEC) strains are detected more frequently within mucosal lesions of patients with Crohn's disease (CD). The AIEC phenotype consists of adherence and invasion of intestinal epithelial cells and survival within macrophages of these bacteria in vitro. Our aim was to identify candidate transcripts that distinguish AIEC from non-invasive E. coli (NIEC) strains and might be useful for rapid and accurate identification of AIEC by culture-independent technology. We performed comparative RNA-Sequence (RNASeq) analysis using AIEC strain LF82 and NIEC strain HS during exponential and stationary growth. Differential expression analysis of coding sequences (CDS) homologous to both strains demonstrated 224 and 241 genes with increased and decreased expression, respectively, in LF82 relative to HS. Transition metal transport and siderophore metabolism related pathway genes were up-regulated, while glycogen metabolic and oxidation-reduction related pathway genes were down-regulated, in LF82. Chemotaxis related transcripts were up-regulated in LF82 during the exponential phase, but flagellum-dependent motility pathway genes were down-regulated in LF82 during the stationary phase. CDS that mapped only to the LF82 genome accounted for 747 genes. We applied an in silico subtractive genomics approach to identify CDS specific to AIEC by incorporating the genomes of 10 other previously phenotyped NIEC. From this analysis, 166 CDS mapped to the LF82 genome and lacked homology to any of the 11 human NIEC strains. We compared these CDS across 13 AIEC, but none were homologous in each. Four LF82 gene loci belonging to clustered regularly interspaced short palindromic repeats region (CRISPR)-CRISPR-associated (Cas) genes were identified in 4 to 6 AIEC and absent from all non-pathogenic bacteria. As previously reported, AIEC strains were enriched for pdu operon genes. One CDS, encoding an excisionase, was shared by 9 AIEC strains. Reverse

  15. Recombination analysis based on the complete genome of bocavirus

    Directory of Open Access Journals (Sweden)

    Chen Shengxia

    2011-04-01

    Full Text Available Abstract Bocaviruses include bovine parvovirus, minute virus of canine, porcine bocavirus, gorilla bocavirus, and Human bocaviruses 1-4 (HBoVs). Although recent reports showed that recombination happened in bocavirus, no systematic study has investigated the recombination of bocavirus. The present study performed phylogenetic and recombination analysis of bocavirus over the complete genomes available in GenBank. Results confirmed that recombination existed among bocaviruses, including the likely inter-genotype recombination between HBoV1 and HBoV4, and intra-genotype recombination among HBoV2 variants. Moreover, it is the first report revealing the recombination that occurred between minute viruses of canine.

  16. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  17. BioMet Toolbox: genome-wide analysis of metabolism

    DEFF Research Database (Denmark)

    Cvijovic, M.; Olivares Hernandez, Roberto; Agren, R.

    2010-01-01

    models. Systematic analysis of biological processes by means of modelling and simulations has made the identification of metabolic networks and prediction of metabolic capabilities under different conditions possible. For facilitating such systemic analysis, we have developed the BioMet Toolbox, a web......-based resource for stoichiometric analysis and for integration of transcriptome and interactome data, thereby exploiting the capabilities of genome-scale metabolic models. The BioMet Toolbox provides an effective user-friendly way to perform linear programming simulations towards maximized or minimized growth...... rates, substrate uptake rates and metabolic production rates by detecting relevant fluxes, simulate single and double gene deletions or detect metabolites around which major transcriptional changes are concentrated. These tools can be used for high-throughput in silico screening and allows fully...

  18. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.
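
    Of the three signal-processing methods named in this record, wavelet decomposition is the easiest to sketch briefly. The example below applies a single level of the Haar transform to a toy expression profile. The choice of the Haar wavelet, the function name, and the sample values are all assumptions for illustration; the abstract does not state which wavelet family the authors used.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficient lists, each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical expression profile of one gene across four samples.
profile = [4.0, 4.0, 2.0, 0.0]
approx, detail = haar_step(profile)
```

    The approximation coefficients summarise the coarse trend of the profile and the detail coefficients capture sample-to-sample changes; either could serve as a feature vector passed to a clustering routine.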

  19. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Szeto, Ernest; Palaniappan, Krishna; Grechkin, Yuri; Chu, Ken; Chen, I-Min A.; Dubchak, Inna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2007-08-01

    The Integrated Microbial Genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.

  20. Genome-wide identification of specific oligonucleotides using artificial neural network and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Chen Jiun-Ching

    2007-05-01

    Full Text Available Abstract Background Genome-wide identification of specific oligonucleotides (oligos) is a computationally-intensive task and is a requirement for designing microarray probes, primers, and siRNAs. An artificial neural network (ANN) is a machine learning technique that can effectively process complex and high noise data. Here, ANNs are applied to process the unique subsequence distribution for prediction of specific oligos. Results We present a novel and efficient algorithm, named the integration of ANN and BLAST (IAB) algorithm, to identify specific oligos. We establish the unique marker database for human and rat gene index databases using the hash table algorithm. We then create the input vectors, via the unique marker database, to train and test the ANN. The trained ANN predicted the specific oligos with high efficiency, and these oligos were subsequently verified by BLAST. To improve the prediction performance, the ANN over-fitting issue was avoided by early stopping with the best observed error and a k-fold validation was also applied. The performance of the IAB algorithm was about 5.2, 7.1, and 6.7 times faster than the BLAST search without ANN for experimental results of 70-mer, 50-mer, and 25-mer specific oligos, respectively. In addition, the results of polymerase chain reactions showed that the primers predicted by the IAB algorithm could specifically amplify the corresponding genes. The IAB algorithm has been integrated into a previously published comprehensive web server to support microarray analysis and genome-wide iterative enrichment analysis, through which users can identify a group of desired genes and then discover the specific oligos of these genes. Conclusion The IAB algorithm has been developed to construct SpecificDB, a web server that provides a specific and valid oligo database of the probe, siRNA, and primer design for the human genome. We also demonstrate the ability of the IAB algorithm to predict specific oligos through
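
    The hash-table step mentioned in this record (building a database of unique markers) might look roughly like the sketch below, which indexes every k-mer by the genes that contain it and keeps those found in exactly one gene. This is only an illustration of the general idea: the function name, the toy sequences, and the value of k are assumptions, and the ANN and BLAST stages of the IAB algorithm are not reproduced.

```python
from collections import defaultdict


def unique_kmers(sequences, k):
    """Map each gene name to the sorted k-mers that occur in no other gene."""
    owners = defaultdict(set)             # hash table: k-mer -> genes containing it
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    unique = {name: [] for name in sequences}
    for kmer, names in owners.items():
        if len(names) == 1:               # seen in exactly one gene
            unique[next(iter(names))].append(kmer)
    return {name: sorted(kmers) for name, kmers in unique.items()}


# Hypothetical toy genes; real oligos would be 25- to 70-mers over a gene index.
genes = {"geneA": "ACGTAC", "geneB": "CGTACG"}
markers = unique_kmers(genes, 4)
```

    A k-mer repeated within a single gene still counts as unique to that gene here, which may or may not match the definition used for the published marker database.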

  1. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.;

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...... contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop....

  2. Comparative Genomics and Transcriptomic Analysis of Mycobacterium Kansasii

    KAUST Repository

    Alzahid, Yara

    2014-04-01

    The group of Mycobacteria is one of the most intensively studied bacterial taxa, as they cause two historical and worldwide known diseases: leprosy and tuberculosis. Mycobacteria not identified as tuberculosis or leprosy complex have been referred to as ‘environmental mycobacteria’ or ‘nontuberculous mycobacteria’ (NTM). Mycobacterium kansasii (M. kansasii) is one of the most frequent NTM pathogens, as it causes pulmonary disease in immuno-competent patients and pulmonary and disseminated disease in patients with various immuno-deficiencies. There have been five documented subtypes of this bacterium, by different molecular typing methods, showing that type I causes tuberculosis-like disease in healthy individuals, and type II in immune-compromised individuals. The remaining types are said to be environmental, thereby not causing any diseases. The aim of this project was to conduct a comparative genomic study of M. kansasii types I-V and to investigate the gene expression levels of those types. Various comparative genomics analyses provided genomic evidence on why M. kansasii type I is considered pathogenic, by focusing on three key elements that are involved in virulence of Mycobacteria: the ESX secretion system, phospholipase C (plcB) and Mammalian cell entry (Mce) operons. The results showed the lack of the espA operon in types II-V, which renders the ESX-1 operon dysfunctional, as espA is one of the key factors that control this secretion system. However, gene expression analysis showed this operon to be deleted in types II, III and IV. Furthermore, plcB was found to be truncated in types III and IV. Analysis of Mce operons (1-4) shows that the mce-1 operon is duplicated, mce-2 is absent and mce-3 and mce-4 are present in one copy in M. kansasii types I-V. Gene expression profiles of types I-IV showed that the secreted proteins of ESX-1 were slightly upregulated in types II-IV when compared to type I and the secreted forms of ESX-5 were highly down

  3. Dating the age of admixture via wavelet transform analysis of genome-wide data

    NARCIS (Netherlands)

    I. Pugach (Irina); R. Matveyev (Rostislav); A. Wollstein (Andreas); M.H. Kayser (Manfred); M. Stoneking (Mark)

    2011-01-01

    We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixe

  4. IMG 4 version of the integrated microbial genomes comparative analysis system

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chen, I-Min A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Palaniappan, Krishna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chu, Ken [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Szeto, Ernest [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Pillay, Manoj [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Ratner, Anna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Huang, Jinghua [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Woyke, Tanja [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Huntemann, Marcel [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Anderson, Iain [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Billis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Varghese, Neha [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Mavromatis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Pati, Amrita [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Ivanova, Natalia N. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  5. IMG 4 version of the integrated microbial genomes comparative analysis system.

    Science.gov (United States)

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  6. Stochastic modelling of landfill leachate and biogas production incorporating waste heterogeneity. Model formulation and uncertainty analysis.

    Science.gov (United States)

    Zacharof, A I; Butler, A P

    2004-01-01

    A mathematical model simulating the hydrological and biochemical processes occurring in landfilled waste is presented and demonstrated. The model combines biochemical and hydrological models into an integrated representation of the landfill environment. Waste decomposition is modelled using traditional biochemical waste decomposition pathways combined with a simplified methodology for representing the rate of decomposition. Water flow through the waste is represented using a statistical velocity model capable of representing the effects of waste heterogeneity on leachate flow through the waste. Given the limitations in data capture from landfill sites, significant emphasis is placed on improving parameter identification and reducing parameter requirements. A sensitivity analysis is performed, highlighting the model's response to changes in input variables. A model test run is also presented, demonstrating the model capabilities. A parameter perturbation model sensitivity analysis was also performed. This has been able to show that although the model is sensitive to certain key parameters, its overall intuitive response provides a good basis for making reasonable predictions of the future state of the landfill system. Finally, due to the high uncertainty associated with landfill data, a tool for handling input data uncertainty is incorporated in the model's structure. It is concluded that the model can be used as a reasonable tool for modelling landfill processes and that further work should be undertaken to assess the model's performance.

  7. Analysis of CdS/CdTe devices incorporating a ZnTe:Cu/Ti Contact

    Energy Technology Data Exchange (ETDEWEB)

    Gessert, T.A. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States)]. E-mail: tim_gessert@nrel.gov; Asher, S. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States); Johnston, S. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States); Young, M. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States); Dippo, P. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States); Corwine, C. [National Renewable Energy Laboratory, 1617 Cole Blvd, Golden, Colorado 80401 (United States)

    2007-05-31

    High-performance CdS/CdTe photovoltaic devices can be produced using a ZnTe:Cu/Ti back contact deposited onto the CdTe layer. We observe that prolonged exposure of the ZnTe:Cu and Ti sputtering targets to an oxygen-containing plasma significantly reduces device open-circuit voltage and fill factor. High-resolution compositional analysis of these devices reveals that Cu concentration in the CdTe and CdS layers is lower for devices with poor performance. Capacitance-voltage analysis and related numerical simulations indicate that the net acceptor concentration in the CdTe is also lower for devices with poor performance. Photoluminescence analyses of the junction region reveal that the intensity of a luminescent peak associated with a defect complex involving interstitial Cu (Cu_i) and oxygen on Te (O_Te) is reduced in devices with poor performance. Combined with thermodynamic considerations, these results suggest that oxygen incorporation into the ZnTe:Cu sputtering target reduces the ability of sputtered ZnTe:Cu film to diffuse Cu into the CdTe.

  8. NexGen PVAs: Incorporating Eco-Evolutionary Processes into Population Viability Models

    Science.gov (United States)

    We examine how the integration of evolutionary and ecological processes in population dynamics – an emerging framework in ecology – could be incorporated into population viability analysis (PVA). Driven by parallel, complementary advances in population genomics and computational ...

  9. [The Mycobacterium leprae genome: from sequence analysis to therapeutic implications].

    Science.gov (United States)

    Honore, N

    2002-01-01

    The genome of Mycobacterium leprae, the causative agent of leprosy, was analyzed by rapid sequencing of cosmids and plasmids prepared from DNA isolated from one patient's strain. Results showed that the bacillus possesses a single circular chromosome that differs from other known mycobacterium chromosomes with regard to size (3.2 Mb) and G + C content (57.8%). Computer analysis demonstrated that only half of the sequence contains protein-coding genes. The other half contains pseudogenes and non-coding sequences. These findings indicate that M. leprae has undergone a major reductive evolution leaving a minimal set of functional genes for survival. Study of the coding region of the sequence provides evidence accounting for the particular pathogenic properties of M. leprae which is an obligate intracellular parasite. Disappearance of numerous enzymatic pathways in comparison with M. tuberculosis, an intracellular pathogen comparable to M. leprae, could explain the differences observed between the two organisms. Genomic analysis of the leprosy bacillus also provided insight into the molecular basis for resistance to various antibiotics and allowed identification of several potential targets for new drug treatments.

  10. A genome-wide 20 K citrus microarray for gene expression analysis

    OpenAIRE

    Gadea Jose; Forment Javier; Santiago Julia; Marques M Carmen; Juarez Jose; Mauri Nuria; Martinez-Godoy M Angeles

    2008-01-01

    Background: Understanding of the genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility of creating new tools for functional genomics in this crop plant. Results: We have designed and constructed a publicly available genome-...

  11. A genome-wide 20 K citrus microarray for gene expression analysis

    OpenAIRE

    Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose

    2008-01-01

    Background: Understanding of the genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility of creating new tools for functional genomics in this crop plant. Results: We have designed and constructed a publicly available genome-wide cDNA...

  12. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... on Tabasco), led us to the detection of a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute, supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...
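As a rough illustration of the depth-of-coverage idea mentioned in this record, the sketch below flags fixed-size windows whose mean read depth is well above the genome-wide median. The function name, the window size and the 1.5x ratio are hypothetical choices made for this example; the record does not state the authors' actual thresholds or pipeline.

```python
from statistics import mean, median


def high_depth_windows(depths, window_size, ratio=1.5):
    """Return (start, end) windows whose mean depth is >= ratio * median.

    `depths` is a list of per-base read depths along one sequence.
    `window_size` and `ratio` are illustrative parameters, not values
    taken from the study.
    """
    # Mean depth of each non-overlapping, full-length window.
    windows = [
        (start, mean(depths[start:start + window_size]))
        for start in range(0, len(depths) - window_size + 1, window_size)
    ]
    if not windows:
        return []
    # Use the median window depth as a robust single-copy baseline.
    baseline = median(m for _, m in windows)
    if baseline <= 0:
        return []
    return [
        (start, start + window_size)
        for start, m in windows
        if m >= ratio * baseline
    ]
```

A real analysis would also need to correct for GC content and mappability before interpreting excess depth as duplication; that is outside this sketch.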

  13. Analysis of dinucleotide signatures in HIV-1 subtype B genomes

    Indian Academy of Sciences (India)

    Aridaman Pandit; Jyothirmayi Vadlamudi; Somdatta Sinha

    2013-12-01

    Dinucleotide usage is known to vary in the genomes of organisms. The dinucleotide usage profiles or genome signatures are similar for sequence samples taken from the same genome, but are different for taxonomically distant species. This concept of genome signatures has been used to study several organisms including viruses, to elucidate the signatures of evolutionary processes at the genome level. Genome signatures assume greater importance in the case of host–pathogen interactions, where molecular interactions between the two species take place continuously, and can influence their genomic composition. In this study, analyses of whole genome sequences of HIV-1 subtype B, a retrovirus that caused the global AIDS pandemic, have been carried out to analyse the variation in genome signatures of the virus from 1983 to 2007. We show statistically significant temporal variations in some dinucleotide patterns, highlighting the selective evolution of the dinucleotide profiles of HIV-1 subtype B, possibly a consequence of host-specific selection.
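To make the notion of a dinucleotide "genome signature" concrete, here is a minimal sketch that computes, for each dinucleotide XY, the odds ratio f_XY / (f_X * f_Y). This odds-ratio definition is a common convention and is assumed here; the record does not say which measure the authors used, and the function name is illustrative.

```python
from collections import Counter
from itertools import product


def dinucleotide_signature(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for a DNA sequence.

    Assumed definition (odds ratio); not taken from the abstract.
    """
    seq = seq.upper()
    bases = "ACGT"
    # Count only unambiguous bases for the mononucleotide frequencies.
    mono = Counter(b for b in seq if b in bases)
    # Count overlapping dinucleotides made of unambiguous bases.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in bases and seq[i + 1] in bases
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    signature = {}
    for x, y in product(bases, repeat=2):
        fx = mono[x] / n_mono if n_mono else 0.0
        fy = mono[y] / n_mono if n_mono else 0.0
        fxy = di[x + y] / n_di if n_di else 0.0
        expected = fx * fy
        # A ratio near 1 means the dinucleotide occurs about as often
        # as expected from its base composition alone.
        signature[x + y] = fxy / expected if expected else 0.0
    return signature
```

Comparing such 16-value profiles between sequences sampled in different years is one simple way to look for temporal drift in the signature.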

  14. Clinical Omics Analysis of Colorectal Cancer Incorporating Copy Number Aberrations and Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Tsuyoshi Yoshida

    2010-07-01

    Full Text Available Background: Colorectal cancer (CRC) is one of the most frequently occurring cancers in Japan, and thus a wide range of methods have been deployed to study the molecular mechanisms of CRC. In this study, we performed a comprehensive analysis of CRC, incorporating copy number aberration (CNA) and gene expression data. For the last four years, we have been collecting data from CRC cases and organizing the information as an “omics” study by integrating many kinds of analysis into a single comprehensive investigation. In our previous studies, we had experienced difficulty in finding genes related to CRC, as we observed higher noise levels in the expression data than in the data for other cancers. Because chromosomal aberrations are often observed in CRC, here we have performed a combination of CNA analysis and expression analysis in order to identify new genes responsible for CRC. This study was performed as part of the Clinical Omics Database Project at Tokyo Medical and Dental University. The purpose of this study was to investigate the mechanism of genetic instability in CRC by this combination of expression analysis and CNA analysis, and to establish a new method for the diagnosis and treatment of CRC. Materials and methods: Comprehensive gene expression analysis was performed on 79 CRC cases using an Affymetrix Gene Chip, and comprehensive CNA analysis was performed using an Affymetrix DNA Sty array. To avoid the contamination of cancer tissue with normal cells, laser micro-dissection was performed before DNA/RNA extraction. Data analysis was performed using original software written in the R language. Results: We observed a high percentage of CNA in colorectal cancer, including copy number gains at 7, 8q, 13 and 20q, and copy number losses at 8p, 17p and 18. Gene expression analysis provided many candidates for CRC-related genes, but their association with CRC did not reach the level of statistical significance. The combination of CNA and gene

  15. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignments. The framework integrates the information derived from multiple sequence alignment and a phylogenetic tree (a hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  16. Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-01-01

    Full Text Available Cotton is one of the most important economic crops, the primary source of natural fiber, and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four large repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than to other rosids, and the clade formed by the two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and studies of cytoplasmic male sterility in cotton and other higher plants.

  17. Genome-Wide Analysis Reveals Coating of the Mitochondrial Genome by TFAM

    OpenAIRE

    Wang, Yun E.; Marinov, Georgi K.; Wold, Barbara J.; Chan, David C.

    2013-01-01

    Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. The transcription factor TFAM is critical for initiation of transcription and replication of the genome, and is also thought to perform a packaging function. Although specific binding sites required for initiation of transcriptio...

  18. A Maximum Entropy Model of the Bearded Capuchin Monkey Habitat Incorporating Topography and Spectral Unmixing Analysis

    Science.gov (United States)

    Howard, A. M.; Bernardes, S.; Nibbelink, N.; Biondi, L.; Presotto, A.; Fragaszy, D. M.; Madden, M.

    2012-07-01

    Movement patterns of bearded capuchin monkeys (Cebus (Sapajus) libidinosus) in northeastern Brazil are likely impacted by environmental features such as elevation, vegetation density, or vegetation type. Habitat preferences of these monkeys provide insights regarding the impact of environmental features on species ecology and the degree to which they incorporate these features in movement decisions. In order to evaluate environmental features influencing movement patterns and predict areas suitable for movement, we employed a maximum entropy modelling approach, using observation points along capuchin monkey daily routes as species presence points. We combined these presence points with spatial data on important environmental features from remotely sensed data on land cover and topography. A spectral mixing analysis procedure was used to generate fraction images that represent green vegetation, shade and soil of the study area. A Landsat Thematic Mapper scene of the area of study was geometrically and atmospherically corrected and used as input in a Minimum Noise Fraction (MNF) procedure and a linear spectral unmixing approach was used to generate the fraction images. These fraction images and elevation were the environmental layer inputs for our logistic MaxEnt model of capuchin movement. Our models' predictive power (test AUC) was 0.775. Areas of high elevation (>450 m) showed low probabilities of presence, and percent green vegetation was the greatest overall contributor to model AUC. This work has implications for predicting daily movement patterns of capuchins in our field site, as suitability values from our model may relate to habitat preference and facility of movement.

  19. Incorporation of a Wind Generator Model into a Dynamic Power Flow Analysis

    Directory of Open Access Journals (Sweden)

    Angeles-Camacho C.

    2011-07-01

    Full Text Available Wind energy is nowadays one of the most cost-effective and practical options for electric generation from renewable resources. However, increased penetration of wind generation causes power networks to be more dependent on, and vulnerable to, varying wind speed. Modeling is a tool which can provide valuable information about the interaction between wind farms and the power network to which they are connected. This paper develops a realistic characterization of a wind generator. The wind generator model is incorporated into an algorithm to investigate its contribution to the stability of the power network in the time domain. The tool obtained is termed dynamic power flow. The wind generator model takes into account the wind speed and the reactive power consumption of induction generators. Dynamic power flow analysis is carried out using real wind data at 10-minute time intervals collected for one meteorological station. The generation injected at one point into the network provides active power locally and is found to reduce global power losses. However, the power supplied is time-varying and causes fluctuations in voltage magnitude and power flows in transmission lines.

  20. Incorporating temporal variability to improve geostatistical analysis of satellite-observed CO2 in China

    Institute of Scientific and Technical Information of China (English)

    ZENG ZhaoCheng; LEI LiPing; GUO LiJie; ZHANG Li; ZHANG Bing

    2013-01-01

    Observations of atmospheric carbon dioxide (CO2) from satellites offer new data sources to understand global carbon cycling. The correlation structure of satellite-observed CO2 can be analyzed and modeled by geostatistical methods, and CO2 values at unsampled locations can be predicted with a correlation model. Conventional geostatistical analysis only investigates the spatial correlation of CO2, and does not consider temporal variation in the satellite-observed CO2 data. In this paper, a spatiotemporal geostatistical method that incorporates temporal variability is implemented and assessed for analyzing the spatiotemporal correlation structure and prediction of monthly CO2 in China. The spatiotemporal correlation is estimated and modeled by a product-sum variogram model with a global nugget component. The variogram result indicates a significant degree of temporal correlation within satellite-observed CO2 data sets in China. Prediction of monthly CO2 using the spatiotemporal variogram model and space-time kriging procedure is implemented. The prediction is compared with a spatial-only geostatistical prediction approach using a cross-validation technique. The spatiotemporal approach gives better results, with a higher correlation coefficient (r²) and lower mean absolute prediction error and root mean square error. Moreover, the monthly mapping result generated from the spatiotemporal approach has less prediction uncertainty and more detailed spatial variation of CO2 than those from the spatial-only approach.
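The product-sum variogram named in this record can be sketched as follows. In its general form the model combines a spatial marginal g_s and a temporal marginal g_t as g_s + g_t - k * g_s * g_t. The exponential marginals, the way the global nugget is added, and every parameter value below are assumptions made for illustration; the record does not give the fitted model.

```python
import math


def exponential_variogram(h, sill, effective_range):
    """Exponential marginal variogram without nugget (illustrative choice)."""
    if h <= 0:
        return 0.0
    return sill * (1.0 - math.exp(-3.0 * h / effective_range))


def product_sum_variogram(h_s, h_t, spatial, temporal, k, nugget=0.0):
    """Space-time variogram at spatial lag h_s and temporal lag h_t.

    `spatial` and `temporal` are (sill, effective_range) pairs for the
    marginal variograms; `k` couples the two marginals; `nugget` is a
    single global nugget added at any non-zero lag (an assumption here).
    """
    if h_s == 0 and h_t == 0:
        return 0.0
    g_s = exponential_variogram(h_s, *spatial)
    g_t = exponential_variogram(h_t, *temporal)
    return nugget + g_s + g_t - k * g_s * g_t
```

For example, `product_sum_variogram(50.0, 1.0, (1.0, 300.0), (0.5, 6.0), k=0.5, nugget=0.1)` evaluates the model at a 50-unit spatial lag and a one-month temporal lag. A commonly stated admissibility condition is 0 < k <= 1 / max(spatial sill, temporal sill), which the caller is expected to respect.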

  1. Sensitivity Analysis of Flutter Response of a Wing Incorporating Finite-Span Corrections

    Science.gov (United States)

    Issac, Jason Cherian; Kapania, Rakesh K.; Barthelemy, Jean-Francois M.

    1994-01-01

    Flutter analysis of a wing is performed in compressible flow using state-space representation of the unsteady aerodynamic behavior. Three different expressions are used to incorporate corrections due to the finite-span effects of the wing in estimating the lift-curve slope. The structural formulation is based on a Rayleigh-Ritz technique with Chebyshev polynomials used for the wing deflections. The aeroelastic equations are solved as an eigenvalue problem to determine the flutter speed of the wing. The flutter speeds are found to be higher in these cases, when compared to that obtained without accounting for the finite-span effects. The derivatives of the flutter speed with respect to the shape parameters, namely aspect ratio, area, taper ratio and sweep angle, are calculated analytically. The shape sensitivity derivatives give a linear approximation to the flutter speed curves over a range of values of the shape parameter which is perturbed. Flutter and sensitivity calculations are performed on a wing using a lifting-surface unsteady aerodynamic theory using modules from a system of programs called FAST.

  2. Genomic analysis of stress response against arsenic in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Surasri N Sahu

    Full Text Available Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including arthrosclerosis, diabetes and cancer. In this study, we explored genome-level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows us to do microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03%) exposure caused stronger global gene expression changes in comparison with low dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium exposure and of sediment exposure samples from the German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks was done using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA.

  3. Analysis of chimpanzee history based on genome sequence alignments.

    Directory of Open Access Journals (Sweden)

    Jennifer L Caswell

    2008-04-01

    Full Text Available Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and a macaque. Analysis provides a more precise understanding of demographic history than was previously available. We show that bonobos and common chimpanzees were separated approximately 1,290,000 years ago, western and other common chimpanzees approximately 510,000 years ago, and eastern and central chimpanzees at least 50,000 years ago. We infer that the central chimpanzee population size increased by at least a factor of 4 since its separation from western chimpanzees, while the western chimpanzee effective population size decreased. Surprisingly, in about one percent of the genome, the genetic relationships between humans, chimpanzees, and bonobos appear to be different from the species relationships. We used PCR-based resequencing to confirm 11 regions where chimpanzees and bonobos are not most closely related. Study of such loci should provide information about the period of time 5-7 million years ago when the ancestors of humans separated from those of the chimpanzees.

  4. Genome-Wide Analysis of Human Metapneumovirus Evolution

    Science.gov (United States)

    Kim, Jin Il; Park, Sehee; Lee, Ilseob; Park, Kwang Sook; Kwak, Eun Jung; Moon, Kwang Mee; Lee, Chang Kyu; Bae, Joon-Yong; Park, Man-Seong; Song, Ki-Joon

    2016-01-01

    Human metapneumovirus (HMPV) has been described as an important etiologic agent of upper and lower respiratory tract infections, especially in young children and the elderly. Most school-aged children may have been exposed to HMPVs, and exacerbation by other viral or bacterial superinfection is common. However, our understanding of the molecular evolution of HMPVs remains limited. To address the comprehensive evolutionary dynamics of HMPVs, we report a genome-wide analysis of the eight genes (N, P, M, F, M2, SH, G, and L) using 103 complete genome sequences. Phylogenetic reconstruction revealed that the eight genes from one HMPV strain grouped into the same genetic group among the five distinct lineages (A1, A2a, A2b, B1, and B2). A few exceptions of phylogenetic incongruence might suggest past recombination events, and we detected possible recombination breakpoints in the F, SH, and G coding regions. The five genetic lineages of HMPVs shared quite remote common ancestors, dated from more than 220 to 470 years ago, with the most recent origin for the A2b sublineage. Purifying selection was common, but most protein genes except the F and M2-2 coding regions also appeared to experience episodic diversifying selection. Taken together, these findings suggest that the five lineages of HMPVs maintain their individual evolutionary dynamics and that recombination and selection forces might shape the genetic diversity of HMPVs. PMID:27046055

  5. Delineation of Steroid-Degrading Microorganisms through Comparative Genomic Analysis

    Directory of Open Access Journals (Sweden)

    Lee H. Bergstrand

    2016-03-01

    Full Text Available Steroids are ubiquitous in natural environments and are a significant growth substrate for microorganisms. Microbial steroid metabolism is also important for some pathogens and for biotechnical applications. This study delineated the distribution of aerobic steroid catabolism pathways among over 8,000 microorganisms whose genomes are available in the NCBI RefSeq database. Combined analysis of bacterial, archaeal, and fungal genomes with both hidden Markov models and reciprocal BLAST identified 265 putative steroid degraders within only Actinobacteria and Proteobacteria, which mainly originated from soil, eukaryotic host, and aquatic environments. These bacteria include members of 17 genera not previously known to contain steroid degraders. A pathway for cholesterol degradation was conserved in many actinobacterial genera, particularly in members of the Corynebacterineae, and a pathway for cholate degradation was conserved in members of the genus Rhodococcus. A pathway for testosterone and, sometimes, cholate degradation had a patchy distribution among Proteobacteria. The steroid degradation genes tended to occur within large gene clusters. Growth experiments confirmed bioinformatic predictions of steroid metabolism capacity in nine bacterial strains. The results indicate there was a single ancestral 9,10-seco-steroid degradation pathway. Gene duplication, likely in a progenitor of Rhodococcus, later gave rise to a cholate degradation pathway. Proteobacteria and additional Actinobacteria subsequently obtained a cholate degradation pathway via horizontal gene transfer, in some cases facilitated by plasmids. Catabolism of steroids appears to be an important component of the ecological niches of broad groups of Actinobacteria and individual species of Proteobacteria.
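
    The reciprocal BLAST step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the search results have already been parsed into (query, subject, bit score) tuples; the identifiers (refA, g1, and so on) are hypothetical, and the abstract does not state the thresholds or exact procedure the authors used.

```python
from typing import Dict, Iterable, List, Tuple

# One search hit: (query_id, subject_id, bit_score). All IDs are made up.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b AND b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits: reference proteins searched against a genome, and back.
forward_hits = [("refA", "g1", 310.0), ("refA", "g7", 120.0),
                ("refB", "g2", 250.0), ("refC", "g2", 90.0)]
reverse_hits = [("g1", "refA", 305.0), ("g2", "refB", 248.0),
                ("g7", "refX", 200.0)]

pairs = reciprocal_best_hits(forward_hits, reverse_hits)
```

    Here refC is dropped because its best hit, g2, points back to refB rather than to refC; that asymmetry is what the reciprocal check is meant to filter out.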

  6. Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence

    Directory of Open Access Journals (Sweden)

    Gil Ana I

    2011-06-01

    Full Text Available Background: Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain having serotype O3:K6 caused major outbreaks in India and other parts of the world, resulting in the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and together with the original O3:K6 were termed strains of the pandemic clone. Two genomes, one of this virulent strain and one of a pre-pandemic strain, have been sequenced. We sequenced four additional genomes of V. parahaemolyticus in this study that were isolated from different geographical regions and time points. Comparative genomic analyses of six strains of V. parahaemolyticus isolated from Asia and Peru were performed in order to advance knowledge concerning the evolution of V. parahaemolyticus; specifically, the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had either the (tdh+, trh-) or the (tdh-, trh+) toxin profile. The sixth pandemic strain sequenced in this study was serotype O4:K68. Results: Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements as well as major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, the results of single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between O3:K6 and O4:K68 pandemic isolates were within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The "core" genes of V. parahaemolyticus were also compared to those of V. cholerae and V. vulnificus, in order to delineate differences between these three pathogenic species. Approximately one-half (49-59%) of each species' core genes were conserved in all three species, and 14-24% of the core genes were species-specific and in different

  7. 13C metabolic flux analysis at a genome-scale.

    Science.gov (United States)

    Gopalakrishnan, Saratram; Maranas, Costas D

    2015-11-01

    Metabolic models used in 13C metabolic flux analysis generally include a limited number of reactions primarily from central metabolism. They typically omit degradation pathways, complete cofactor balances, and atom transition contributions for reactions outside central metabolism. This study addresses the impact on prediction fidelity of scaling-up mapping models to a genome-scale. The core mapping model employed in this study accounts for 75 reactions and 65 metabolites, primarily from central metabolism. The genome-scale metabolic mapping model (GSMM) (697 reactions and 595 metabolites) is constructed using as a basis the iAF1260 model upon eliminating reactions guaranteed not to carry flux based on growth and fermentation data for a minimal glucose growth medium. Labeling data for 17 amino acid fragments obtained from cells fed with glucose labeled at the second carbon was used to obtain fluxes and ranges. Metabolic fluxes and confidence intervals are estimated, for both core and genome-scale mapping models, by minimizing the sum of squared differences between predicted and experimentally measured labeling patterns using the EMU decomposition algorithm. Overall, we find that both topology and estimated values of the metabolic fluxes remain largely consistent between the core and GSM models. Stepping up to a genome-scale mapping model leads to wider flux inference ranges for 20 key reactions present in the core model. The glycolysis flux range doubles due to the possibility of active gluconeogenesis, the TCA flux range expands by 80% due to the availability of a bypass through arginine consistent with labeling data, and the transhydrogenase reaction flux is essentially unresolved due to the presence of as many as five routes for the inter-conversion of NADPH to NADH afforded by the genome-scale model. By globally accounting for ATP demands in the GSMM, the unused ATP decreased drastically with the lower bound matching the maintenance ATP requirement. A non
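
    The fitting principle named in this abstract, choosing fluxes that minimize the sum of squared differences between predicted and measured labeling, can be shown on a deliberately tiny toy problem. The sketch below is an assumption-laden illustration, not the authors' method: it does not implement EMU decomposition, and it assumes a single unknown split fraction f between two hypothetical routes whose labeling signatures (via_a, via_b) are already known.

```python
from typing import Sequence


def sum_squared_residuals(f: float, measured: Sequence[float],
                          via_a: Sequence[float],
                          via_b: Sequence[float]) -> float:
    """Objective: squared mismatch between measured and predicted labeling.

    Toy prediction (an assumption): a fraction f of the product comes from
    route A and 1 - f from route B, so labeling mixes linearly.
    """
    return sum((m - (f * a + (1.0 - f) * b)) ** 2
               for m, a, b in zip(measured, via_a, via_b))


def fit_split_fraction(measured: Sequence[float],
                       via_a: Sequence[float],
                       via_b: Sequence[float]) -> float:
    """Least-squares estimate of f, restricted to the interval [0, 1].

    The objective is a one-dimensional convex quadratic in f, so the
    constrained minimum is the closed-form solution clipped to [0, 1].
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(measured, via_a, via_b))
    den = sum((a - b) ** 2 for a, b in zip(via_a, via_b))
    if den == 0.0:
        raise ValueError("routes have identical labeling; f is unidentifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical labeled fractions for three measured fragments.
via_a = [0.9, 0.1, 0.5]
via_b = [0.2, 0.6, 0.5]
measured = [0.3 * a + 0.7 * b for a, b in zip(via_a, via_b)]  # true f = 0.3

f_hat = fit_split_fraction(measured, via_a, via_b)
```

    The ValueError branch is a small-scale analogue of the unresolved fluxes the abstract describes: when alternative routes produce the same labeling, the data cannot distinguish them.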

  8. Genome Sizes of Nine Insect Species Determined by Flow Cytometry and k-mer Analysis

    Science.gov (United States)

    He, Kang; Lin, Kejian; Wang, Guirong; Li, Fei

    2016-01-01

    The flow cytometry method was used to estimate the genome sizes of nine agriculturally important insects, including two coleopterans, five hemipterans, and two hymenopterans. Among these, the coleopteran Lissorhoptrus oryzophilus (Kuschel) had the largest genome, at 981 Mb. The average genome size was 504 Mb, suggesting that insects have a moderate-size genome. Compared with the insects in other orders, hymenopterans had small genomes, averaging about 200 Mb. We found that the genome sizes of four insect species differed between males and females, showing the organismal complexity of insects. The largest difference occurred in the coconut leaf beetle Brontispa longissima (Gestro). The male coconut leaf beetle had a genome 111 Mb larger than that of the female, which might be due to the chromosome number difference between the sexes. The results indicated that insect invasiveness was not related to genome size. We also determined the genome sizes of the small brown planthopper Laodelphax striatellus (Fallén) and the parasitic wasp Macrocentrus cingulum (Brischke) using k-mer analysis with Illumina Solexa sequencing data. There were slight differences in the results from the two methods. k-mer analysis indicated that the genome size of L. striatellus was 500–700 Mb and that of M. cingulum was ~150 Mb. In all, the genome size information presented here should be helpful for designing genome sequencing strategies when necessary. PMID:27932995
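
    A common way to turn k-mer counts into a genome size estimate is to divide the total number of counted k-mers by the depth at the main peak of the k-mer depth histogram. The sketch below illustrates that idea only; the abstract does not say which tool or parameters the authors used, and the function name, the default k of 21, and the min_depth cutoff are assumptions. It also ignores reverse complements and heterozygosity, which real analyses must handle.

```python
import random
from collections import Counter
from typing import Iterable


def estimate_genome_size(reads: Iterable[str], k: int = 21,
                         min_depth: int = 2) -> float:
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than min_depth times are dropped, a crude stand-in
    for filtering sequencing errors.
    """
    kmer_counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    # Histogram: depth -> number of distinct k-mers observed at that depth.
    depth_hist = Counter(kmer_counts.values())
    usable = {d: n for d, n in depth_hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")

    peak_depth = max(usable, key=usable.get)          # mode of the histogram
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Synthetic check: error-free 100 bp reads tiled every 5 bp over a
    # random 5,000 bp sequence (seeded, so the run is repeatable).
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    reads = [genome[s:s + 100] for s in range(0, 4901, 5)]
    print(round(estimate_genome_size(reads)))
```

    The estimate comes out slightly below the true length because k-mers near the ends of the sequence are covered by fewer reads; with real data, the choice of peak and error cutoff matters far more than this edge effect.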

  9. Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production

    Directory of Open Access Journals (Sweden)

    Brooks J Paul

    2010-03-01

    Full Text Available Background: Microorganisms possess diverse metabolic capabilities that can potentially be leveraged for efficient production of biofuels. Clostridium thermocellum (ATCC 27405) is a thermophilic anaerobe that is both cellulolytic and ethanologenic, meaning that it can directly use the plant sugar, cellulose, and biochemically convert it to ethanol. A major challenge in using microorganisms for chemical production is the need to modify the organism to increase production efficiency. The process of properly engineering an organism is typically arduous. Results: Here we present a genome-scale model of C. thermocellum metabolism, iSR432, for the purpose of establishing a computational tool to study the metabolic network of C. thermocellum and facilitate efforts to engineer C. thermocellum for biofuel production. The model consists of 577 reactions involving 525 intracellular metabolites, 432 genes, and a proteomic-based representation of a cellulosome. The process of constructing this metabolic model led to suggested annotation refinements for 27 genes and identification of areas of metabolism requiring further study. The accuracy of the iSR432 model was tested using experimental growth and by-product secretion data for growth on cellobiose and fructose. Analysis using this model captures the relationship between the reduction-oxidation state of the cell and ethanol secretion and allowed for prediction of gene deletions and environmental conditions that would increase ethanol production. Conclusions: By incorporating genomic sequence data, network topology, and experimental measurements of enzyme activities and metabolite fluxes, we have generated a model that is reasonably accurate at predicting the cellular phenotype of C. thermocellum and that establishes a strong foundation for rational strain design. In addition, we are able to draw some important conclusions regarding the underlying metabolic mechanisms for observed behaviors of C. thermocellum

  10. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Background: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  11. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae

    NARCIS (Netherlands)

    Tettelin, H; Masignani, [No Value; Cieslewicz, MJ; Eisen, JA; Peterson, S; Paulsen, IT; Nelson, KE; Margarit, [No Value; Read, TD; Madoff, LC; Beanan, MJ; Brinkac, LM; Daugherty, SC; DeBoy, RT; Durkin, AS; Kolonay, JF; Madupu, R; Lewis, MR; Radune, D; Fedorova, NB; Scanlan, D; Khouri, H; Mulligan, S; Carty, HA; Cline, RT; Van Aken, SE; Gill, J; Scarselli, M; Mora, M; Iacobini, ET; Brettoni, C; Galli, G; Mariani, M; Vegni, F; Maione, D; Rinaudo, D; Rappuoli, R; Telford, JL; Kasper, DL; Grandi, G; Fraser, CM

    2002-01-01

    The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the oth

  12. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae : Implications for the microbial "pan-genome"

    NARCIS (Netherlands)

    Tettelin, H; Masignani, [No Value; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and als

  13. Research study on analysis/use technologies of genome information; Genome joho kaidoku riyo gijutsu no chosa kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    For wide use of genome information in the industrial field, the required R and D was surveyed from the standpoints of biology and information science. To clarify the present state and issues of the international research on genome analysis, the genome map as well as sequence and function information are first surveyed. The current analysis/use technologies of genome information are analyzed, and the following are summarized: prediction and identification of gene regions in genome sequences, techniques for searching and selecting useful genes, and techniques for predicting the expression of gene functions and the gene-product structure and functions. It is recommended that R and D and data collection/interpretation necessary to clarify inter-gene interactions and information networks should be promoted by integrating Japanese advanced know-how and technologies. As examples of the impact of the research results on industry and society, the present state and future expected effect are summarized for medicines, diagnosis/analysis instruments, chemicals, foods, agriculture, fishery, animal husbandry, electronics, environment and information. 278 refs., 42 figs., 5 tabs.

  14. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  15. Technology-Driven and Evidence-Based Genomic Analysis for Integrated Pediatric and Prenatal Genetics Evaluation

    Institute of Scientific and Technical Information of China (English)

    Yuan Wei; Fang Xu; Peining Li

    2013-01-01

    The first decade since the completion of the Human Genome Project has been marked with rapid development of genomic technologies and their immediate clinical applications. Genomic analysis using oligonucleotide array comparative genomic hybridization (aCGH) or single nucleotide polymorphism (SNP) chips has been applied to pediatric patients with developmental and intellectual disabilities (DD/ID), multiple congenital anomalies (MCA) and autistic spectrum disorders (ASD). Evaluation of analytical and clinical validities of aCGH showed > 99% sensitivity and specificity and increased analytical resolution by higher density probe coverage. Reviews of case series, multi-center comparison and large patient-control studies demonstrated a diagnostic yield of 12%-20%; approximately 60% of these abnormalities were recurrent genomic disorders. This pediatric experience has been extended toward prenatal diagnosis. A series of reports indicated approximately 10% of pregnancies with ultrasound-detected structural anomalies and normal cytogenetic findings had genomic abnormalities, and 30% of these abnormalities were syndromic genomic disorders. Evidence-based practice guidelines and standards for implementing genomic analysis and web-delivered knowledge resources for interpreting genomic findings have been established. The progress from this technology-driven and evidence-based genomic analysis provides not only opportunities to dissect disease-causing mechanisms and develop rational therapeutic interventions but also important lessons for integrating genomic sequencing into pediatric and prenatal genetic evaluation.

  16. Multicolor flow cytometry analysis of the proliferations of T-lymphocyte subsets in vitro by EdU incorporation.

    Science.gov (United States)

    Sun, Yanli; Sun, Yu; Lin, Guigao; Zhang, Rui; Zhang, Kuo; Xie, Jiehong; Wang, Lunan; Li, Jinming

    2012-10-01

    EdU (5-ethynyl-2'-deoxyuridine) incorporation has proved advantageous in studies of cell kinetics, DNA synthesis, and cellular proliferation in vitro and in vivo compared to [3H]thymidine incorporation and BrdU (5-bromo-2'-deoxyuridine) incorporation. Here, we describe a method that combines EdU incorporation and immunostaining with flow cytometric analysis to detect the proliferation of T-lymphocyte subsets in vitro, and we optimize the assay conditions. We found that the number of EdU(+) cells was associated with EdU concentration, incubation time, and the volume of Click reaction solution; the best EdU concentration was 10-50 μM, the optimal incubation time was 8-12 h, and the appropriate volume of Click reaction solution was 100 μl for labeling 1 × 10^6 lymphocytes. Fixation was better performed before permeabilization rather than together with it. Furthermore, as the permeabilization reagent, PBS with 0.05% saponin was better than Tris-buffered saline (TBS) with 0.1% Triton X-100. In addition, thorough washing with PBS with 0.05% saponin had no influence on the staining of EdU(+) cells. Also, the lymphocytes incorporating EdU could be stored at 4°C, -80°C, and in liquid nitrogen for up to 21 days. The present study will aid in optimizing flow cytometry assays that detect the proliferation of T-cell subsets by EdU incorporation and the labeling of cell surface antigens.

  17. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level.

    Science.gov (United States)

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. In the phylogenetic tree, the thirty-one streptomycete strains clearly group into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea's genetic data sources.
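
    A pan-genome is usually called open when the total number of distinct gene families keeps growing as more genomes are added, while the shared core shrinks or levels off. The sketch below shows only that bookkeeping, under the assumption that each genome has already been reduced to a set of gene-family identifiers; the strain and family names are hypothetical. Published analyses typically average over many genome orderings and fit a growth model, and the abstract does not say which approach was used here.

```python
from typing import Dict, List, Optional, Set, Tuple


def pan_core_curve(genomes: Dict[str, Set[str]],
                   order: List[str]) -> List[Tuple[int, int]]:
    """Return (pan size, core size) after adding each genome in order.

    pan  = union of gene families seen so far
    core = intersection of gene families shared by all genomes so far
    """
    pan: Set[str] = set()
    core: Optional[Set[str]] = None
    curve: List[Tuple[int, int]] = []
    for name in order:
        genes = genomes[name]
        pan |= genes
        core = set(genes) if core is None else core & genes
        curve.append((len(pan), len(core)))
    return curve


# Hypothetical gene-family content of three strains.
genomes = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB", "famD"},
    "strain3": {"famA", "famE", "famF"},
}

curve = pan_core_curve(genomes, ["strain1", "strain2", "strain3"])
```

    In this toy input the pan size rises with every added strain while the core falls to a single family, which is the qualitative pattern an open pan-genome shows.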

  18. Comparative genomics analysis of rice and pineapple contributes to understand the chromosome number reduction and genomic changes in grasses

    Directory of Open Access Journals (Sweden)

    Jinpeng Wang

    2016-10-01

    Full Text Available Rice is one of the most researched model plants, and has a genome structure most resembling that of the grass common ancestor after a grass-common tetraploidization ~100 million years ago. There has been a long-standing controversy over whether there were 5 or 7 basic chromosomes before the tetraploidization; it has been tackled but could not be resolved for lack of a sequenced and assembled outgroup plant with a conservative genome structure. Recently, the availability of the pineapple genome, which was not subjected to the grass-common tetraploidization, has provided a precious opportunity to resolve this controversy and to investigate genome changes in rice and other grasses. Here, we performed a comparative genomics analysis of pineapple and rice, and found solid evidence that the grass common ancestor had 2n = 2x = 14 basic chromosomes before the tetraploidization, duplicated to 2n = 4x = 28 after the event. Moreover, we propose that the enormous number of genes missing from duplicated regions in rice is better explained by an allotetraploid produced by prominently divergent parental lines than by gene losses after their divergence. This means that genome fractionation might have occurred before the formation of the allotetraploid grass ancestor.

  19. Exploring a Nonmodel Teleost Genome Through RAD Sequencing-Linkage Mapping in Common Pandora, Pagellus erythrinus and Comparative Genomic Analysis.

    Science.gov (United States)

    Manousaki, Tereza; Tsakogiannis, Alexandros; Taggart, John B; Palaiokostas, Christos; Tsaparis, Dimitris; Lagnel, Jacques; Chatziplis, Dimitrios; Magoulas, Antonios; Papandroulakis, Nikos; Mylonas, Constantinos C; Tsigenopoulos, Costas S

    2015-12-29

    Common pandora (Pagellus erythrinus) is a benthopelagic marine fish belonging to the teleost family Sparidae, and a newly recruited species in Mediterranean aquaculture. The paucity of genetic information relating to sparids, despite their growing economic value for aquaculture, provides the impetus for exploring the genomics of this fish group. Genomic tool development, such as genetic linkage maps provision, lays the groundwork for linking genotype to phenotype, allowing fine-mapping of loci responsible for beneficial traits. In this study, we applied ddRAD methodology to identify polymorphic markers in a full-sib family of common pandora. Employing the Illumina MiSeq platform, we sampled and sequenced a size-selected genomic fraction of 99 individuals, which led to the identification of 920 polymorphic loci. Downstream mapping analysis resulted in the construction of 24 robust linkage groups, corresponding to the karyotype of the species. The common pandora linkage map showed varying degrees of conserved synteny with four other teleost genomes, namely the European seabass (Dicentrarchus labrax), Nile tilapia (Oreochromis niloticus), stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggesting a conserved genomic evolution in Sparidae. Our work exploits the possibilities of genotyping by sequencing to gain novel insights into genome structure and evolution. Such information will boost the study of cultured species and will set the foundation for a deeper understanding of the complex evolutionary history of teleosts.

  20. Exploring a Nonmodel Teleost Genome Through RAD Sequencing—Linkage Mapping in Common Pandora, Pagellus erythrinus and Comparative Genomic Analysis

    Directory of Open Access Journals (Sweden)

    Tereza Manousaki

    2016-03-01

    Full Text Available Common pandora (Pagellus erythrinus) is a benthopelagic marine fish belonging to the teleost family Sparidae, and a newly recruited species in Mediterranean aquaculture. The paucity of genetic information relating to sparids, despite their growing economic value for aquaculture, provides the impetus for exploring the genomics of this fish group. Genomic tool development, such as genetic linkage maps provision, lays the groundwork for linking genotype to phenotype, allowing fine-mapping of loci responsible for beneficial traits. In this study, we applied ddRAD methodology to identify polymorphic markers in a full-sib family of common pandora. Employing the Illumina MiSeq platform, we sampled and sequenced a size-selected genomic fraction of 99 individuals, which led to the identification of 920 polymorphic loci. Downstream mapping analysis resulted in the construction of 24 robust linkage groups, corresponding to the karyotype of the species. The common pandora linkage map showed varying degrees of conserved synteny with four other teleost genomes, namely the European seabass (Dicentrarchus labrax), Nile tilapia (Oreochromis niloticus), stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggesting a conserved genomic evolution in Sparidae. Our work exploits the possibilities of genotyping by sequencing to gain novel insights into genome structure and evolution. Such information will boost the study of cultured species and will set the foundation for a deeper understanding of the complex evolutionary history of teleosts.

  1. Exploring a Nonmodel Teleost Genome Through RAD Sequencing—Linkage Mapping in Common Pandora, Pagellus erythrinus and Comparative Genomic Analysis

    Science.gov (United States)

    Manousaki, Tereza; Tsakogiannis, Alexandros; Taggart, John B.; Palaiokostas, Christos; Tsaparis, Dimitris; Lagnel, Jacques; Chatziplis, Dimitrios; Magoulas, Antonios; Papandroulakis, Nikos; Mylonas, Constantinos C.; Tsigenopoulos, Costas S.

    2015-01-01

    Common pandora (Pagellus erythrinus) is a benthopelagic marine fish belonging to the teleost family Sparidae, and a newly recruited species in Mediterranean aquaculture. The paucity of genetic information relating to sparids, despite their growing economic value for aquaculture, provides the impetus for exploring the genomics of this fish group. Genomic tool development, such as genetic linkage maps provision, lays the groundwork for linking genotype to phenotype, allowing fine-mapping of loci responsible for beneficial traits. In this study, we applied ddRAD methodology to identify polymorphic markers in a full-sib family of common pandora. Employing the Illumina MiSeq platform, we sampled and sequenced a size-selected genomic fraction of 99 individuals, which led to the identification of 920 polymorphic loci. Downstream mapping analysis resulted in the construction of 24 robust linkage groups, corresponding to the karyotype of the species. The common pandora linkage map showed varying degrees of conserved synteny with four other teleost genomes, namely the European seabass (Dicentrarchus labrax), Nile tilapia (Oreochromis niloticus), stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggesting a conserved genomic evolution in Sparidae. Our work exploits the possibilities of genotyping by sequencing to gain novel insights into genome structure and evolution. Such information will boost the study of cultured species and will set the foundation for a deeper understanding of the complex evolutionary history of teleosts. PMID:26715088

  2. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    Science.gov (United States)

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to developing a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  3. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    Science.gov (United States)

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. In the phylogenetic tree, the thirty-one streptomycete strains clearly group into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  4. Genomic analysis of smoothened inhibitor resistance in basal cell carcinoma.

    Science.gov (United States)

    Sharpe, Hayley J; Pau, Gregoire; Dijkgraaf, Gerrit J; Basset-Seguin, Nicole; Modrusan, Zora; Januario, Thomas; Tsui, Vickie; Durham, Alison B; Dlugosz, Andrzej A; Haverty, Peter M; Bourgon, Richard; Tang, Jean Y; Sarin, Kavita Y; Dirix, Luc; Fisher, David C; Rudin, Charles M; Sofen, Howard; Migden, Michael R; Yauch, Robert L; de Sauvage, Frederic J

    2015-03-09

    Smoothened (SMO) inhibitors are under clinical investigation for the treatment of several cancers. Vismodegib is approved for the treatment of locally advanced and metastatic basal cell carcinoma (BCC). Most BCC patients experience significant clinical benefit on vismodegib, but some develop resistance. Genomic analysis of tumor biopsies revealed that vismodegib resistance is associated with Hedgehog (Hh) pathway reactivation, predominantly through mutation of the drug target SMO and to a lesser extent through concurrent copy number changes in SUFU and GLI2. SMO mutations either directly impaired drug binding or activated SMO to varying levels. Furthermore, we found evidence for intra-tumor heterogeneity, suggesting that a combination of therapies targeting components at multiple levels of the Hh pathway is required to overcome resistance.

  5. SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes

    Directory of Open Access Journals (Sweden)

    MacAulay Calum

    2008-10-01

    Full Text Available Abstract Background High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed. Results Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics. Conclusion The identification of genes altered at multiple levels such as copy number, loss of heterozygosity (LOH), DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a novel tool to facilitate the high throughput systems biology analysis of cancer.

  6. Functional Analysis of Shewanella, a cross genome comparison.

    Energy Technology Data Exchange (ETDEWEB)

    Serres, Margrethe H.

    2009-05-15

    The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants i.e. halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through the joint effort and with support from Department of Energy S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data has been applied to analysis of experimental data (i.e. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate, and to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

  7. Identification of conserved regulatory elements by comparative genome analysis

    Directory of Open Access Journals (Sweden)

    Jareborg Niclas

    2003-05-01

    Full Text Available Abstract Background For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments. Results We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at http://www.phylofoot.org/. Conclusions Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.
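    The conservation filter described in this abstract can be illustrated with a minimal sketch. It scores a pairwise alignment of two orthologous sequences in sliding windows and marks the columns that fall inside well-conserved windows. This is not ConSite's implementation: the function names, the window size and the identity cutoff are all hypothetical choices made for illustration.

```python
def window_identity(aln_a, aln_b, window):
    """Fraction of identical, non-gap columns in each sliding window of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_columns(aln_a, aln_b, window=50, cutoff=0.7):
    """Alignment columns covered by at least one window whose identity meets the cutoff.

    The window size and cutoff are hypothetical defaults, not values from the abstract.
    """
    conserved = set()
    for start, score in enumerate(window_identity(aln_a, aln_b, window)):
        if score >= cutoff:
            conserved.update(range(start, start + window))
    return conserved


# Toy alignment: the first half is identical, the second half mostly diverged.
print(sorted(conserved_columns("ACGTACGT", "ACGTTTTT", window=4, cutoff=0.75)))
```

    A predicted binding site would then be kept only if it lies within the conserved columns in both sequences at equivalent alignment positions.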

  8. Simultaneous quantitative determination of 5-aza-2'-deoxycytidine genomic incorporation and DNA demethylation by liquid chromatography tandem mass spectrometry as exposure-response measures of nucleoside analog DNA methyltransferase inhibitors.

    Science.gov (United States)

    Anders, Nicole M; Liu, Jianyong; Wanjiku, Teresia; Giovinazzo, Hugh; Zhou, Jianya; Vaghasia, Ajay; Nelson, William G; Yegnasubramanian, Srinivasan; Rudek, Michelle A

    2016-06-01

    The epigenetic and anti-cancer activities of the nucleoside analog DNA methyltransferase (DNMT) inhibitors decitabine (5-aza-2'-deoxycytidine, DAC), azacitidine, and guadecitabine are thought to require cellular uptake, metabolism to 5-aza-2'-deoxycytidine triphosphate, and incorporation into DNA. This genomic incorporation can then lead to trapping and degradation of DNMT enzymes, and ultimately, passive loss of DNA methylation. To facilitate measurement of critical exposure-response relationships of nucleoside analog DNMT inhibitors, a sensitive and reliable method was developed to simultaneously quantitate 5-aza-2'-deoxycytidine genomic incorporation and genomic 5-methylcytosine content using LC-MS/MS. Genomic DNA was extracted and digested into single nucleosides. Chromatographic separation was achieved with a Thermo Hypercarb porous graphite column (100mm×2.1mm, 5μm) and isocratic elution with a 10mM ammonium acetate:acetonitrile with 0.1% formic acid (70:30, v/v) mobile phase over a 5min total analytical run time. An AB Sciex 5500 triple quadrupole mass spectrometer operated in positive electrospray ionization mode was used for the detection of 5-aza-2'-deoxycytidine, 2'-deoxycytidine, and 5-methyl-2'-deoxycytidine. The assay range was 2-400ng/mL for 5-aza-2'-deoxycytidine, 50-10,000ng/mL for 2'-deoxycytidine, and 5-1000ng/mL for 5-methyl-2'-deoxycytidine. The assay proved to be accurate (93.0-102.2%) and precise (CV≤6.3%) across all analytes. All analytes exhibited long-term frozen digest matrix stability at -70°C for at least 117 days. The method was applied for the measurement of genomic 5-aza-2'-deoxycytidine and 5-methyl-2'-deoxycytidine content following exposure of in vitro cell culture and in vivo animal models to decitabine.
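    The accuracy and precision figures quoted above are the kind of summary statistics usually computed from replicate quality-control measurements. The sketch below shows one common way to calculate them, assuming accuracy is the mean measured concentration as a percentage of nominal and precision is the coefficient of variation. The replicate values are made up for illustration and are not data from the study.

```python
from statistics import mean, stdev


def accuracy_percent(measured, nominal):
    """Mean measured concentration expressed as a percentage of the nominal value."""
    return 100.0 * mean(measured) / nominal


def cv_percent(measured):
    """Coefficient of variation: sample standard deviation over the mean, in percent."""
    return 100.0 * stdev(measured) / mean(measured)


# Hypothetical replicate readings (ng/mL) for a QC sample with a nominal 50 ng/mL.
replicates = [48.0, 50.0, 52.0]
print(f"accuracy: {accuracy_percent(replicates, 50.0):.1f}%")
print(f"CV: {cv_percent(replicates):.1f}%")
```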

  9. Sequencing and comparative genome analysis of two pathogenic Streptococcus gallolyticus subspecies: genome plasticity, adaptation and virulence.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable for a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially in membrane proteins and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.

  10. Genetic recombination in Escherichia coli : II. Calculation of incorporation frequency and relative map distance by recombinant analysis

    NARCIS (Netherlands)

    Haan, P.G. de; Verhoef, C.

    1966-01-01

    In this paper a mathematical analysis based on the physical exchange of genetic material is presented for a four-factor cross. The incorporation frequency of donor markers and the relative map distances may be accurately estimated from the frequencies of the eight recombinant classes. The results ob

  11. PhyloSift: phylogenetic analysis of genomes and metagenomes

    Directory of Open Access Journals (Sweden)

    Aaron E. Darling

    2014-01-01

    Full Text Available Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

  12. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early realisation of proteins in the whole human genome that retain tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids. PMID:27467780
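    For readers unfamiliar with the performance measures quoted above, the sketch below shows the standard definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) for a binary classifier. The confusion-matrix counts in the example are invented for illustration and do not come from the study.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced test set of 100 sequences.
print(confusion_metrics(tp=40, fp=10, tn=40, fn=10))
```

    Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it more informative when the positive and negative classes differ in size.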

  13. Pattern Analysis and Decision Support for Cancer through Clinico-Genomic Profiles

    Science.gov (United States)

    Exarchos, Themis P.; Giannakeas, Nikolaos; Goletsis, Yorgos; Papaloukas, Costas; Fotiadis, Dimitrios I.

    Advances in genome technology are playing a growing role in medicine and healthcare. With the development of new technologies and opportunities for large-scale analysis of the genome, genomic data have a clear impact on medicine. Cancer prognostics and therapeutics are among the first major test cases for genomic medicine, given that all types of cancer are related with genomic instability. In this paper we present a novel system for pattern analysis and decision support in cancer. The system integrates clinical data from electronic health records and genomic data. Pattern analysis and data mining methods are applied to these integrated data and the discovered knowledge is used for cancer decision support. Through this integration, conclusions can be drawn for early diagnosis, staging and cancer treatment.

  14. Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii

    Directory of Open Access Journals (Sweden)

    Thomas Julie

    2010-02-01

    Full Text Available Abstract Background Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs. Results Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at http://combi.cs.colostate.edu/as/chlamy. Conclusions The extent of AS in Chlamydomonas that we observed is much

  15. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    Science.gov (United States)

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-12-19

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution.

  16. Comparative Analysis of CpG Islands in Four Fish Genomes

    Directory of Open Access Journals (Sweden)

    Leng Han

    2008-01-01

    Full Text Available There has been much interest in CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, because they are considered gene markers and involved in gene regulation. To date, there has been no genome-wide analysis of CGIs in the fish genome. We first evaluated the performance of three popular CGI identification algorithms in four fish genomes (tetraodon, stickleback, medaka, and zebrafish). Our results suggest that Takai and Jones' (2002) algorithm is most suitable for comparative analysis of CGIs in the fish genome. Then, we performed a systematic analysis of CGIs in the four fish genomes using Takai and Jones' algorithm, compared to other vertebrate genomes. We found that both the number of CGIs and the CGI density vary greatly among these genomes. Remarkably, each fish genome presents a distinct distribution of CGI density with some genomic factors (e.g., chromosome size and chromosome GC content). These findings are helpful for understanding evolution of fish genomes and the features of fish CGIs.
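    As a rough illustration of what a CGI identification criterion checks, the sketch below tests a single candidate region against the thresholds commonly cited for Takai and Jones (2002): length of at least 500 bp, GC content of at least 55%, and an observed/expected CpG ratio of at least 0.65. These thresholds are not stated in the abstract and are included here as an assumption. The function names are hypothetical, and the full algorithm also involves scanning windows along the genome and merging them, which is not shown.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")            # CpG dinucleotides on this strand
    expected = c * g / n                  # expectation if C and G were placed independently
    obs_exp = observed / expected if expected else 0.0
    return (c + g) / n, obs_exp


def passes_cgi_thresholds(seq, min_len=500, min_gc=0.55, min_obs_exp=0.65):
    """Check one region against the assumed Takai and Jones style thresholds."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp


# Toy sequences: a 500 bp CpG-dense repeat and a 500 bp AT-only repeat.
print(passes_cgi_thresholds("CG" * 250))
print(passes_cgi_thresholds("AT" * 250))
```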

  17. Genome-wide analysis reveals coating of the mitochondrial genome by TFAM.

    Directory of Open Access Journals (Sweden)

    Yun E Wang

    Full Text Available Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. The transcription factor TFAM is critical for initiation of transcription and replication of the genome, and is also thought to perform a packaging function. Although specific binding sites required for initiation of transcription have been identified in the D-loop, little is known about the characteristics of TFAM binding in its nonspecific packaging state. In addition, it is unclear whether TFAM also plays a role in the regulation of nuclear gene expression. Here we investigate these questions by using ChIP-seq to directly localize TFAM binding to DNA in human cells. Our results demonstrate that TFAM uniformly coats the whole mitochondrial genome, with no evidence of robust TFAM binding to the nuclear genome. Our study represents the first high-resolution assessment of TFAM binding on a genome-wide scale in human cells.

  18. Rapid mass spectrometric analysis of 15N-Leu incorporation fidelity during preparation of specifically labeled NMR samples

    DEFF Research Database (Denmark)

    Truhlar, Stephanie M E; Cervantes, Carla F; Torpey, Justin W

    2008-01-01

    Advances in NMR spectroscopy have enabled the study of larger proteins that typically have significant overlap in their spectra. Specific (15)N-amino acid incorporation is a powerful tool for reducing spectral overlap and attaining reliable sequential assignments. However, scrambling of the label...... during protein expression is a common problem. We describe a rapid method to evaluate the fidelity of specific (15)N-amino acid incorporation. The selectively labeled protein is proteolyzed, and the resulting peptides are analyzed using MALDI mass spectrometry. The (15)N incorporation is determined...... by analyzing the isotopic abundance of the peptides in the mass spectra using the program DEX. This analysis determined that expression with a 10-fold excess of unlabeled amino acids relative to the (15)N-amino acid prevents the scrambling of the (15)N label that is observed when equimolar amounts are used...

  19. Comparative genomic analysis of carbon and nitrogen assimilation mechanisms in three indigenous bioleaching bacteria: predictions and validations

    Directory of Open Access Journals (Sweden)

    Ehrenfeld Nicole

    2008-12-01

    Full Text Available Abstract Background Carbon and nitrogen fixation are essential pathways for autotrophic bacteria living in extreme environments. These bacteria can use carbon dioxide directly from the air as their sole carbon source and can use different sources of nitrogen such as ammonia, nitrate, nitrite, or even nitrogen from the air. To have a better understanding of how these processes occur and to determine how we can make them more efficient, a comparative genomic analysis of three bioleaching bacteria isolated from mine sites in Chile was performed. This study demonstrated that there are important differences in the carbon dioxide and nitrogen fixation mechanisms among bioleaching bacteria that coexist in mining environments. Results In this study, we showed that both Acidithiobacillus ferrooxidans and Acidithiobacillus thiooxidans incorporate CO2 via the Calvin-Benson-Bassham cycle; however, the former bacterium has two copies of the Rubisco type I gene whereas the latter has only one copy. In contrast, we demonstrated that Leptospirillum ferriphilum utilizes the reductive tricarboxylic acid cycle for carbon fixation. Although all the species analyzed in our study can incorporate ammonia by an ammonia transporter, we demonstrated that Acidithiobacillus thiooxidans could also assimilate nitrate and nitrite but only Acidithiobacillus ferrooxidans could fix nitrogen directly from the air. Conclusion The current study utilized genomic and molecular evidence to verify carbon and nitrogen fixation mechanisms for three bioleaching bacteria and provided an analysis of the potential regulatory pathways and functional networks that control carbon and nitrogen fixation in these microorganisms.

  20. The effects of music therapy incorporated with applied behavior analysis verbal behavior approach for children with autism spectrum disorders.

    Science.gov (United States)

    Lim, Hayoung A; Draper, Ellary

    2011-01-01

    This study compared a common form of the Applied Behavior Analysis Verbal Behavior (ABA VB) approach and a music-incorporated ABA VB method as part of developmental speech-language training in the speech production of children with Autism Spectrum Disorders (ASD). This study explored how the perception of musical patterns incorporated in ABA VB operants impacted the production of speech in children with ASD. Participants were 22 children with ASD, age range 3 to 5 years, who were verbal or pre-verbal with presence of immediate echolalia. They were randomly assigned a set of target words for each of the 3 training conditions: (a) music incorporated ABA VB, (b) speech (ABA VB) and (c) no-training. Results showed both music and speech trainings were effective for production of the four ABA verbal operants; however, the difference between music and speech training was not statistically significant. Results also indicated that music incorporated ABA VB training was most effective in echoic production, and speech training was most effective in tact production. Music can be incorporated into the ABA VB training method, and musical stimuli can be used as successfully as ABA VB speech training to enhance the functional verbal production in children with ASD.

  1. Genome-wide transcriptome analysis of 150 cell samples

    Science.gov (United States)

    Russom, Aman; Xiao, Wenzhong; Wilhelmy, Julie; Wang, Shenglong; Heath, Joe Don; Kurn, Nurith; Tompkins, Ronald G.; Davis, Ronald W.; Toner, Mehmet

    2013-01-01

    A major challenge in molecular biology is interrogating the human transcriptome on a genome wide scale when only a limited amount of biological sample is available for analysis. Current methodologies using microarray technologies for simultaneously monitoring mRNA transcription levels require nanogram amounts of total RNA. To overcome the sample size limitation of current technologies, we have developed a method to probe the global gene expression in biological samples as small as 150 cells, or the equivalent of approximately 300 pg total RNA. The new method employs microfluidic devices for the purification of total RNA from mammalian cells and ultra-sensitive whole transcriptome amplification techniques. We verified that the RNA integrity is preserved through the isolation process, accomplished highly reproducible whole transcriptome analysis, and established high correlation between repeated isolations of 150 cells and the same cell culture sample. We validated the technology by demonstrating that the combined microfluidic and amplification protocol is capable of identifying biological pathways perturbed by stimulation, which are consistent with the information recognized in bulk-isolated samples. PMID:20023796

  2. Genome-wide transcriptome analysis of 150 cell samples.

    Science.gov (United States)

    Irimia, Daniel; Mindrinos, Michael; Russom, Aman; Xiao, Wenzhong; Wilhelmy, Julie; Wang, Shenglong; Heath, Joe Don; Kurn, Nurith; Tompkins, Ronald G; Davis, Ronald W; Toner, Mehmet

    2009-01-01

    A major challenge in molecular biology is interrogating the human transcriptome on a genome wide scale when only a limited amount of biological sample is available for analysis. Current methodologies using microarray technologies for simultaneously monitoring mRNA transcription levels require nanogram amounts of total RNA. To overcome the sample size limitation of current technologies, we have developed a method to probe the global gene expression in biological samples as small as 150 cells, or the equivalent of approximately 300 pg total RNA. The new method employs microfluidic devices for the purification of total RNA from mammalian cells and ultra-sensitive whole transcriptome amplification techniques. We verified that the RNA integrity is preserved through the isolation process, accomplished highly reproducible whole transcriptome analysis, and established high correlation between repeated isolations of 150 cells and the same cell culture sample. We validated the technology by demonstrating that the combined microfluidic and amplification protocol is capable of identifying biological pathways perturbed by stimulation, which are consistent with the information recognized in bulk-isolated samples.

  3. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    Science.gov (United States)

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala × Azucena mapping population.

  4. Genome-Wide Analysis of DNA Methylation in Human Amnion

    Science.gov (United States)

    Kim, Jinsil; Pitlick, Mitchell M.; Christine, Paul J.; Schaefer, Amanda R.; Saleme, Cesar; Comas, Belén; Cosentino, Viviana; Gadow, Enrique; Murray, Jeffrey C.

    2013-01-01

    The amnion is a specialized tissue in contact with the amniotic fluid, which is in a constantly changing state. To investigate the importance of epigenetic events in this tissue in the physiology and pathophysiology of pregnancy, we performed genome-wide DNA methylation profiling of human amnion from term (with and without labor) and preterm deliveries. Using the Illumina Infinium HumanMethylation27 BeadChip, we identified genes exhibiting differential methylation associated with normal labor and preterm birth. Functional analysis of the differentially methylated genes revealed biologically relevant enriched gene sets. Bisulfite sequencing analysis of the promoter region of the oxytocin receptor (OXTR) gene detected two CpG dinucleotides showing significant methylation differences among the three groups of samples. Hypermethylation of the CpG island of the solute carrier family 30 member 3 (SLC30A3) gene in preterm amnion was confirmed by methylation-specific PCR. This work provides preliminary evidence that DNA methylation changes in the amnion may be at least partially involved in the physiological process of labor and the etiology of preterm birth and suggests that DNA methylation profiles, in combination with other biological data, may provide valuable insight into the mechanisms underlying normal and pathological pregnancies. PMID:23533356

  5. Genome-Wide Analysis of DNA Methylation in Human Amnion

    Directory of Open Access Journals (Sweden)

    Jinsil Kim

    2013-01-01

    Full Text Available The amnion is a specialized tissue in contact with the amniotic fluid, which is in a constantly changing state. To investigate the importance of epigenetic events in this tissue in the physiology and pathophysiology of pregnancy, we performed genome-wide DNA methylation profiling of human amnion from term (with and without labor) and preterm deliveries. Using the Illumina Infinium HumanMethylation27 BeadChip, we identified genes exhibiting differential methylation associated with normal labor and preterm birth. Functional analysis of the differentially methylated genes revealed biologically relevant enriched gene sets. Bisulfite sequencing analysis of the promoter region of the oxytocin receptor (OXTR) gene detected two CpG dinucleotides showing significant methylation differences among the three groups of samples. Hypermethylation of the CpG island of the solute carrier family 30 member 3 (SLC30A3) gene in preterm amnion was confirmed by methylation-specific PCR. This work provides preliminary evidence that DNA methylation changes in the amnion may be at least partially involved in the physiological process of labor and the etiology of preterm birth and suggests that DNA methylation profiles, in combination with other biological data, may provide valuable insight into the mechanisms underlying normal and pathological pregnancies.

  6. Porcine UCHL1: genomic organization, chromosome localization and expression analysis

    DEFF Research Database (Denmark)

    Larsen, Knud; Madsen, Lone Bruhn; Bendixen, Christian

    2012-01-01

    to and protection from Parkinson’s disease. Here we report cloning, characterization, expression analysis and mapping of porcine UCHL1. The UCHL1 cDNA was amplified by reverse transcriptase polymerase chain reaction (RT-PCR) using oligonucleotide primers derived from in silico sequences. The porcine cDNA codes...... for a protein of 223 amino acids which shows a very high similarity to human (98%) and to mouse (97%) UCHL1. In addition, the genomic organization of the porcine UCHL1 gene was determined. The porcine UCHL1 gene was mapped to chromosome 8(½p21)–p23. Three SNPs were found in the porcine UCHL1 sequence....... Expression analysis by quantitative real time RT-PCR demonstrated that porcine UCHL1 mRNA is differentially expressed in various organs and tissues and similar to its human counterpart. UCHL1 transcript is most abundant in brain tissues and in the spinal cord. The UCHL1 mRNA expression was also investigated...

  7. Improved statistics for genome-wide interaction analysis.

    Science.gov (United States)

    Ueki, Masao; Cordell, Heather J

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al

  8. Complete sequence of the mitochondrial genome of a diatom alga Synedra acus and comparative analysis of diatom mitochondrial genomes.

    Science.gov (United States)

    Ravin, Nikolai V; Galachyants, Yuri P; Mardanov, Andrey V; Beletsky, Alexey V; Petrova, Darya P; Sherbakova, Tatyana A; Zakharova, Yuliya R; Likhoshway, Yelena V; Skryabin, Konstantin G; Grachev, Mikhail A

    2010-06-01

    The first two mitochondrial genomes of marine diatoms were previously reported for the centric Thalassiosira pseudonana and the raphid pennate Phaeodactylum tricornutum. As part of a genomic project, we sequenced the complete mitochondrial genome of the freshwater araphid pennate diatom Synedra acus. This 46,657 bp mtDNA encodes 2 rRNAs, 24 tRNAs, and 33 proteins. The mtDNA of S. acus contains three group II introns, two inserted into the cox1 gene and containing ORFs, and one inserted into the rnl gene and lacking an ORF. The compact gene organization contrasts with the presence of a 4.9-kb-long intergenic region, which contains repeat sequences. Comparison of the three sequenced mtDNAs showed that these three genomes carry similar gene pools, but the positions of some genes are rearranged. Phylogenetic analysis performed with a fragment of the cox1 gene of diatoms and other heterokonts produced a tree that is similar to that derived from 18S RNA genes. The introns of mtDNA in the diatoms seem to be polyphyletic. This study demonstrates that pyrosequencing is an efficient method for complete sequencing of mitochondrial genomes from diatoms, and may soon give valuable information about the molecular phylogeny of this outstanding group of unicellular organisms.

  9. Development and characterization of genomic and expressed SSRs in citrus by genome-wide analysis.

    Directory of Open Access Journals (Sweden)

    Sheng-Rui Liu

    Full Text Available Microsatellites or simple sequence repeats (SSRs) are one of the most popular sources of genetic markers and play a significant role in plant genetics and breeding. In this study, we identified citrus SSRs in the genome of Clementine mandarin and analyzed their frequency and distribution in different genomic regions. A total of 80,708 SSRs were detected in the genome with an overall density of 268 SSRs/Mb. While di-nucleotide repeats were the most frequent microsatellites in genomic DNA sequence, tetra-nucleotides, which had more repeat units than any other SSR types, had the highest cumulative sequence length. We identified 6,834 transcripts as containing 8,989 SSRs in 33,929 Clementine mandarin transcripts, among which, tri-nucleotide motifs (36.0%) were the most common, followed by di-nucleotide (26.9%) and hexa-nucleotide motifs (15.1%). The motif AG (16.7%) was most abundant among these SSRs, while motifs AAG (6.6%), AAT (5.0%), and TAG (2.2%) were most common among tri-nucleotides. Functional categorization of transcripts containing SSRs revealed that 5,879 (86.0%) of such transcripts had homology with known proteins; GO and KEGG annotation revealed that transcripts containing SSRs were those implicated in diverse biological processes in plants, including binding, development, transcription, and protein degradation. When 27 genomic and 78 randomly selected SSRs were tested on Clementine mandarin, 95 SSRs revealed polymorphism. These 95 SSRs were further deployed on 18 genotypes of the three genera of Rutaceae for the genetic diversity assessment; genomic SSRs generally show low transferability in comparison to SSRs developed from expressed sequences. These transcript-markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in citrus, such as diversity study, QTL mapping, molecular breeding, comparative mapping and other genetic analyses.

  10. Development and characterization of genomic and expressed SSRs in citrus by genome-wide analysis.

    Science.gov (United States)

    Liu, Sheng-Rui; Li, Wen-Yang; Long, Dang; Hu, Chun-Gen; Zhang, Jin-Zhi

    2013-01-01

    Microsatellites or simple sequence repeats (SSRs) are one of the most popular sources of genetic markers and play a significant role in plant genetics and breeding. In this study, we identified citrus SSRs in the genome of Clementine mandarin and analyzed their frequency and distribution in different genomic regions. A total of 80,708 SSRs were detected in the genome with an overall density of 268 SSRs/Mb. While di-nucleotide repeats were the most frequent microsatellites in genomic DNA sequence, tetra-nucleotides, which had more repeat units than any other SSR types, had the highest cumulative sequence length. We identified 6,834 transcripts as containing 8,989 SSRs in 33,929 Clementine mandarin transcripts, among which, tri-nucleotide motifs (36.0%) were the most common, followed by di-nucleotide (26.9%) and hexa-nucleotide motifs (15.1%). The motif AG (16.7%) was most abundant among these SSRs, while motifs AAG (6.6%), AAT (5.0%), and TAG (2.2%) were most common among tri-nucleotides. Functional categorization of transcripts containing SSRs revealed that 5,879 (86.0%) of such transcripts had homology with known proteins; GO and KEGG annotation revealed that transcripts containing SSRs were those implicated in diverse biological processes in plants, including binding, development, transcription, and protein degradation. When 27 genomic and 78 randomly selected SSRs were tested on Clementine mandarin, 95 SSRs revealed polymorphism. These 95 SSRs were further deployed on 18 genotypes of the three genera of Rutaceae for the genetic diversity assessment; genomic SSRs generally show low transferability in comparison to SSRs developed from expressed sequences. These transcript-markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in citrus, such as diversity study, QTL mapping, molecular breeding, comparative mapping and other genetic analyses.
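
The SSR survey described in this record can be illustrated with a minimal sketch. It scans a DNA string for perfect tandem repeats with 2 to 6 bp motifs and computes a density in SSRs/Mb. The function names and the minimum repeat counts are hypothetical choices made for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the abstract).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

def ssr_density_per_mb(n_ssrs, genome_bp):
    """SSR count normalised to one megabase of sequence."""
    return n_ssrs / (genome_bp / 1e6)

toy = "TT" + "AG" * 7 + "CCT" + "AAG" * 5 + "TT"
print(find_ssrs(toy))
```

The regular expression reports the leftmost motif phase it encounters, so the same repeat tract may be labelled "AG" or "GA" depending on the flanking bases; a real pipeline would normally canonicalise motifs before counting them.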

  11. Complete genome sequence of Borrelia afzelii K78 and comparative genome analysis.

    Directory of Open Access Journals (Sweden)

    Wolfgang Schüler

    Full Text Available The main Borrelia species causing Lyme borreliosis in Europe and Asia are Borrelia afzelii, B. garinii, B. burgdorferi and B. bavariensis. This is in contrast to the United States, where infections are exclusively caused by B. burgdorferi. To date, the genome sequences of four B. afzelii strains, of which only two include the numerous plasmids, are available. In order to further assess the genetic diversity of B. afzelii, the most common species in Europe, responsible for the large variety of clinical manifestations of Lyme borreliosis, we have determined the full genome sequence of the B. afzelii strain K78, a clinical isolate from Austria. The K78 genome contains a linear chromosome (905,949 bp) and 13 plasmids (8 linear and 5 circular) together presenting 1,309 open reading frames of which 496 are located on plasmids. With the exception of lp28-8, all linear replicons in their full length including their telomeres have been sequenced. The comparison with the genomes of the four other B. afzelii strains, ACA-1, PKo, HLJ01 and Tom3107, as well as the one of B. burgdorferi strain B31, confirmed a high degree of conservation within the linear chromosome of B. afzelii, whereas plasmid encoded genes showed a much larger diversity. Since some plasmids present in B. burgdorferi are missing in the B. afzelii genomes, the corresponding virulence factors of B. burgdorferi are found in B. afzelii on other unrelated plasmids. In addition, we have identified a species specific region in the circular plasmid, cp26, which could be used for species determination. Different non-coding RNAs have been located on the B. afzelii K78 genome, which have not previously been annotated in any of the published Borrelia genomes.

  12. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Science.gov (United States)

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  13. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    Full Text Available Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  14. Whole-genome thermodynamic analysis reduces siRNA off-target effects.

    Directory of Open Access Journals (Sweden)

    Xi Chen

    Full Text Available Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, thus their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design.

  15. CoCoNUT: an efficient system for the comparison and analysis of genomes

    Directory of Open Access Journals (Sweden)

    Kurtz Stefan

    2008-11-01

    Full Text Available Abstract. Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics.

  16. Zinc Composite Layers, Incorporating Polymeric Nano-aggregates: Surface Analysis and Electrochemical Behavior

    NARCIS (Netherlands)

    Koleva, D.A.; Zhang, X.; Petrov, P.; Boshkov, N.; Van Breugel, K.; De Wit, J.H.W.; Mol, J.M.C.; Tsvetkova, N.

    2008-01-01

    This study reports on a comparative investigation of the corrosion behavior of zinc (Zn) and nano-composite zinc (ZnC) galvanic layers in 5% NaCl solution. The metallic matrix of the ZnC layers incorporates nano-sized, stabilized polymeric aggregates, formed from the amphiphilic tri-block co-polymer

  17. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms

    Directory of Open Access Journals (Sweden)

    Meller Jaroslaw

    2007-03-01

    Full Text Available Abstract. Background: Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionary conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results: We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using the Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: (i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; (ii) flexibility to adjust the parameters and re-compute the results on-the-fly; (iii) ability to work with user provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at http://cinteny.cchmc.org. Conclusion: Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances
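
To make the notion of reversal distance in this record concrete, the sketch below counts breakpoints in a signed gene-order permutation and derives the standard lower bound on the number of reversals (each reversal can remove at most two breakpoints). This is not the Hannenhalli-Pevzner algorithm and not Cinteny's code; the function names are hypothetical and the bound is generally smaller than the exact distance.

```python
def count_breakpoints(perm):
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed with 0 and n+1; a neighbouring pair (a, b)
    is an adjacency when b - a == 1 and a breakpoint otherwise.
    """
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)

def reversal_lower_bound(perm):
    """Lower bound on reversal distance: ceil(breakpoints / 2)."""
    return (count_breakpoints(perm) + 1) // 2

# Genes 2 and 3 inverted as one block relative to the reference order.
order = [1, -3, -2, 4]
print(count_breakpoints(order), reversal_lower_bound(order))
```

In the toy order a single reversal of the block restores the reference arrangement, so the bound is tight here; for more scrambled orders the exact distance requires the full breakpoint-graph analysis.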

  18. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman, Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins suggesting differences in their phenotype were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress responsive sigma factor sigmaB in the B. cereus group different from the well studied regulation in B. subtilis has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to the survival and adaptation in specific hosts have been identified.

  19. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum

    Science.gov (United States)

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger; Madsen, Lone; Espejo, Romilio

    2016-01-01

    Flavobacterium psychrophilum is a fish pathogen in salmonid aquaculture worldwide that causes cold water disease (CWD) and rainbow trout fry syndrome (RTFS). Comparative genome analyses of 11 F. psychrophilum isolates representing temporally and geographically distant populations were used to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences was complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F. psychrophilum could hold at least 3373 genes, while the core genome contained 1743 genes. On average, 67 new genes were detected for every new genome added to the analysis, indicating that F. psychrophilum possesses an open pan genome. The putative virulence factors were equally distributed among isolates, independent of geographic location, year of isolation and source of isolates. Only one prophage-related sequence was found which corresponded to the previously described prophage 6H, and appeared in 5 out of 11 isolates. CRISPR array analysis revealed two different loci with dissimilar spacer content, which only matched one sequence in the database, the temperate bacteriophage 6H. Genomic Islands (GIs) were identified in F. psychrophilum isolates 950106-1/1 and CSF 259–93, associated with toxins and antibiotic resistance. Finally, phenotypic characterization revealed a high degree of similarity among the strains with respect to biofilm formation and secretion of extracellular enzymes. Global scale dispersion of virulence factors in the genomes and the abilities for biofilm formation, hemolytic activity and secretion of extracellular enzymes among the strains suggested that F. psychrophilum isolates have a similar mode of action on adhesion, colonization and destruction of fish tissues across large spatial and temporal scales of occurrence. Overall, the genomic characterization and
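
The pan-genome, core-genome and "new genes per added genome" quantities mentioned in this record can be sketched with plain set operations. The code assumes each genome has already been reduced to a set of gene-family identifiers; the function names and the toy identifiers are hypothetical, and the clustering step that the study would have needed to define gene families is not shown.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene sets for a non-empty dict of genome -> genes."""
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)            # genes seen in any genome
    core = set.intersection(*gene_sets)      # genes shared by every genome
    return pan, core

def new_genes_per_genome(genomes):
    """Count genes not seen before as genomes are added in dict order."""
    seen = set()
    counts = []
    for name, genes in genomes.items():
        genes = set(genes)
        counts.append((name, len(genes - seen)))
        seen |= genes
    return counts

# Toy input with made-up gene-family identifiers.
toy = {
    "isolate_A": {"g1", "g2", "g3"},
    "isolate_B": {"g1", "g2", "g4"},
    "isolate_C": {"g1", "g5"},
}
pan, core = pan_and_core(toy)
print(len(pan), sorted(core), new_genes_per_genome(toy))
```

A pan-genome is usually called open when the count of new genes keeps staying above zero as genomes are added; because that count depends on the order of addition, published analyses typically average over many random orderings.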

  20. Full-length genomic analysis of Korean porcine sapelovirus strains

    DEFF Research Database (Denmark)

    Son, Kyu-Yeol; Kim, Deok-Song; Kwon, Joseph

    2014-01-01

    the structural features of PSV genomes, the full-length nucleotide sequences of three Korean PSV strains were determined and analyzed using bioinformatic techniques in comparison with other known PSV strains. The Korean PSV genomes ranged from 7,542 to 7,566 nucleotides excluding the 3' poly(A) tail, and showed...

  1. Analysis of the ABCA4 genomic locus in Stargardt disease

    DEFF Research Database (Denmark)

    Zernant, Jana; Xie, Yajing Angela; Ayuso, Carmen

    2014-01-01

    was designed to find the missing disease-causing ABCA4 variation by a combination of next-generation sequencing (NGS), array-Comparative Genome Hybridization (aCGH) screening, familial segregation and in silico analyses. The entire 140 kb ABCA4 genomic locus was sequenced in 114 STGD patients with one known...

  2. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides.

    NARCIS (Netherlands)

    Carvalho, B; Ouwerkerk, E; Meijer, G.A.; Ylstra, B.

    2004-01-01

    BACKGROUND: Currently, comparative genomic hybridisation array (array CGH) is the method of choice for studying genome wide DNA copy number changes. To date, either amplified representations of bacterial artificial chromosomes (BACs)/phage artificial chromosomes (PACs) or cDNAs have been spotted as

  3. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  4. A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease

    Science.gov (United States)

    Kyriakou, Theodosios; Nelson, Christopher P; Hopewell, Jemma C; Webb, Thomas R; Zeng, Lingyao; Dehghan, Abbas; Alver, Maris; Armasu, Sebastian M; Auro, Kirsi; Bjonnes, Andrew; Chasman, Daniel I; Chen, Shufeng; Ford, Ian; Franceschini, Nora; Gieger, Christian; Grace, Christopher; Gustafsson, Stefan; Huang, Jie; Hwang, Shih-Jen; Kim, Yun Kyoung; Kleber, Marcus E; Lau, King Wai; Lu, Xiangfeng; Lu, Yingchang; Lyytikäinen, Leo-Pekka; Mihailov, Evelin; Morrison, Alanna C; Pervjakova, Natalia; Qu, Liming; Rose, Lynda M; Salfati, Elias; Saxena, Richa; Scholz, Markus; Smith, Albert V; Tikkanen, Emmi; Uitterlinden, Andre; Yang, Xueli; Zhang, Weihua; Zhao, Wei; de Andrade, Mariza; de Vries, Paul S; van Zuydam, Natalie R; Anand, Sonia S; Bertram, Lars; Beutner, Frank; Dedoussis, George; Frossard, Philippe; Gauguier, Dominique; Goodall, Alison H; Gottesman, Omri; Haber, Marc; Han, Bok-Ghee; Huang, Jianfeng; Jalilzadeh, Shapour; Kessler, Thorsten; König, Inke R; Lannfelt, Lars; Lieb, Wolfgang; Lind, Lars; Lindgren, Cecilia M; Lokki, Marja-Liisa; Magnusson, Patrik K; Mallick, Nadeem H; Mehra, Narinder; Meitinger, Thomas; Memon, Fazal-ur-Rehman; Morris, Andrew P; Nieminen, Markku S; Pedersen, Nancy L; Peters, Annette; Rallidis, Loukianos S; Rasheed, Asif; Samuel, Maria; Shah, Svati H; Sinisalo, Juha; Stirrups, Kathleen E; Trompet, Stella; Wang, Laiyuan; Zaman, Khan S; Ardissino, Diego; Boerwinkle, Eric; Borecki, Ingrid B; Bottinger, Erwin P; Buring, Julie E; Chambers, John C; Collins, Rory; Cupples, L Adrienne; Danesh, John; Demuth, Ilja; Elosua, Roberto; Epstein, Stephen E; Esko, Tõnu; Feitosa, Mary F; Franco, Oscar H; Franzosi, Maria Grazia; Granger, Christopher B; Gu, Dongfeng; Gudnason, Vilmundur; Hall, Alistair S; Hamsten, Anders; Harris, Tamara B; Hazen, Stanley L; Hengstenberg, Christian; Hofman, Albert; Ingelsson, Erik; Iribarren, Carlos; Jukema, J Wouter; Karhunen, Pekka J; Kim, Bong-Jo; Kooner, Jaspal S; Kullo, Iftikhar J; Lehtimäki, Terho; Loos, Ruth J F; Melander, Olle; Metspalu, Andres; März, Winfried; Palmer, Colin N; Perola, Markus; Quertermous, Thomas; Rader, Daniel J; Ridker, Paul M; Ripatti, Samuli; Roberts, Robert; Salomaa, Veikko; Sanghera, Dharambir K; Schwartz, Stephen M; Seedorf, Udo; Stewart, Alexandre F; Stott, David J; Thiery, Joachim; Zalloua, Pierre A; O’Donnell, Christopher J; Reilly, Muredach P; Assimes, Themistocles L; Thompson, John R; Erdmann, Jeanette; Clarke, Robert; Watkins, Hugh; Kathiresan, Sekar; McPherson, Ruth; Deloukas, Panos; Schunkert, Heribert; Samani, Nilesh J; Farrall, Martin

    2015-01-01

    Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association studies (GWAS) analysis of common SNPs. Leveraging phased haplotypes from the 1000 Genomes Project, we report a GWAS meta-analysis of 185 thousand CAD cases and controls, interrogating 6.7 million common (MAF>0.05) as well as 2.7 million low frequency (0.005 < MAF < 0.05) [...] analysis provides a comprehensive survey of the fine genetic architecture of CAD showing that genetic susceptibility to this common disease is largely determined by common SNPs of small effect size. PMID:26343387

  5. Genomic organization and sequence analysis of the vomeronasal receptor V2R genes in mouse genome

    Institute of Scientific and Technical Information of China (English)

    YANG Hui; Zhang YaPing

    2007-01-01

    Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and retrieved 63 putative functional V2R genes, including three newly identified genes that form an additional family. We describe the genomic organization of these genes and characterize the conservation of mouse V2R protein sequences. This genomic and sequence information is useful as part of the evidence for inferring the functional domains of V2Rs and should aid future functional studies.

  6. Genomic analysis by oligonucleotide array Comparative Genomic Hybridization utilizing formalin-fixed, paraffin-embedded tissues.

    Science.gov (United States)

    Savage, Stephanie J; Hostetter, Galen

    2011-01-01

    Formalin fixation has been used to preserve tissues for more than a hundred years, and there are currently more than 300 million archival samples in the United States alone. The application of genomic protocols such as high-density oligonucleotide array Comparative Genomic Hybridization (aCGH) to formalin-fixed, paraffin-embedded (FFPE) tissues, therefore, opens an untapped resource of available tissues for research and facilitates utilization of existing clinical data in a research sample set. However, formalin fixation results in cross-linking of proteins and DNA, typically leading to such a significant degradation of DNA template that little is available for use in molecular applications. Here, we describe a protocol to circumvent formalin fixation artifact by utilizing enzymatic reactions to obtain quality DNA from a wide range of FFPE tissues for successful genome-wide discovery of gene dosage alterations in archival clinical samples.

  7. Single cell genome analysis of an uncultured heterotrophic stramenopile

    Science.gov (United States)

    Roy, Rajat S.; Price, Dana C.; Schliep, Alexander; Cai, Guohong; Korobeynikov, Anton; Yoon, Hwan Su; Yang, Eun Chan; Bhattacharya, Debashish

    2014-04-01

    A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative methods. One promising approach to study these lineages is single cell genomics (SCG), whereby an individual cell is captured from nature and genome data are produced from the amplified total DNA. Here we tested the efficacy of SCG to generate a draft genome assembly from a single sample, in this case a cell belonging to the broadly distributed MAST-4 uncultured marine stramenopiles. Using de novo gene prediction, we identified 6,996 protein-encoding genes in the MAST-4 genome. This genetic inventory was sufficient to place the cell within the ToL using multigene phylogenetics and provided preliminary insights into the complex evolutionary history of horizontal gene transfer (HGT) in the MAST-4 lineage.

  8. Genomic Islands Prediction and Analysis in Cyanobacteria by Bioinformatics

    Institute of Scientific and Technical Information of China (English)

    Yi Li; Ni-Ni Rao; Feng Yang; Han-Ming Liu

    2014-01-01

    Genomic islands (GIs) are among the most important components of cyanobacterial genomes. GIs encode many functions, such as symbiosis, pathogenesis, and adaptation. In this article, we predict and analyze the GIs in Synechocystis sp. PCC 6803 by bioinformatics. The results show that ISL1, ISL8, and ISL16 are homologous to sequences in many other bacteria, are involved in basic reactions, and are evolutionarily conserved. In contrast, ISL15 has a sequence and function unique to Synechocystis sp. PCC 6803. Most GIs play a role in genome rearrangement because they contain many transposases. Moreover, we find that recombination and horizontal transfer of GIs are important factors affecting the distribution of non-coding RNA. Our work contributes to a comprehensive understanding of genomic islands and their impact on cyanobacterial genomes.

  9. In silico comparative genomic analysis of GABAA receptor transcriptional regulation

    Directory of Open Access Journals (Sweden)

    Joyce Christopher J

    2007-06-01

    Full Text Available Background: Subtypes of the GABAA receptor subunit exhibit diverse temporal and spatial expression patterns. In silico comparative analysis was used to predict transcriptional regulatory features in individual mammalian GABAA receptor subunit genes, and to identify potential transcriptional regulatory components involved in the coordinate regulation of the GABAA receptor gene clusters. Results: Previously unreported putative promoters were identified for the β2, γ1, γ3, ε, θ and π subunit genes. Putative core elements and proximal transcriptional factors were identified within these predicted promoters, and within the experimentally determined promoters of other subunit genes. Conserved intergenic regions of sequence in the mammalian GABAA receptor gene cluster comprising the α1, β2, γ2 and α6 subunits were identified as potential long range transcriptional regulatory components involved in the coordinate regulation of these genes. A region of predicted DNase I hypersensitive sites within the cluster may contain transcriptional regulatory features coordinating gene expression. A novel model is proposed for the coordinate control of the gene cluster and parallel expression of the α1 and β2 subunits, based upon the selective action of putative Scaffold/Matrix Attachment Regions (S/MARs). Conclusion: The putative regulatory features identified by genomic analysis of GABAA receptor genes were substantiated by cross-species comparative analysis and now require experimental verification. The proposed model for the coordinate regulation of genes in the cluster accounts for the head-to-head orientation and parallel expression of the α1 and β2 subunit genes, and for the disruption of transcription caused by insertion of a neomycin gene in the close vicinity of the α6 gene, which is proximal to a putative critical S/MAR.

  10. Organization and comparative analysis of the mitochondrial genomes of bioluminescent Elateroidea (Coleoptera: Polyphaga).

    Science.gov (United States)

    Amaral, Danilo T; Mitani, Yasuo; Ohmiya, Yoshihiro; Viviani, Vadim R

    2016-07-25

    Mitochondrial genome organization in the Elateroidea superfamily (Coleoptera), which includes the main families of bioluminescent beetles, has been poorly studied, and information about the Phengodidae family is lacking. We sequenced the mitochondrial genomes of Neotropical Lampyridae (Bicellonycha lividipennis), Phengodidae (Brasilocerus sp.2 and Phrixothrix hirtus) and Elateridae (Pyrearinus termitilluminans, Hapsodrilus ignifer and Teslasena femoralis). All species had a typical insect mitochondrial genome except for the following: in the elaterid T. femoralis genome there is a non-coding region between NADH2 and tRNA-Trp; in the phengodid Brasilocerus sp.2 and P. hirtus genomes we did not find tRNA-Ile and tRNA-Gln. The P. hirtus genome showed a ~1.6kb non-coding region, the rearrangement of tRNA-Tyr, a new tRNA-Leu copy, and several regions with higher AT content. Phylogenetic analysis using Bayesian and ML models indicated that the Phengodidae+Rhagophthalmidae are closely related to the Lampyridae family, and included Drilus flavescens (Drilidae) as an internal clade within Elateridae. This is the first report that compares the mitochondrial genome organization of the three main families of bioluminescent Elateroidea, including the first Neotropical Lampyridae and Phengodidae. The losses of tRNAs, and the translocation and duplication events found in Phengodidae mt genomes, mainly in P. hirtus, may indicate different evolutionary rates in these mitochondrial genomes. The mitophylogenomic analysis indicates the monophyly of the three bioluminescent families and a closer relationship between Lampyridae and Phengodidae/Rhagophthalmidae, in contrast with previous molecular analyses.

  11. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  12. Comparative Analysis of Fatty Acid Desaturases in Cyanobacterial Genomes

    Directory of Open Access Journals (Sweden)

    Xiaoyuan Chi

    2008-01-01

    Full Text Available Fatty acid desaturases are enzymes that introduce double bonds into the hydrocarbon chains of fatty acids. The fatty acid desaturases from 37 cyanobacterial genomes were identified and classified based upon their conserved histidine-rich motifs and phylogenetic analysis, which helps to determine the numbers and distributions of desaturases in cyanobacterial species. The filamentous or N2-fixing cyanobacteria usually possess more types of fatty acid desaturases than unicellular species. The pathway of acyl-lipid desaturation in the unicellular marine cyanobacteria Synechococcus and Prochlorococcus differs from that of other cyanobacteria, indicating that the phylogenetic histories of these two genera differ from those of other cyanobacteria isolated from freshwater, soil, or symbionts. Strain Gloeobacter violaceus PCC 7421 was isolated from calcareous rock and lacks thylakoid membranes. The types and numbers of desaturases in this strain are distinct from those of other cyanobacteria, reflecting that it diverged earliest from the cyanobacterial line. Three thermophilic unicellular strains, Thermosynechococcus elongatus BP-1 and two Synechococcus Yellowstone species, lack highly unsaturated fatty acids in lipids and contain only one Δ9 desaturase, in contrast with mesophilic strains, which is probably due to their thermal habitats. Thus, the numbers and types of fatty acid desaturases vary among cyanobacterial species, which may result from adaptation to their environments during evolution.
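
    As a rough illustration of motif-based screening of the kind mentioned in this abstract, the sketch below scans a protein sequence for histidine-rich boxes with regular expressions. The abstract does not list the motifs used; the two patterns here (HX(3-4)H and HX(2-3)HH) are an assumption based on the histidine boxes commonly described for membrane-bound desaturases, and the toy sequence and all names are hypothetical.

```python
import re

# Assumed histidine-box patterns; not taken from the abstract.
HIS_BOX_PATTERNS = {
    "HX(3-4)H": re.compile(r"H.{3,4}H"),
    "HX(2-3)HH": re.compile(r"H.{2,3}HH"),
}


def find_his_boxes(protein_seq):
    """Return 0-based start positions of each assumed histidine-box pattern.

    Matches are non-overlapping within a pattern (re.finditer semantics),
    but the same residues may be reported under both patterns.
    """
    seq = protein_seq.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in HIS_BOX_PATTERNS.items()
    }


# Made-up sequence containing three histidine boxes, for demonstration only.
toy_sequence = "AAHDCGHAAAHRRHHAAAHVVHHAA"
boxes = find_his_boxes(toy_sequence)
```

    A screen like this only flags candidates; the classification described in the abstract also relied on phylogenetic analysis.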

  13. Genome-wide analysis of TCP family in tobacco.

    Science.gov (United States)

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-05-23

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response to many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in the tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic relationships, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that the majority of these genes play important roles in all the tissues, while some genes function only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco.

  14. Incorporation of future costs in health economic analysis publications: current situation and recommendations for the future.

    Science.gov (United States)

    Gros, Blanca; Soto Álvarez, Javier; Ángel Casado, Miguel

    2015-06-01

    Future costs are not usually included in economic evaluations. The aim of this study was to assess the extent to which published economic analyses incorporate future costs. A systematic review was conducted of economic analyses published from 2008 to 2013 in three general health economics journals: PharmacoEconomics, Value in Health and the European Journal of Health Economics. A total of 192 articles met the inclusion criteria; 94 of them (49.0%) incorporated future related medical costs, 9 (4.2%) also included future unrelated medical costs, and none included future nonmedical costs. The percentage of articles including future costs increased from 2008 (30.8%) to 2013 (70.8%), and no differences were detected between the three journals. All relevant costs for the perspective considered should be included in economic evaluations, including related or unrelated, direct or indirect future costs. It is also advisable that pharmacoeconomic guidelines be adapted accordingly.

  15. Incorporating social role theory into topic models for social media content analysis

    OpenAIRE

    Zhao, Wayne Xin; Wang, Jinpeng; He, Yulan; Nie, Jian-Yun; Wen, Ji-Rong; Li, Xiaoming

    2015-01-01

    In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiate...

  16. Efficient strategies for genome scanning using maximum-likelihood affected-sib-pair analysis

    Energy Technology Data Exchange (ETDEWEB)

    Holmans, P.; Craddock, N. [Univ. of Wales College of Medicine, Cardiff (United Kingdom)]

    1997-03-01

    Detection of linkage with a systematic genome scan in nuclear families including an affected sibling pair is an important initial step on the path to cloning susceptibility genes for complex genetic disorders, and it is desirable to optimize the efficiency of such studies. The aim is to maximize power while simultaneously minimizing the total number of genotypings and probability of type I error. One approach to increase efficiency, which has been investigated by other workers, is grid tightening: a sample is initially typed using a coarse grid of markers, and promising results are followed up by use of a finer grid. Another approach, not previously considered in detail in the context of an affected-sib-pair genome scan for linkage, is sample splitting: a portion of the sample is typed in the screening stage, and promising results are followed up in the whole sample. In the current study, we have used computer simulation to investigate the relative efficiency of two-stage strategies involving combinations of both grid tightening and sample splitting and found that the optimal strategy incorporates both approaches. In general, typing half the sample of affected pairs with a coarse grid of markers in the screening stage is an efficient strategy under a variety of conditions. If Hardy-Weinberg equilibrium holds, it is most efficient not to type parents in the screening stage. If Hardy-Weinberg equilibrium does not hold (e.g., because of stratification) failure to type parents in the first stage increases the amount of genotyping required, although the overall probability of type I error is not greatly increased, provided the parents are used in the final analysis. 23 refs., 4 figs., 5 tabs.
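
    The trade-off described in this abstract can be made concrete with a simple genotyping count. The sketch below is a hypothetical cost model, not the simulation used in the study: it assumes a fraction of the sib pairs is typed on the coarse grid in stage one, and that each promising region is then completed in the remaining pairs and typed on extra fine-grid markers in the whole sample. All parameter names and example numbers are invented for illustration, and the unit is one sib pair typed at one marker.

```python
def two_stage_genotypings(n_pairs, n_coarse_markers, screen_fraction,
                          n_followup_regions, coarse_per_region,
                          extra_fine_per_region):
    """Count pair-by-marker genotypings for a hypothetical two-stage scan.

    Stage 1 (sample splitting): a fraction of pairs is typed on the coarse grid.
    Stage 2 (grid tightening): in each promising region, the untyped pairs get
    the region's coarse markers and all pairs get the extra fine markers.
    """
    n_screened = round(n_pairs * screen_fraction)
    stage_one = n_screened * n_coarse_markers
    n_remaining = n_pairs - n_screened
    stage_two = n_followup_regions * (
        n_remaining * coarse_per_region + n_pairs * extra_fine_per_region
    )
    return stage_one + stage_two


def one_stage_genotypings(n_pairs, n_fine_markers):
    """Count genotypings when every pair is typed on the full fine grid."""
    return n_pairs * n_fine_markers


# Hypothetical design: 200 pairs, 100 coarse markers, half the sample screened,
# 5 promising regions with 3 coarse and 6 additional fine markers each.
two_stage_total = two_stage_genotypings(200, 100, 0.5, 5, 3, 6)
one_stage_total = one_stage_genotypings(200, 300)
```

    This count ignores power and type I error, which the study evaluated by simulation; it only shows where the savings in genotyping come from.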

  17. Meta-analysis of genome-wide association from genomic prediction models

    Science.gov (United States)

    A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

  18. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    NARCIS (Netherlands)

    Gil, R.; Silva, F.J.; Zientz, E.; Delmotte, F.; Gonzalez-Candelas, F.; Latorre, A.; Rausell, C.; Kamerbeek, J.; Gadau, J.; Hölldobler, B.; Ham, van R.C.H.J.; Gross, R.; Moya, A.

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely

  19. BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics

    DEFF Research Database (Denmark)

    Zhao, Wenming; Wang, Jing; He, Ximiao

    2004-01-01

    the application of the rice genomic information and to provide a foundation for functional and evolutionary studies of other important cereal crops, we implemented our Rice Information System (BGI-RIS), the most up-to-date integrated information resource as well as a workbench for comparative genomic analysis...

  20. Genomic analysis of a nontoxigenic, invasive Corynebacterium diphtheriae strain from Brazil

    Directory of Open Access Journals (Sweden)

    Fernando Encinas

    2015-09-01

    Full Text Available We report the complete genome sequence and analysis of an invasive Corynebacterium diphtheriae strain that caused endocarditis in Rio de Janeiro, Brazil. It was selected for sequencing on the basis of the current relevance of nontoxigenic strains for public health. The genomic information was explored in the context of diversity, plasticity and genetic relatedness with other contemporary strains.

  1. Genomic analysis of a nontoxigenic, invasive Corynebacterium diphtheriae strain from Brazil.

    Science.gov (United States)

    Encinas, Fernando; Marin, Michel A; Ramos, Juliana N; Vieira, Verônica V; Mattos-Guaraldi, Ana Luiza; Vicente, Ana Carolina P

    2015-09-01

    We report the complete genome sequence and analysis of an invasive Corynebacterium diphtheriae strain that caused endocarditis in Rio de Janeiro, Brazil. It was selected for sequencing on the basis of the current relevance of nontoxigenic strains for public health. The genomic information was explored in the context of diversity, plasticity and genetic relatedness with other contemporary strains.

  2. Genome-Wide Association Study and Linkage Analysis of the Healthy Aging Index

    DEFF Research Database (Denmark)

    Minster, Ryan L; Sanders, Jason L; Singh, Jatinder;

    2015-01-01

    BACKGROUND: The Healthy Aging Index (HAI) is a tool for measuring the extent of health and disease across multiple systems. METHODS: We conducted a genome-wide association study and a genome-wide linkage analysis to map quantitative trait loci associated with the HAI and a modified HAI weighted...

  3. Dissection of genomic correlation matrices of US Holsteins using multivariate factor analysis

    Science.gov (United States)

    Aim of the study was to compare correlation matrices between direct genomic predictions for 31 production, fitness and conformation traits both at genomic and chromosomal level in US Holstein bulls. Multivariate factor analysis was used to quantify basic features of correlation matrices. Factor extr...

  4. Genome-wide Association Analysis of Kernel Weight in Hard Winter Wheat

    Science.gov (United States)

    Wheat kernel weight is an important and heritable component of wheat grain yield and a key predictor of flour extraction. Genome-wide association analysis was conducted to identify genomic regions associated with kernel weight and kernel weight environmental response in 8 trials of 299 hard winter ...

  5. Meta-Analysis of Genome-Wide Association Studies of Attention-Deficit/Hyperactivity Disorder

    Science.gov (United States)

    Neale, Benjamin M.; Medland, Sarah E.; Ripke, Stephan; Asherson, Philip; Franke, Barbara; Lesch, Klaus-Peter; Faraone, Stephen V.; Nguyen, Thuy Trang; Schafer, Helmut; Holmans, Peter; Daly, Mark; Steinhausen, Hans-Christoph; Freitag, Christine; Reif, Andreas; Renner, Tobias J.; Romanos, Marcel; Romanos, Jasmin; Walitza, Susanne; Warnke, Andreas; Meyer, Jobst; Palmason, Haukur; Buitelaar, Jan; Vasquez, Alejandro Arias; Lambregts-Rommelse, Nanda; Gill, Michael; Anney, Richard J. L.; Langely, Kate; O'Donovan, Michael; Williams, Nigel; Owen, Michael; Thapar, Anita; Kent, Lindsey; Sergeant, Joseph; Roeyers, Herbert; Mick, Eric; Biederman, Joseph; Doyle, Alysa; Smalley, Susan; Loo, Sandra; Hakonarson, Hakon; Elia, Josephine; Todorov, Alexandre; Miranda, Ana; Mulas, Fernando; Ebstein, Richard P.; Rothenberger, Aribert; Banaschewski, Tobias; Oades, Robert D.; Sonuga-Barke, Edmund; McGough, James; Nisenbaum, Laura; Middleton, Frank; Hu, Xiaolan; Nelson, Stan

    2010-01-01

    Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association studies (GWAS) have not yielded significant results, we conducted a meta-analysis of…

  6. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Full Text Available Background: The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results: We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions: Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

  7. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis

    Directory of Open Access Journals (Sweden)

    Fowler Katie E

    2009-08-01

    Full Text Available Background: The availability of the complete chicken (Gallus gallus) genome sequence as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources facilitate comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first analysis of copy number variants (CNVs) in birds. Here, we extend this approach to the Pekin duck (Anas platyrhynchos), an obvious target for comparative genomic studies due to its agricultural importance and resistance to avian flu. Results: We provide a detailed molecular cytogenetic map of the duck genome through FISH assignment of 155 chicken clones. We identified one inter- and six intrachromosomal rearrangements between chicken and duck macrochromosomes and demonstrated conserved synteny among all microchromosomes analysed. Array comparative genomic hybridisation revealed 32 CNVs, of which 5 overlap previously designated "hotspot" regions between chicken and turkey. Conclusion: Our results suggest extensive conservation of avian genomes across 90 million years of evolution in both macro- and microchromosomes. The data on CNVs between chicken and duck extends previous analyses in chicken and turkey and supports the hypotheses that avian genomes contain fewer CNVs than mammalian genomes and that genomes of evolutionarily distant species share regions of copy number variation ("CNV hotspots"). Our results will expedite duck genomics, assist marker development and highlight areas of interest for future evolutionary and functional studies.

  8. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Background: Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results: We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource, along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion: BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions is high, while a high level of chromosomal shuffling and rearrangement exists between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  9. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication.

    Directory of Open Access Journals (Sweden)

    Li-Jun Ma

    2009-07-01

    Full Text Available Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14alpha-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.

  10. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Full Text Available Background: Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings: CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion: CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.
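
    The idea of "core" and unique genes relative to a reference genome can be sketched with set operations. This is a minimal illustration and not CGUG's implementation: it assumes the BLASTP step has already been run and summarized as, for each query genome, the set of reference genes with a significant hit. The function name, input layout and gene identifiers are hypothetical; the limit of four query genomes follows the abstract.

```python
def core_and_unique(reference_genes, hits_by_query):
    """Split reference genes into core and unique sets.

    reference_genes: iterable of gene identifiers in the reference genome.
    hits_by_query: dict mapping a query genome name to the set of reference
        gene identifiers that had a significant hit in that query genome
        (assumed to be precomputed, e.g. from BLASTP output).
    Returns (core, unique) as sorted lists: core genes have hits in every
    query genome, unique genes have hits in none.
    """
    reference = set(reference_genes)
    if not 1 <= len(hits_by_query) <= 4:
        raise ValueError("expected between one and four query genomes")

    core = set(reference)
    hit_in_any_query = set()
    for hit_genes in hits_by_query.values():
        core &= hit_genes              # keep genes found in every query
        hit_in_any_query |= hit_genes  # collect genes found in any query
    unique = reference - hit_in_any_query
    return sorted(core), sorted(unique)


# Made-up example with four reference genes and two query genomes.
core, unique = core_and_unique(
    ["g1", "g2", "g3", "g4"],
    {"query_a": {"g1", "g2"}, "query_b": {"g1", "g3"}},
)
```

    Genes such as g2 and g3 in this example, which are shared with only some query genomes, fall into neither list.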

  11. The Integrated Microbial Genomes (IMG) System: An Expanding Comparative Analysis Resource

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2009-09-13

    The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at .

  12. Carotenoid biosynthetic genes in Brassica rapa: comparative genomic analysis, phylogenetic analysis, and expression profiling

    OpenAIRE

    Li, Peirong; Zhang, Shujiang; Zhang, Shifan; Li, Fei; Zhang, Hui; Cheng, Feng; Wu, Jian; Wang, Xiaowu; Sun, Rifei

    2015-01-01

    Background Carotenoids are isoprenoid compounds synthesized by all photosynthetic organisms. Despite much research on carotenoid biosynthesis in the model plant Arabidopsis thaliana, there is a lack of information on the carotenoid pathway in Brassica rapa. To better understand its carotenoid biosynthetic pathway, we performed a systematic analysis of carotenoid biosynthetic genes at the genome level in B. rapa. Results We identified 67 carotenoid biosynthetic genes in B. rapa, which were ort...

  13. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  14. [Phylogenetic relationships and intraspecific variation of D-genome Aegilops L. as revealed by RAPD analysis].

    Science.gov (United States)

    Goriunova, S V; Kochieva, E Z; Chikida, N N; Pukhal'skiĭ, V A

    2004-05-01

    RAPD analysis was carried out to study the genetic variation and phylogenetic relationships of polyploid Aegilops species, which contain the D genome as a component of the alloploid genome, and diploid Aegilops tauschii, which is a putative donor of the D genome for common wheat. In total, 74 accessions of six D-genome Aegilops species were examined. The highest intraspecific variation (0.03-0.21) was observed for Ae. tauschii. Intraspecific distances between accessions ranged 0.007-0.067 in Ae. cylindrica, 0.017-0.047 in Ae. vavilovii, and 0.00-0.053 in Ae. juvenalis. Likewise, Ae. ventricosa and Ae. crassa showed low intraspecific polymorphism. The among-accession difference in alloploid Ae. ventricosa (genome DvNv) was similar to that of one parental species, Ae. uniaristata (N), and substantially lower than in the other parent, Ae. tauschii (D). The among-accession difference in Ae. cylindrica (CcDc) was considerably lower than in either parent, Ae. tauschii (D) or Ae. caudata (C). With the exception of Ae. cylindrica, all D-genome species (Ae. tauschii (D), Ae. ventricosa (DvNv), Ae. crassa (XcrDcr1 and XcrDcr1Dcr2), Ae. juvenalis (XjDjUj), and Ae. vavilovii (XvaDvaSva)) formed a single polymorphic cluster, which was distinct from clusters of other species. The only exception, Ae. cylindrica, did not group with the other D-genome species, but clustered with Ae. caudata (C), a donor of the C genome. The cluster of these two species was clearly distinct from the cluster of the other D-genome species and close to a cluster of Ae. umbellulata (genome U) and Ae. ovata (genome UgMg). Thus, RAPD analysis for the first time was used to estimate and to compare the interpopulation polymorphism and to establish the phylogenetic relationships of all diploid and alloploid D-genome Aegilops species.

  15. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    Science.gov (United States)

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
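
    The phylogenetic-profile idea in this record can be sketched in a few lines. This is a hedged illustration rather than the patented method: it assumes homolog detection has already been done and is given as, for each protein, the set of genomes in which a homolog was found, and it links only proteins with identical profiles. The names `phylogenetic_profiles` and `group_by_profile` are hypothetical.

```python
from collections import defaultdict


def phylogenetic_profiles(presence, genomes):
    """Build a 0/1 presence vector per protein.

    presence: dict mapping protein name -> set of genomes with a homolog
              (assumed precomputed by a sequence-similarity search).
    genomes:  ordered list of genome names; fixes the vector positions.
    """
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def group_by_profile(profiles):
    """Group proteins sharing an identical profile (candidate functional links)."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Only groups with at least two proteins suggest a link.
    return [sorted(members) for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    genomes = ["genome1", "genome2", "genome3"]
    presence = {
        "protA": {"genome1", "genome3"},
        "protB": {"genome1", "genome3"},
        "protC": {"genome2"},
    }
    profiles = phylogenetic_profiles(presence, genomes)
    print(profiles["protA"], group_by_profile(profiles))
```

    Requiring exact profile identity is the simplest choice; a tolerance of a few mismatched positions is a natural extension, but it is not shown here.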

  16. Determining protein function and interaction from genome analysis

    Science.gov (United States)

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.

  17. Integrated proteomic and genomic analysis of colorectal cancer

    Science.gov (United States)

    Investigators who analyzed 95 human colorectal tumor samples have determined how gene alterations identified in previous analyses of the same samples are expressed at the protein level. The integration of proteomic and genomic data, or proteogenomics, pro

  18. Comparative genomics and phylogenetic analysis of S. dysenteriae subgroup

    Institute of Scientific and Technical Information of China (English)

    YANG; E; BIN; Wen; PENG; Junping; ZHANG; Xiaobing; WANG; Ji

    2005-01-01

    Genomic compositions of representatives of thirteen S. dysenteriae serotypes were investigated by performing comparative genomic hybridization (CGH) with a microarray containing the whole set of genomic open reading frames (ORFs) of E. coli K12 strain MG1655 and specific ORFs of S. dysenteriae A1 strain Sd51197. The CGH results indicated that the genomes of the serotypes contain 2654 conserved ORFs originating from E. coli. However, 219 intrinsic genes of E. coli, including prophage genes, molecular chaperones, genes for synthesis of specific O antigen and so on, were absent. Moreover, some specific genes, such as type II secretion system associated components, iron transport related genes and some others, were acquired through horizontal transfer. Phylogenetic trees based on genetic composition demonstrated that A1, A2, A8 and A10 were distinct from the other S. dysenteriae serotypes. Our results in this report may provide new insights into the physiological process, pathogenicity and evolution of S. dysenteriae.

  19. Comparative bacterial proteomics: analysis of the core genome concept.

    Directory of Open Access Journals (Sweden)

    Stephen J Callister

    Full Text Available While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits.

  20. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    Energy Technology Data Exchange (ETDEWEB)

    Callister, Stephen J.; McCue, Lee Ann; Turse, Josh E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-02-06

    Comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry. Experimental validation of the existence of this core genome requires extensive measurement and is not typically undertaken. Enabled by an extensive proteome database development over a six year period, we experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. While genomic studies establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits.

  1. Analysis of high-identity segmental duplications in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Carelli Francesco N

    2011-08-01

    Full Text Available Abstract Background Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data is plentiful for mammals, not much was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.

  2. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the 'tsunami' of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  3. Genome Sequencing and Comparative Genomics Analysis Revealed Pathogenic Potential in Penicillium capsulatum as a Novel Fungal Pathogen Belonging to Eurotiales

    Science.gov (United States)

    Yang, Ying; Chen, Min; Li, Zongwei; Al-Hatmi, Abdullah M. S.; de Hoog, Sybren; Pan, Weihua; Ye, Qiang; Bo, Xiaochen; Li, Zhen; Wang, Shengqi; Wang, Junzhi; Chen, Huipeng; Liao, Wanqing

    2016-01-01

    Penicillium capsulatum is a rare Penicillium species used in paper manufacturing, but recently it has been reported to cause invasive infection. To research the pathogenicity of the clinical Penicillium strain, we sequenced the genomes and transcriptomes of the clinical and environmental strains of P. capsulatum. Comparative analyses of these two P. capsulatum strains and closely related strains belonging to Eurotiales were performed. The assembled genomes of P. capsulatum are approximately 34.4 Mbp in length and encode 11,080 predicted genes. The different isolates of P. capsulatum are highly similar, with the exception of several unique genes, INDELs or SNPs in the genes coding for glycosyl hydrolases, amino acid transporters and circumsporozoite protein. A phylogenomic analysis was performed based on the whole genome data of 38 strains belonging to Eurotiales. By comparing the whole genome sequences and the virulence-related genes from 20 important related species, including fungal pathogens and non-human pathogens belonging to Eurotiales, we found meaningful pathogenicity characteristics between P. capsulatum and its closely related species. Our research indicated that P. capsulatum may be a neglected opportunistic pathogen. This study is beneficial for mycologists, geneticists and epidemiologists to achieve a deeper understanding of the genetic basis of the role of P. capsulatum as a newly reported fungal pathogen. PMID:27761131

  4. Analysis of Human Accelerated DNA Regions Using Archaic Hominin Genomes

    Science.gov (United States)

    Burbano, Hernán A.; Green, Richard E.; Maricic, Tomislav; Lalueza-Fox, Carles; de la Rasilla, Marco; Rosas, Antonio; Kelso, Janet; Pollard, Katherine S.; Lachmann, Michael; Pääbo, Svante

    2012-01-01

    Several previous comparisons of the human genome with other primate and vertebrate genomes identified genomic regions that are highly conserved in vertebrate evolution but fast-evolving on the human lineage. These human accelerated regions (HARs) may be regions of past adaptive evolution in humans. Alternatively, they may be the result of non-adaptive processes, such as biased gene conversion. We captured and sequenced DNA from a collection of previously published HARs using DNA from an Iberian Neandertal. Combining these new data with shotgun sequence from the Neandertal and Denisova draft genomes, we determine at least one archaic hominin allele for 84% of all positions within HARs. We find that 8% of HAR substitutions are not observed in the archaic hominins and are thus recent in the sense that the derived allele had not come to fixation in the common ancestor of modern humans and archaic hominins. Further, we find that recent substitutions in HARs tend to have come to fixation faster than substitutions elsewhere in the genome and that substitutions in HARs tend to cluster in time, consistent with an episodic rather than a clock-like process underlying HAR evolution. Our catalog of sequence changes in HARs will help prioritize them for functional studies of genomic elements potentially responsible for modern human adaptations. PMID:22412940

  5. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  6. Ultra-high vacuum surface analysis study of rhodopsin incorporation into supported lipid bilayers.

    Science.gov (United States)

    Michel, Roger; Subramaniam, Varuni; McArthur, Sally L; Bondurant, Bruce; D'Ambruoso, Gemma D; Hall, Henry K; Brown, Michael F; Ross, Eric E; Saavedra, S Scott; Castner, David G

    2008-05-06

    Planar supported lipid bilayers that are stable under ambient atmospheric and ultra-high-vacuum conditions were prepared by cross-linking polymerization of bis-sorbylphosphatidylcholine (bis-SorbPC). X-ray photoelectron spectroscopy (XPS) and time-of-flight secondary ion mass spectrometry (ToF-SIMS) were employed to investigate bilayers that were cross-linked using either redox-initiated radical polymerization or ultraviolet photopolymerization. The redox method yields a more structurally intact bilayer; however, the UV method is more compatible with incorporation of transmembrane proteins. UV polymerization was therefore used to prepare cross-linked bilayers with incorporated bovine rhodopsin, a light-activated, G-protein-coupled receptor (GPCR). A previous study (Subramaniam, V.; Alves, I. D.; Salgado, G. F. J.; Lau, P. W.; Wysocki, R. J.; Salamon, Z.; Tollin, G.; Hruby, V. J.; Brown, M. F.; Saavedra, S. S. J. Am. Chem. Soc. 2005, 127, 5320-5321) showed that rhodopsin retains photoactivity after incorporation into UV-polymerized bis-SorbPC, but did not address how the protein is associated with the bilayer. In this study, we show that rhodopsin is retained in supported bilayers of poly(bis-SorbPC) under ultra-high-vacuum conditions, on the basis of the increase in the XPS nitrogen concentration and the presence of characteristic amino acid peaks in the ToF-SIMS data. Angle-resolved XPS data show that the protein is inserted into the bilayer, rather than adsorbed on the bilayer surface. This is the first study to demonstrate the use of ultra-high-vacuum techniques for structural studies of supported proteolipid bilayers.

  7. Flow cytometric analysis of T lymphocyte proliferation in vivo by EdU incorporation.

    Science.gov (United States)

    Sun, Xiaojing; Zhang, Chunpan; Jin, Hua; Sun, Guangyong; Tian, Yue; Shi, Wen; Zhang, Dong

    2016-12-01

    Monitoring T lymphocyte proliferation, especially in vivo, is essential for the evaluation of adaptive immune reactions. Flow cytometry-based proliferation assays have advantages in measuring cell division of different T lymphocyte subsets at the same time by multicolor labelling. In this study, we aimed to establish the use of 5-Ethynyl-2'-deoxyuridine (EdU) incorporation in vivo to monitor T lymphocyte proliferation by flow cytometry with an adoptive transfer model. We found that fixation followed by permeabilization preserved T cell surface antigens and had no obvious effects on the fluorescence intensity of APC, PE, PE-Cy7, FITC and PerCP-Cy5.5 when the concentration of the permeabilization reagents was optimized. However, the click reaction resulted in a significant decrease in the fluorescence intensity of PE and PE-Cy7, and surface staining after the click reaction improved the fluorescence intensity. Thus, an extra step of blocking with PBS with 3% FBS between the click reaction and cell surface staining is needed. Furthermore, the percentage of EdU-positive cells increased in a dose-dependent manner, and the saturated dose of EdU was 20 mg/kg. Intraperitoneal and intravenous injection had no differences in lymphocyte proliferation detection with EdU in vivo. In addition, T cell proliferation measured by EdU incorporation was comparable to BrdU but was lower than CFSE labelling. In conclusion, we optimized the protocols for EdU administration in vivo and staining in vitro, providing a feasible method for the measurement of T lymphocyte proliferation with EdU incorporation by flow cytometry in vivo.

  8. BATCH-GE: Batch analysis of Next-Generation Sequencing data for genome editing assessment.

    Science.gov (United States)

    Boel, Annekatrien; Steyaert, Woutert; De Rocker, Nina; Menten, Björn; Callewaert, Bert; De Paepe, Anne; Coucke, Paul; Willaert, Andy

    2016-07-27

    Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in-vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome.

  9. Human genomic DNA analysis using a semi-automated sample preparation, amplification, and electrophoresis separation platform.

    Science.gov (United States)

    Raisi, Fariba; Blizard, Benjamin A; Raissi Shabari, Akbar; Ching, Jesus; Kintz, Gregory J; Mitchell, Jim; Lemoff, Asuncion; Taylor, Mike T; Weir, Fred; Western, Linda; Wong, Wendy; Joshi, Rekha; Howland, Pamela; Chauhan, Avinash; Nguyen, Peter; Petersen, Kurt E

    2004-03-01

    The growing importance of analyzing the human genome to detect hereditary and infectious diseases associated with specific DNA sequences has motivated us to develop automated devices to integrate sample preparation, real-time PCR, and microchannel electrophoresis (MCE). In this report, we present results from an optimized compact system capable of processing a raw sample of blood, extracting the DNA, and performing a multiplexed PCR reaction. Finally, an innovative electrophoretic separation was performed on the post-PCR products using a unique MCE system. The sample preparation system extracted and lysed white blood cells (WBC) from whole blood, producing DNA of sufficient quantity and quality for a polymerase chain reaction (PCR). Separation of multiple amplicons was achieved in a microfabricated channel 30 µm x 100 µm in cross section and 85 mm in length filled with a replaceable methyl cellulose matrix operated under denaturing conditions at 50 °C. By incorporating fluorescent-labeled primers in the PCR, the amplicons were identified by a two-color (multiplexed) fluorescence detection system. Two base-pair resolution of single-stranded DNA (PCR products) was achieved. We believe that this integrated system provides a unique solution for DNA analysis.

  10. DEVELOPMENT OF NEW SEQUENCING TECHNOLOGIES AND THEIR APPLICATION IN GENOME ANALYSIS OF DOMESTIC ANIMALS

    Directory of Open Access Journals (Sweden)

    Kristina Gvozdanović

    2015-12-01

    Full Text Available Sequencing and detailed study of the genomes of domestic animals began in the middle of the last century. It was primarily tied to the development of first-generation sequencing methods, i.e. the Sanger sequencing method. Next generation sequencing methods are currently the most common methods in the analysis of domestic animal genomes. The application of these methods has yielded up to 100 times more data than the Sanger method. Analyses including RNA sequencing, whole-genome genotyping, immunoprecipitation associated with DNA microarrays, detection of mutations and inherited diseases, sequencing of the mitochondrial genome and many others have been conducted with the development and application of new sequencing methods from 2005 until today. Application of new sequencing methods in the analysis of domestic animal genomes provides a better understanding of the genetic basis of important production traits, which could help in improving livestock production.

  11. Draft genome sequence and detailed analysis of Pantoea eucrina strain Russ and implication for opportunistic pathogenesis

    Directory of Open Access Journals (Sweden)

    Farzaneh Moghadam

    2016-12-01

    Full Text Available The genus Pantoea is a predominant member of host-associated microbiomes. We here report on the genomic analysis of Pantoea eucrina strain Russ, which was isolated from a trashcan at Oklahoma State University, Stillwater, OK. The draft genome of Pantoea eucrina strain Russ consists of 3,939,877 bp of DNA with 3704 protein-coding genes and 134 RNA genes. This is the first report of a genome sequence of a member of Pantoea eucrina. Genomic analysis revealed metabolic versatility with genes involved in the metabolism and transport of all amino acids as well as glucose, fructose, mannose, xylose, arabinose and galactose, suggesting the organism is a versatile heterotroph. The genome also encodes an extensive secretory machinery including types I, II, III, IV, and Vb secretion systems, and several genes for pili production including the new usher/chaperone system (pfam 05229). The implications of these systems for opportunistic pathogenesis are discussed.

  12. Genome sequence of Cronobacter sakazakii BAA-894 and comparative genomic hybridization analysis with other Cronobacter species.

    Directory of Open Access Journals (Sweden)

    Eva Kucerova

    Full Text Available BACKGROUND: The genus Cronobacter (formerly called Enterobacter sakazakii) is composed of five species: C. sakazakii, C. malonaticus, C. turicensis, C. muytjensii, and C. dublinensis. The genus includes opportunistic human pathogens, and the first three species have been associated with neonatal infections. The most severe diseases are caused in neonates and include fatal necrotizing enterocolitis and meningitis. The genetic basis of the diversity within the genus is unknown, and few virulence traits have been identified. METHODOLOGY/PRINCIPAL FINDINGS: We report here the first sequence of a member of this genus, C. sakazakii strain BAA-894. The genome of Cronobacter sakazakii strain BAA-894 comprises a 4.4 Mb chromosome (57% GC content) and two plasmids: 31 kb (51% GC) and 131 kb (56% GC). The genome was used to construct a 387,000 probe oligonucleotide tiling DNA microarray covering the whole genome. Comparative genomic hybridization (CGH) was undertaken on five other C. sakazakii strains, and representatives of the four other Cronobacter species. Among 4,382 annotated genes inspected in this study, about 55% of genes were common to all C. sakazakii strains and 43% were common to all Cronobacter strains, with 10-17% absence of genes. CONCLUSIONS/SIGNIFICANCE: CGH highlighted 15 clusters of genes in C. sakazakii BAA-894 that were divergent or absent in more than half of the tested strains; six of these are of probable prophage origin. Putative virulence factors were identified in these prophage and in other variable regions. A number of genes unique to Cronobacter species associated with neonatal infections (C. sakazakii, C. malonaticus and C. turicensis) were identified. These included a copper and silver resistance system known to be linked to invasion of the blood-brain barrier by neonatal meningitic strains of Escherichia coli. In addition, genes encoding multidrug efflux pumps and adhesins were identified that were unique to C. sakazakii

  13. The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

    Directory of Open Access Journals (Sweden)

    Lee Sang

    2011-01-01

    Full Text Available Background: Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637), one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results: We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp) and pRK2 (5,360 bp), are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks): it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions: The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain', for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models.

  14. Analysis of the genome content of Lactococcus garvieae by genomic interspecies microarray hybridization

    Directory of Open Access Journals (Sweden)

    Gibello Alicia

    2010-03-01

    Full Text Available Background: Lactococcus garvieae is a bacterial pathogen that affects different animal species in addition to humans. Despite the widespread distribution and emerging clinical significance of L. garvieae in both veterinary and human medicine, there is almost a complete lack of knowledge about the genetic content of this microorganism. In the present study, the genomic content of L. garvieae CECT 4531 was analysed using bioinformatics tools and microarray-based comparative genomic hybridization (CGH) experiments. Lactococcus lactis subsp. lactis IL1403 and Streptococcus pneumoniae TIGR4 were used as reference microorganisms. Results: The combination and integration of in silico analyses and in vitro CGH experiments, performed in comparison with the reference microorganisms, allowed establishment of an inter-species hybridization framework with a detection threshold based on a sequence similarity of ≥ 70%. With this threshold value, 267 genes were identified as having an analogue in L. garvieae, most of which (n = 258) have been documented for the first time in this pathogen. Most of the genes are related to ribosomal, sugar metabolism or energy conversion systems. Some of the identified genes, such as als and mycA, could be involved in the pathogenesis of L. garvieae infections. Conclusions: In this study, we identified 267 genes that were potentially present in L. garvieae CECT 4531. Some of the identified genes could be involved in the pathogenesis of L. garvieae infections. These results provide the first insight into the genome content of L. garvieae.

  15. Genome sequencing and analysis of the first complete genome of Lactobacillus kunkeei strain MP2, an Apis mellifera gut isolate

    Science.gov (United States)

    Asenjo, Freddy; Olmos, Alejandro; Henríquez-Piskulich, Patricia; Polanco, Victor; Aldea, Patricia

    2016-01-01

    Background. The honey bee (Apis mellifera) is the most important pollinator in agriculture worldwide. However, the number of honey bees has fallen significantly since 2006, becoming a major ecological problem. The principal cause is CCD, or Colony Collapse Disorder, characterized by the seemingly spontaneous abandonment of hives by their workers. One of the characteristics of CCD in honey bees is the alteration of the bacterial communities in their gastrointestinal tract, mainly due to the decrease of Firmicutes populations, such as the Lactobacilli. At this time, the causes of these alterations remain unknown. We recently isolated a strain of Lactobacillus kunkeei (L. kunkeei strain MP2) from the gut of Chilean honey bees. L. kunkeei is one of the bacteria most commonly isolated from the honey bee gut and is highly versatile in different ecological niches. In this study, we aimed to elucidate in detail the L. kunkeei genetic background and perform a comparative genome analysis with other Lactobacillus species. Methods. L. kunkeei MP2 was originally isolated from the guts of Chilean A. mellifera individuals. Genome sequencing was done using Pacific Biosciences single-molecule real-time sequencing technology. De novo assembly was performed using Celera assembler. The genome was annotated using Prokka, and functional information was added using the EggNOG 3.1 database. In addition, genomic islands were predicted using IslandViewer, and prophage sequences using PHAST. Comparisons between L. kunkeei MP2 and other L. kunkeei and Lactobacillus strains were done using Roary. Results. The complete genome of L. kunkeei MP2 comprises one circular chromosome of 1,614,522 nt with a GC content of 36.9%. Pangenome analysis with 16 L. kunkeei strains identified 113 unique genes, most of them related to phage insertions. A large and unique region of the L. kunkeei MP2 genome contains several genes that encode phage structural proteins and replication components

  16. Genome sequencing and analysis of the first complete genome of Lactobacillus kunkeei strain MP2, an Apis mellifera gut isolate

    Directory of Open Access Journals (Sweden)

    Freddy Asenjo

    2016-04-01

    Full Text Available Background. The honey bee (Apis mellifera) is the most important pollinator in agriculture worldwide. However, the number of honey bees has fallen significantly since 2006, becoming a major ecological problem. The principal cause is CCD, or Colony Collapse Disorder, characterized by the seemingly spontaneous abandonment of hives by their workers. One of the characteristics of CCD in honey bees is the alteration of the bacterial communities in their gastrointestinal tract, mainly due to the decrease of Firmicutes populations, such as the Lactobacilli. At this time, the causes of these alterations remain unknown. We recently isolated a strain of Lactobacillus kunkeei (L. kunkeei strain MP2) from the gut of Chilean honey bees. L. kunkeei is one of the bacteria most commonly isolated from the honey bee gut and is highly versatile in different ecological niches. In this study, we aimed to elucidate in detail the L. kunkeei genetic background and perform a comparative genome analysis with other Lactobacillus species. Methods. L. kunkeei MP2 was originally isolated from the guts of Chilean A. mellifera individuals. Genome sequencing was done using Pacific Biosciences single-molecule real-time sequencing technology. De novo assembly was performed using Celera assembler. The genome was annotated using Prokka, and functional information was added using the EggNOG 3.1 database. In addition, genomic islands were predicted using IslandViewer, and prophage sequences using PHAST. Comparisons between L. kunkeei MP2 and other L. kunkeei and Lactobacillus strains were done using Roary. Results. The complete genome of L. kunkeei MP2 comprises one circular chromosome of 1,614,522 nt with a GC content of 36.9%. Pangenome analysis with 16 L. kunkeei strains identified 113 unique genes, most of them related to phage insertions. A large and unique region of the L. kunkeei MP2 genome contains several genes that encode phage structural proteins and

  17. Genome sequencing and analysis of the first complete genome of Lactobacillus kunkeei strain MP2, an Apis mellifera gut isolate.

    Science.gov (United States)

    Asenjo, Freddy; Olmos, Alejandro; Henríquez-Piskulich, Patricia; Polanco, Victor; Aldea, Patricia; Ugalde, Juan A; Trombert, Annette N

    2016-01-01

    Background. The honey bee (Apis mellifera) is the most important pollinator in agriculture worldwide. However, the number of honey bees has fallen significantly since 2006, becoming a major ecological problem. The principal cause is CCD, or Colony Collapse Disorder, characterized by the seemingly spontaneous abandonment of hives by their workers. One of the characteristics of CCD in honey bees is the alteration of the bacterial communities in their gastrointestinal tract, mainly due to the decrease of Firmicutes populations, such as the Lactobacilli. At this time, the causes of these alterations remain unknown. We recently isolated a strain of Lactobacillus kunkeei (L. kunkeei strain MP2) from the gut of Chilean honey bees. L. kunkeei is one of the bacteria most commonly isolated from the honey bee gut and is highly versatile in different ecological niches. In this study, we aimed to elucidate in detail the L. kunkeei genetic background and perform a comparative genome analysis with other Lactobacillus species. Methods. L. kunkeei MP2 was originally isolated from the guts of Chilean A. mellifera individuals. Genome sequencing was done using Pacific Biosciences single-molecule real-time sequencing technology. De novo assembly was performed using Celera assembler. The genome was annotated using Prokka, and functional information was added using the EggNOG 3.1 database. In addition, genomic islands were predicted using IslandViewer, and prophage sequences using PHAST. Comparisons between L. kunkeei MP2 and other L. kunkeei and Lactobacillus strains were done using Roary. Results. The complete genome of L. kunkeei MP2 comprises one circular chromosome of 1,614,522 nt with a GC content of 36.9%. Pangenome analysis with 16 L. kunkeei strains identified 113 unique genes, most of them related to phage insertions. A large and unique region of the L. kunkeei MP2 genome contains several genes that encode phage structural proteins and replication components

  18. Micro and nanofluidic structures for cell sorting and genomic analysis

    Science.gov (United States)

    Morton, Keith J.

    Microfluidic systems promise rapid analysis of small samples in a compact and inexpensive format. But direct scaling of lab bench protocols on-chip is challenging because laminar flows in typical microfluidic devices are characterized by non-mixing streamlines. Common microfluidic mixers and sorters work by diffusion, limiting application to objects that diffuse slowly such as cells and DNA. Recently Huang et al. developed a passive microfluidic element to continuously separate bio-particles deterministically. In Deterministic Lateral Displacement (DLD), objects are sorted by size as they transit an asymmetric array of microfabricated posts. This thesis further develops DLD arrays with applications in three broad new areas. First, the arrays are used not simply to sort particles but to move streams of cells through functional flows for chemical treatment, such as on-chip immunofluorescent labeling of blood cells with washing, and on-chip E. coli cell lysis with simultaneous chromosome extraction. Second, modular tiling of the basic DLD element is used to construct complex particle handling modes that include beam steering for jets of cells and beads. Third, nanostructured DLD arrays are built using Nanoimprint Lithography (NIL), and continuous-flow separation of 100 nm and 200 nm size particles is demonstrated. Finally, a number of ancillary nanofabrication techniques were developed in support of these overall goals, including methods to interface nanofluidic structures with standard microfluidic components such as inlet channels and reservoirs, precision etching of ultra-high aspect ratio (>50:1) silicon nanostructures, and fabrication of narrow (~35 nm) channels used to stretch genomic-length DNA.

  19. Incorporation of Multi-Member Substructure Capabilities in FAST for Analysis of Offshore Wind Turbines: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Song, H.; Robertson, A.; Jonkman, J.; Sewell, D.

    2012-05-01

    FAST, developed by the National Renewable Energy Laboratory (NREL), is an aero-hydro-servo-elastic tool widely used for analyzing onshore and offshore wind turbines. This paper discusses recent modifications made to FAST to enable the examination of offshore wind turbines with fixed-bottom, multi-member support structures (which are commonly used in transitional-depth waters). This paper addresses the methods used for incorporating the hydrostatic and hydrodynamic loading on multi-member structures in FAST through its hydrodynamic loading module, HydroDyn. Modeling of the hydrodynamic loads was accomplished through the incorporation of Morison and buoyancy loads on the support structures. Issues addressed include how to model loads at the joints of intersecting members and on tapered and tilted members of the support structure. Three example structures are modeled to test and verify the solutions generated by the modifications to HydroDyn, including a monopile, tripod, and jacket structure. Verification is achieved through comparison of the results to a computational fluid dynamics (CFD)-derived solution using the commercial software tool STAR-CCM+.

  20. Distinct origin of the Y and St genome in Elymus species: evidence from the analysis of a large sample of St genome species using two nuclear genes.

    Directory of Open Access Journals (Sweden)

    Chi Yan

    Full Text Available BACKGROUND: Previous cytological and single copy nuclear gene data suggested that the St and Y genomes in the StY-genomic Elymus species originated from different donors: the St from a diploid species in Pseudoroegneria and the Y from an unknown diploid species, which is now extinct or undiscovered. However, ITS data suggested that the Y and St genomes shared the same progenitor, although rather few St genome species were studied. A recent analysis of many samples of the St genome species Pseudoroegneria spicata (Pursh) Á. Löve suggested that one accession of P. spicata was the most likely donor of the Y genome. The present study tested whether intraspecific variation during sampling could affect the outcome of analyses to determine the origin of the Y genome in allotetraploid StY species. We also explored the evolutionary dynamics of these species. METHODOLOGY/PRINCIPAL FINDINGS: Sequences of two single copy nuclear genes, the second largest subunit of RNA polymerase II (RPB2) and the translation elongation factor G (EF-G), from 58 accessions of Pseudoroegneria and Elymus species, together with those from Hordeum (H), Agropyron (P), Australopyrum (W), Lophopyrum (E(e)), Thinopyrum (E(a)), Thinopyrum (E(b)), and Dasypyrum (V), were analyzed using maximum parsimony, maximum likelihood and Bayesian methods. Sequence comparisons among all these genomes revealed that the St and Y genomes are relatively dissimilar. Extensive sequence variations have been detected not only between the sequences from the St and Y genomes, but also among the sequences from diploid St genome species. Phylogenetic analyses separated the Y sequences from the St sequences. CONCLUSIONS/SIGNIFICANCE: Our results confirmed that the St and Y genomes in Elymus species originated from different donors, and demonstrated that intraspecific variation does not affect the identification of genome origin in polyploids. Moreover, sequence data showed evidence to support the suggestion of the genome

  1. Comparative Genome Analysis of Lolium-Festuca Complex Species

    DEFF Research Database (Denmark)

    Czaban, Adrian; Byrne, Stephen; Sharma, Sapna;

    2015-01-01

    The Lolium-Festuca complex incorporates species from the Lolium genera and the broad leaf Fescues. Plants belonging to this complex exhibit significant phenotypic plasticity for agriculturally important traits, such as annuality/perenniality, establishment potential, growth speed, nutritional val...

  2. Complete genome sequence of Nitrobacter hamburgensis X14 and comparative genomic analysis of species within the genus Nitrobacter.

    Energy Technology Data Exchange (ETDEWEB)

    Starkenburg, Shawn R [Oregon State University; Larimer, Frank W [ORNL; Stein, Lisa Y [University of California, Riverside; Klotz, Martin G [University of Louisville, Louisville; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Sayavedra-Soto, LA [Oregon State University; Poret-Peterson, Amisha T. [University of Louisville, Louisville; Gentry, ME [University of Louisville, Louisville; Arp, D J [Oregon State University; Ward, Bess B. [Princeton University; Bottomley, Peter J [Oregon State University

    2008-05-01

    The alphaproteobacterium Nitrobacter hamburgensis X14 is a gram-negative facultative chemolithoautotroph that conserves energy from the oxidation of nitrite to nitrate. Sequencing and analysis of the Nitrobacter hamburgensis X14 genome revealed four replicons comprised of one chromosome (4.4 Mbp) and three plasmids (294, 188, and 121 kbp). Over 20% of the genome is composed of pseudogenes and paralogs. Whole-genome comparisons were conducted between N. hamburgensis and the finished and draft genome sequences of Nitrobacter winogradskyi and Nitrobacter sp. strain Nb-311A, respectively. Most of the plasmid-borne genes were unique to N. hamburgensis and encode a variety of functions (central metabolism, energy conservation, conjugation, and heavy metal resistance), yet approximately 21 kb of an approximately 28-kb "autotrophic" island on the largest plasmid was conserved in the chromosomes of Nitrobacter winogradskyi Nb-255 and Nitrobacter sp. strain Nb-311A. The N. hamburgensis chromosome also harbors many unique genes, including those for heme-copper oxidases, cytochrome b(561), and putative pathways for the catabolism of aromatic, organic, and one-carbon compounds, which help verify and extend its mixotrophic potential. A Nitrobacter "subcore" genome was also constructed by removing homologs found in strains of the closest evolutionary relatives, Bradyrhizobium japonicum and Rhodopseudomonas palustris. Among the Nitrobacter subcore inventory (116 genes), copies of genes or gene clusters for nitrite oxidoreductase (NXR), cytochromes associated with a dissimilatory nitrite reductase (NirK), PII-like regulators, and polysaccharide formation were identified. Many of the subcore genes have diverged significantly from, or have origins outside, the alphaproteobacterial lineage and may indicate some of the unique genetic requirements for nitrite oxidation in Nitrobacter.

  3. The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes

    Energy Technology Data Exchange (ETDEWEB)

    Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.; Bruce, David C.; Gilna, Paul; Han, Cliff S.; Lapidus, Alla; Metcalf, William W.; Saunders, Elizabeth; Tapia, Roxanne; Sowers, Kevin R.

    2006-05-19

    We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.

  4. Flow cytometric analysis of oil palm: a preliminary analysis for cultivars and genomic DNA alteration

    Directory of Open Access Journals (Sweden)

    Warawut Chuthammathat

    2005-12-01

    Full Text Available DNA contents of oil palm (Elaeis guineensis Jacq.) cultivars were analyzed by flow cytometry using different external reference plant species. Analysis using corn (Zea mays line CE-777) as a reference plant gave the highest DNA content of oil palm (4.72±0.23 pg 2C⁻¹), whereas the DNA content was found to be lower when using soybean (Glycine max cv. Polanka) (3.77±0.09 pg 2C⁻¹) or tomato (Lycopersicon esculentum cv. Stupicke) (4.25±0.09 pg 2C⁻¹) as a reference. The nuclear DNA contents of Dura (D109), Pisifera (P168) and Tenera (T38) cultivars were 3.46±0.04, 3.24±0.03 and 3.76±0.04 pg 2C⁻¹ nuclei, respectively, using soybean as a reference. One haploid genome of oil palm therefore ranged from 1.56 to 1.81 × 10⁹ base pairs. DNA contents from one-year-old calli and cell suspension of oil palm were found to be significantly different from those of seedlings. It should thus be noted that genomic DNA alteration occurred in these cultured tissues. We therefore confirm that flow cytometric analysis could verify cultivars, DNA content and genomic DNA alteration of oil palm using soybean as an external reference standard.
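
    The haploid genome range quoted in this abstract (roughly 1.56 to 1.81 billion base pairs) can be reproduced from the reported 2C values. The sketch below is illustrative only: the conversion factor of 0.965 × 10⁹ bp per pg is an assumption inferred from the abstract's own numbers, not something the abstract states, and the function name is hypothetical.

```python
# Assumption: 1 pg of DNA corresponds to about 0.965e9 base pairs.
# This factor is inferred from the figures in the abstract; the
# abstract itself does not say which conversion was used.
BP_PER_PG = 0.965e9


def haploid_genome_size_bp(two_c_pg: float, bp_per_pg: float = BP_PER_PG) -> float:
    """Convert a 2C nuclear DNA content (pg) to a 1C (haploid) size in bp."""
    one_c_pg = two_c_pg / 2.0  # a 2C nucleus holds two haploid genome copies
    return one_c_pg * bp_per_pg


# 2C values reported with soybean as the external reference.
cultivars = {"Dura (D109)": 3.46, "Pisifera (P168)": 3.24, "Tenera (T38)": 3.76}

for name, two_c in cultivars.items():
    size = haploid_genome_size_bp(two_c)
    print(f"{name}: {size / 1e9:.2f} x 10^9 bp")
```

    Under that assumed factor, the smallest and largest cultivar values (Pisifera and Tenera) give the two ends of the reported range. A different factor, such as the also widely used 0.978 × 10⁹ bp per pg, would shift the results slightly.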

  5. A Genome-Wide Association Study of Autism Incorporating Autism Diagnostic Interview-Revised, Autism Diagnostic Observation Schedule, and Social Responsiveness Scale

    Science.gov (United States)

    Connolly, John J.; Glessner, Joseph T.; Hakonarson, Hakon

    2013-01-01

    Efforts to understand the causes of autism spectrum disorders (ASDs) have been hampered by genetic complexity and heterogeneity among individuals. One strategy for reducing complexity is to target endophenotypes, simpler biologically based measures that may involve fewer genes and constitute a more homogenous sample. A genome-wide association…

  6. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis

    Science.gov (United States)

    Zhang, Shuang; Yu, Xiao-Yue; Ren, Ya-Chao; Yang, Min-Sheng; Wang, Jin-Mao

    2017-01-01

    further analysis of their nuclear genomes. This study is the first report on Ulmus chloroplast genomes, which has significance for understanding photosynthesis, evolution, and chloroplast transgenic engineering. PMID:28158318

  7. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in outcomes such as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to identify the composition and function of the pan-genome and core genome. The results showed that the total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules involved in intracellular survival in Brucella spp.
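
    The core / dispensable / strain-specific split described in this abstract can be illustrated with a small sketch. It assumes the common convention that a cluster is core when present in every genome, strain-specific when present in exactly one, and dispensable otherwise; the abstract does not state the exact criteria the authors used, and the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_clusters(presence: Dict[str, Set[str]], n_genomes: int) -> Counter:
    """Count clusters per category from a cluster -> genomes-containing-it map."""
    counts: Counter = Counter()
    for genomes in presence.values():
        k = len(genomes)
        if k == 0:
            continue  # a cluster found in no genome carries no information
        if k == n_genomes:
            counts["core"] += 1             # present in every genome
        elif k == 1:
            counts["strain-specific"] += 1  # present in a single genome
        else:
            counts["dispensable"] += 1      # present in some, not all
    return counts


# Toy presence data for three hypothetical genomes A, B and C.
toy = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
counts = classify_clusters(toy, n_genomes=3)
print(dict(counts))

# Consistency check on the totals reported in the abstract.
reported = {"core": 1710, "strain-specific": 1182, "dispensable": 2477}
assert sum(reported.values()) == 5369
```

    The final check confirms only that the three reported category counts add up to the reported total of 5369 clusters.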

  8. Parallel WGA and WTA for Comparative Genome and Transcriptome NGS Analysis Using Tiny Cell Numbers.

    Science.gov (United States)

    Korfhage, Christian; Fricke, Evelyn; Meier, Andreas

    2015-07-01

    Genomic DNA determines how and when the transcriptome is changed by a trigger or environmental change and how cellular metabolism is influenced. Comparative genome and transcriptome analysis of the same cell sample links a defined genome with all changes in the bases, structure, or numbers of the transcriptome. However, comparative genome and transcriptome analysis using next-generation sequencing (NGS) or real-time PCR is often limited by the small amount of sample available. In mammals, the amount of DNA and RNA in a single cell is ∼10 picograms, but deep analysis of the genome and transcriptome currently requires several hundred nanograms of nucleic acids for library preparation for NGS sequencing. Consequently, accurate whole-genome amplification (WGA) and whole-transcriptome amplification (WTA) is required for such quantitative analysis. This unit describes how the genome and the transcriptome of a tiny number of cells can be amplified in a highly parallel and comparable process. Protocols for quality control of amplified DNA and application of amplified DNA for NGS are included.

  9. GENOME SIZE DETERMINATION AND RAPD ANALYSIS OF FOUR EDIBLE AROIDS OF NORTH EAST INDIA

    Directory of Open Access Journals (Sweden)

    Jyoti P. Saikia, Bolin K. Konwar and Susmita Singh

    2010-10-01

    Full Text Available Four edible aroid species were selected for the study. The genomic DNA of the plants was isolated and estimated. A part of the genomic DNA was used for RAPD analysis using six different primers from Operon Technologies, USA. The genome size determined for the aroids is in the order Colocasia esculenta > Xanthosoma caracu > Xanthosoma sagittifolium > Amorphophallus paeonifolius. The Amorphophallus species was found to be 50% similar to both Xanthosoma caracu and Colocasia esculenta. The analysis will provide a basis for exploring the vast diversified aroid population of the region.

  10. Analysis of CR1 Repeats in the Zebra Finch Genome

    Directory of Open Access Journals (Sweden)

    George E. Liu

    2013-06-01

    Full Text Available Most bird species have smaller genomes and fewer repeats than mammals. Chicken Repeat 1 (CR1) is one of the most abundant families of repeats, ranging from ~133,000 to ~187,000 copies and accounting for ~50 to ~80% of the interspersed repeats in the zebra finch and chicken genomes, respectively. CR1 repeats are believed to have arisen from the retrotransposition of a small number of master elements, which gave rise to multiple CR1 subfamilies in the chicken. In this study, we performed a global assessment of the divergence distributions, phylogenies, and consensus sequences of CR1 repeats in the zebra finch genome. We identified and validated 34 CR1 subfamilies and further analyzed the correlation between these subfamilies. We also discovered 4 novel lineage-specific CR1 subfamilies in the zebra finch when compared to the chicken genome. We built various evolutionary trees of these subfamilies and concluded that CR1 repeats may play an important role in reshaping the structure of bird genomes.

  11. A complete mitochondrial genome sequence of Ogura-type male-sterile cytoplasm and its comparative analysis with that of normal cytoplasm in radish (Raphanus sativus L.)

    Directory of Open Access Journals (Sweden)

    Tanaka Yoshiyuki

    2012-07-01

    Full Text Available Background: The plant mitochondrial genome has unique features such as large size, frequent recombination and incorporation of foreign DNA. Cytoplasmic male sterility (CMS) is caused by rearrangement of the mitochondrial genome, and a novel chimeric open reading frame (ORF) created by shuffling of endogenous sequences is often responsible for CMS. The Ogura-type male-sterile cytoplasm is one of the most extensively studied cytoplasms in Brassicaceae. Although the gene orf138 has been isolated as a determinant of Ogura-type CMS, no sequence homologous to orf138 has been found in public databases. Therefore, how the orf138 sequence was created is a mystery. In this study, we determined the complete nucleotide sequences of two radish mitochondrial genomes, namely, Ogura- and normal-type genomes, and analyzed them to reveal the origin of the gene orf138. Results: Ogura- and normal-type mitochondrial genomes were assembled to 258,426-bp and 244,036-bp circular sequences, respectively. The normal-type mitochondrial genome contained 33 protein-coding and three rRNA genes, which are well conserved with the reported mitochondrial genome of rapeseed. The Ogura-type genome contained the same genes and an additional atp9. As for tRNA, the normal type contained 17 tRNAs, while the Ogura type contained 17 tRNAs and one additional trnfM. The gene orf138 was specific to the Ogura-type mitochondrial genome, and no sequence homologous to it was found in the normal-type genome. Comparative analysis of the two genomes revealed that the radish mitochondrial genome consists of 11 syntenic regions (length >3 kb, similarity >99.9%). It was shown that short repeats and overlapped repeats present at the edges of syntenic regions were involved in recombination events during evolution to interconvert the two types of mitochondrial genome. The Ogura-type mitochondrial genome has four unique regions (2,803 bp, 1,601 bp, 451 bp and 15,255 bp in size) that are non-syntenic to the normal-type genome, and the gene orf138

  12. Chloroplast genome analysis of Australian eucalypts--Eucalyptus, Corymbia, Angophora, Allosyncarpia and Stockwellia (Myrtaceae).

    Science.gov (United States)

    Bayly, Michael J; Rigault, Philippe; Spokevicius, Antanas; Ladiges, Pauline Y; Ades, Peter K; Anderson, Charlotte; Bossinger, Gerd; Merchant, Andrew; Udovicic, Frank; Woodrow, Ian E; Tibbits, Josquin

    2013-12-01

    We present a phylogenetic analysis and comparison of structural features of chloroplast genomes for 39 species of the eucalypt group (genera Eucalyptus, Corymbia, Angophora, and outgroups Allosyncarpia and Stockwellia). We use 41 complete chloroplast genome sequences, adding 39 finished-quality chloroplast genomes to two previously published genomes. Maximum parsimony and Bayesian analyses, based on >7000 variable nucleotide positions, produced one fully resolved phylogenetic tree (35 supported nodes, 27 with 100% bootstrap support). Eucalyptus and its sister lineage Angophora+Corymbia show a deep divergence. Within Eucalyptus, three lineages are resolved: the 'eudesmid', 'symphyomyrt' and 'monocalypt' groups. Corymbia is paraphyletic with respect to Angophora. Gene content and order do not vary among eucalypt chloroplasts; length mutations, especially frame shifts, are uncommon in protein-coding genes. Some non-synonymous mutations are highly incongruent with the overall phylogenetic signal, notably in rbcL, and may be adaptive. Application of custom informatics pipelines (GYDLE Inc.) enabled direct chloroplast genome assembly, resolving each genome to finished-quality with no need for PCR gap-filling or contig order resolution. Analysis of whole chloroplast genomes resolved major eucalypt clades and revealed variable regions of the genome that will be useful in lower-level genetic studies (including phylogeography and geneflow).

  13. Ten years of maintaining and expanding a microbial genome and metagenome analysis system.

    Science.gov (United States)

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2015-11-01

    Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure.

  14. Funding Opportunity: Genomic Data Centers

    Science.gov (United States)

    Funding Opportunity CCG, Funding Opportunity Center for Cancer Genomics, CCG, Center for Cancer Genomics, CCG RFA, Center for cancer genomics rfa, genomic data analysis network, genomic data analysis network centers,

  15. The Genome of Nosema sp. Isolate YNPr: A Comparative Analysis of Genome Evolution within the Nosema/Vairimorpha Clade

    Science.gov (United States)

    Ma, Zhenggang; Li, Tian; Zhang, Xiaoyan; Debrunner-Vossbrinck, Bettina A.; Zhou, Zeyang; Vossbrinck, Charles R.

    2016-01-01

    The microsporidian parasite designated here as Nosema sp. Isolate YNPr was isolated from the cabbage butterfly Pieris rapae collected in Honghe Prefecture, Yunnan Province, China. The genome was sequenced by Illumina sequencing and compared to those of two related members of the Nosema/Vairimorpha clade, Nosema ceranae and Nosema apis. Based upon assembly statistics, the Nosema sp. YNPr genome is 3.36 × 10^6 bp with a G+C content of 23.18% and 2,075 protein coding sequences. An “ACCCTT” motif is present approximately 50 bp upstream of the start codon, as reported from other members of the clade and from Encephalitozoon cuniculi, a sister taxon. Comparative small subunit ribosomal DNA (SSU rDNA) analysis as well as genome-wide phylogenetic analysis confirms a closer relationship between N. ceranae and Nosema sp. YNPr than between the two honeybee parasites N. ceranae and N. apis. The more closely related N. ceranae and Nosema sp. YNPr show similarities in a number of structural characteristics such as gene synteny, gene length, gene number, transposon composition and gene reduction. Based on transposable element content of the assemblies, the transposon content of Nosema sp. YNPr is 4.8%, that of N. ceranae is 3.7%, and that of N. apis is 2.5%, with large differences in the types of transposons present among these 3 species. Gene function annotation indicates that the number of genes participating in most metabolic activities is similar in all three species. However, the number of genes in the transcription, general function, and cysteine protease categories is greater in N. apis than in the other two species. Our studies further characterize the evolution of the Nosema/Vairimorpha clade of microsporidia. These organisms maintain variable but very reduced genomes. We are interested in understanding the effects of genetic drift versus natural selection on genome size in the microsporidia and in developing a testable hypothesis for further studies on the genomic

  16. Vibration analysis of viscoelastic inhomogeneous nanobeams incorporating surface and thermal effects

    Science.gov (United States)

    Ebrahimi, Farzad; Barati, Mohammad Reza

    2017-01-01

    This article deals with the free vibration investigation of nonlocal strain gradient-based viscoelastic functionally graded (FG) nanobeams on a viscoelastic medium considering surface stress effects. Nonlocal strain gradient theory possesses a nonlocal stress field parameter and a length scale parameter for more accurate prediction of the mechanical behavior of nanostructures. The surface energy effect is incorporated into the nonlocal strain gradient theory by employing Gurtin-Murdoch elasticity theory. Thermo-elastic material properties of the nanobeam are graded in the thickness direction using a power-law distribution. Hamilton's principle is utilized to obtain the governing equations of an FG nanobeam embedded in a viscoelastic medium. The effects of surface stress, length scale parameter, nonlocal parameter, viscoelastic medium, internal damping constant, thermal loading, power-law index and boundary conditions on vibration frequencies of viscoelastic FGM nanobeams are discussed in detail.

  17. Transcriptome and genome size analysis of the venus flytrap

    DEFF Research Database (Denmark)

    Jensen, Michael Krogh; Vogt, Josef Korbinian; Bressendorff, Simon

    2015-01-01

    The insectivorous Venus flytrap (Dionaea muscipula) is renowned from Darwin's studies of plant carnivory and the origins of species. To provide tools to analyze the evolution and functional genomics of D. muscipula, we sequenced a normalized cDNA library synthesized from mRNA isolated from D....... muscipula flowers and traps. Using the Oases transcriptome assembler 79,165,657 quality trimmed reads were assembled into 80,806 cDNA contigs, with an average length of 679 bp and an N50 length of 1,051 bp. A total of 17,047 unique proteins were identified, and assigned to Gene Ontology (GO) and classified......, using a single copy sequence PCR-based method, we estimated that the genome size of D. muscipula is approx. 3 Gb. Our genome size estimate and transcriptome analyses will contribute to future research on this fascinating, monotypic species and its heterotrophic adaptations....
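
    The abstract above reports both an average contig length and an N50 length. As a minimal sketch of how the N50 statistic is commonly computed (the function name and the toy contig lengths are illustrative assumptions, not taken from the study): sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the abstract.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print("mean length:", sum(toy_contigs) / len(toy_contigs))
print("N50:", n50(toy_contigs))
```

    Because N50 is weighted by length, it typically exceeds the mean contig length when a few long contigs carry much of the assembly, which matches the pattern of the two figures quoted in the abstract.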

  18. Power analysis for genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Klein Robert J

    2007-08-01

    Full Text Available Abstract Background Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account genetic model, tag SNP selection, and the population of interest are required. Results The power of genome-wide association studies can be computed using a set of tag SNPs and a large number of genotyped SNPs in a representative population, such as available through the HapMap project. As expected, power increases with increasing sample size and effect size. Power also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more individuals at fewer SNPs than fewer individuals at more SNPs. Conclusion Genome-wide association studies should be designed thoughtfully, with the choice of genotyping platform and sample size being determined from careful power calculations.
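
    The trade-off described above (more individuals at fewer SNPs versus fewer individuals at more SNPs) can be illustrated with a textbook approximation. The sketch below is not the method of the article: it assumes a quantitative trait, an additive effect, a 1-df test whose non-centrality parameter is approximately N × r² × (variance explained), and a conventional significance threshold of 5e-8. All function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def association_power(ncp, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    ncp is the non-centrality parameter of the chi-square statistic;
    the test is treated as a z-test with mean shift sqrt(ncp).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def ncp_quantitative(n, maf, beta, r2=1.0):
    """Assumed approximation: NCP = n * r2 * 2*maf*(1-maf)*beta**2.

    beta is the per-allele effect in trait standard deviations and r2 is
    the LD between the genotyped tag SNP and the causal variant.
    """
    variance_explained = 2 * maf * (1 - maf) * beta ** 2
    return n * r2 * variance_explained


# Two hypothetical designs: a dense array (r2 = 1.0) on fewer people
# versus a sparser array (r2 = 0.8) on more people.
dense = association_power(ncp_quantitative(n=5000, maf=0.3, beta=0.1, r2=1.0))
sparse = association_power(ncp_quantitative(n=8000, maf=0.3, beta=0.1, r2=0.8))
print(f"dense array, n=5000:  power = {dense:.3f}")
print(f"sparse array, n=8000: power = {sparse:.3f}")
```

    Under these assumptions power depends on the product n × r², so a cheaper platform with somewhat weaker tagging can come out ahead if it allows enough additional samples.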

  19. Analysis of anoxybacillus genomes from the aspects of lifestyle adaptations, prophage diversity, and carbohydrate metabolism.

    Directory of Open Access Journals (Sweden)

    Kian Mau Goh

    Full Text Available Species of Anoxybacillus are widespread in geothermal springs, manure, and milk-processing plants. The genus is composed of 22 species and two subspecies, but the relationship between its lifestyle and genome is little understood. In this study, two high-quality draft genomes were generated from Anoxybacillus spp. SK3-4 and DT3-1, isolated from Malaysian hot springs. De novo assembly and annotation were performed, followed by comparative genome analysis with the complete genome of Anoxybacillus flavithermus WK1 and two additional draft genomes, of A. flavithermus TNO-09.006 and A. kamchatkensis G10. The genomes of Anoxybacillus spp. are among the smaller of the family Bacillaceae. Despite having smaller genomes, their essential genes related to lifestyle adaptations at elevated temperature, extreme pH, and protection against ultraviolet are complete. Due to the presence of various competence proteins, Anoxybacillus spp. SK3-4 and DT3-1 are able to take up foreign DNA fragments, and some of these transferred genes are important for the survival of the cells. The analysis of intact putative prophage genomes shows that they are highly diversified. Based on the genome analysis using SEED, many of the annotated sequences are involved in carbohydrate metabolism. The presence of glycosyl hydrolases among the Anoxybacillus spp. was compared, and the potential applications of these unexplored enzymes are suggested here. This is the first study that compares Anoxybacillus genomes from the aspect of lifestyle adaptations, the capacity for horizontal gene transfer, and carbohydrate metabolism.

  20. Complete genome sequence and comparative genome analysis of a new special Yersinia enterocolitica.

    Science.gov (United States)

    Shi, Guoxiang; Su, Mingming; Liang, Junrong; Duan, Ran; Gu, Wenpeng; Xiao, Yuchun; Zhang, Zhewen; Qiu, Haiyan; Zhang, Zheng; Li, Yi; Zhang, Xiaohe; Ling, Yunchao; Song, Lai; Chen, Meili; Zhao, Yongbing; Wu, Jiayan; Jing, Huaiqi; Xiao, Jingfa; Wang, Xin

    2016-09-01

    Yersinia enterocolitica is the most diverse species in the genus Yersinia and shows more polymorphism, especially for the non-pathogenic strains. Individual non-pathogenic Y. enterocolitica strains are wrongly identified because of atypical phenotypes. In this study, we isolated an unusual Y. enterocolitica strain LC20 from Rattus norvegicus. The strain did not utilize urea and could not be assigned to a biotype. API 20E identified it as Escherichia coli; however, it grew well at 25 °C, whereas E. coli grows well at 37 °C. We analyzed the genome of LC20 and found that the whole chromosome of LC20 was collinear with Y. enterocolitica 8081, and the urease gene did not exist on the genome, which is consistent with the result of API 20E. Also, the 16S and 23S rRNA genes of LC20 lay on a branch of Y. enterocolitica. Furthermore, the core-based and pan-based phylogenetic trees showed that LC20 was classified into the Y. enterocolitica cluster. Two plasmids (80 and 50 k) from LC20 shared low genetic homology with pYV from the Yersinia genus; one was an ancestral Yersinia plasmid and the other was a novel plasmid encoding a number of transposases. Some pathogenic and non-pathogenic Y. enterocolitica-specific genes coexisted in LC20. Thus, although it could not be classified into any Y. enterocolitica biotype due to its special biochemical metabolism, we concluded that LC20 was a Y. enterocolitica strain because its genome was similar to other Y. enterocolitica and it might be a strain with many mutations and combinations emerging in the processes of its evolution.

  1. Genome-wide analysis reveals a complex pattern of genomic imprinting in mice.

    Directory of Open Access Journals (Sweden)

    Jason B Wolf

    2008-06-01

    Full Text Available Parent-of-origin-dependent gene expression resulting from genomic imprinting plays an important role in modulating complex traits ranging from developmental processes to cognitive abilities and associated disorders. However, while gene-targeting techniques have allowed for the identification of imprinted loci, very little is known about the contribution of imprinting to quantitative variation in complex traits. Most studies, furthermore, assume a simple pattern of imprinting, resulting in either paternal or maternal gene expression; yet, more complex patterns of effects also exist. As a result, the distribution and number of different imprinting patterns across the genome remain largely unexplored. We address these unresolved issues using a genome-wide scan for imprinted quantitative trait loci (iQTL) affecting body weight and growth in mice using a novel three-generation design. We identified ten iQTL that display much more complex and diverse effect patterns than previously assumed, including four loci with effects similar to the callipyge mutation found in sheep. Three loci display a new phenotypic pattern that we refer to as bipolar dominance, where the two heterozygotes are different from each other while the two homozygotes are identical to each other. Our study furthermore detected a paternally expressed iQTL on Chromosome 7 in a region containing a known imprinting cluster with many paternally expressed genes. Surprisingly, the effects of the iQTL were mostly restricted to traits expressed after weaning. Our results imply that the quantitative effects of an imprinted allele at a locus depend both on its parent of origin and the allele it is paired with. Our findings also show that the imprinting pattern of a locus can be variable over ontogenetic time and, in contrast to current views, may often be stronger at later stages in life.

  2. Analysis on n-gram statistics and linguistic features of whole genome protein sequences

    Institute of Scientific and Technical Information of China (English)

    DONG Qi-wen; WANG Xiao-long; LIN Lei

    2008-01-01

    To perform statistical sequence analysis on the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, Zipf power law and Shannon entropy, developed for analysis of natural languages and symbolic sequences. The natural genome proteins and the artificial genome proteins were compared with each other and some statistical features of n-grams were discovered. The results show that: the n-grams of whole genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple unigram model can distinguish different organisms; there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis on n-grams of whole genome protein sequences will result in a powerful model for mapping the relationship of protein sequence, structure and function.
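
    The two tests named above can be sketched in a few lines. This is an illustrative reconstruction under simple assumptions (overlapping n-grams, maximum-likelihood frequencies, entropy in bits), not the authors' code; the function names and the toy sequence are hypothetical.

```python
from collections import Counter
from math import log2


def ngram_counts(sequence, n):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def zipf_table(counts):
    """Return (rank, frequency) pairs, most frequent n-gram first.

    Under a Zipf-like law, log(frequency) falls roughly linearly
    with log(rank).
    """
    ranked = counts.most_common()
    return [(rank, freq) for rank, (_, freq) in enumerate(ranked, start=1)]


# Hypothetical toy protein fragment, not data from the study.
toy_protein = "MKVLAAGIVGLKVLAAGMKV"
bigrams = ngram_counts(toy_protein, 2)
print("distinct bigrams:", len(bigrams))
print("bigram entropy (bits):", round(shannon_entropy(bigrams), 3))
print("top ranks:", zipf_table(bigrams)[:3])
```

    Comparing the entropy of a real proteome with that of a shuffled or randomly generated one, as the abstract describes for natural versus artificial proteins, would use the same two functions on both inputs.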

  3. Complete sequence of the mitochondrial genome of Odontamblyopus rubicundus (Perciformes: Gobiidae): genome characterization and phylogenetic analysis.

    Science.gov (United States)

    Liu, Tianxing; Jin, Xiaoxiao; Wang, Rixin; Xu, Tianjun

    2013-12-01

    Odontamblyopus rubicundus is a gobiid fish that inhabits muddy-bottomed coastal waters. In this paper, the first complete mitochondrial genome sequence of O. rubicundus is reported. The complete mitochondrial genome sequence is 17119 bp in length and contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes, a control region and an L-strand origin as in other teleosts. Most mitochondrial genes are encoded on the H-strand except for ND6 and seven tRNA genes. Some overlaps occur in protein-coding genes and tRNAs ranging from 1 to 7 bp. The possibly nonfunctional L-strand origin folded into a typical stem-loop secondary structure and a conserved motif (5'-GCCGG-3') was found at the base of the stem within the tRNACys gene. The TAS, CSB-2 and CSB-3 could be detected in the control region. However, in contrast to most other fishes, the central conserved sequence block domain and the CSB-1 could not be recognized in O. rubicundus, which is consistent with Acanthogobius hasta (Gobiidae). In addition, phylogenetic analyses based on different sequences of species of Gobiidae and different methods showed that the classification of O. rubicundus into Odontamblyopus due to morphology is debatable.

  4. Complete sequence of the mitochondrial genome of Odontamblyopus rubicundus (Perciformes: Gobiidae): genome characterization and phylogenetic analysis

    Indian Academy of Sciences (India)

    Tianxing Liu; Xiaoxiao Jin; Rixin Wang; Tianjun Xu

    2013-12-01

    Odontamblyopus rubicundus is a gobiid fish that inhabits muddy-bottomed coastal waters. In this paper, the first complete mitochondrial genome sequence of O. rubicundus is reported. The complete mitochondrial genome sequence is 17119 bp in length and contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes, a control region and an L-strand origin as in other teleosts. Most mitochondrial genes are encoded on the H-strand except for ND6 and seven tRNA genes. Some overlaps occur in protein-coding genes and tRNAs ranging from 1 to 7 bp. The possibly nonfunctional L-strand origin folded into a typical stem-loop secondary structure and a conserved motif (5′-GCCGG-3′) was found at the base of the stem within the $tRNA^{Cys}$ gene. The TAS, CSB-2 and CSB-3 could be detected in the control region. However, in contrast to most other fishes, the central conserved sequence block domain and the CSB-1 could not be recognized in O. rubicundus, which is consistent with Acanthogobius hasta (Gobiidae). In addition, phylogenetic analyses based on different sequences of species of Gobiidae and different methods showed that the classification of O. rubicundus into Odontamblyopus due to morphology is debatable.

  5. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    Science.gov (United States)

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, 
Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-08

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

  6. MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data

    Directory of Open Access Journals (Sweden)

    Ng Raymond T

    2008-05-01

    Full Text Available Abstract Background Recent advances in global genomic profiling methodologies have enabled multi-dimensional characterization of biological systems. Complete analysis of these genomic profiles requires an in-depth look at parallel profiles of segmental DNA copy number status, DNA methylation state, single nucleotide polymorphisms, as well as gene expression profiles. Due to the differences in data types it is difficult to conduct parallel analysis of multiple datasets from diverse platforms. Results To address this issue, we have developed an integrative genomic analysis platform, MD-SeeGH, a software tool that allows users to rapidly and directly analyze genomic datasets spanning multiple genomic experiments. With MD-SeeGH, users have the flexibility to easily update datasets in accordance with new genomic builds, make a quality assessment of data using the filtering features, and identify genetic alterations within single or across multiple experiments. Multiple sample analysis in MD-SeeGH allows users to compare profiles from many experiments alongside tracks containing detailed localized gene information, microRNA, CpG islands, and copy number variations. Conclusion MD-SeeGH is a new platform for the integrative analysis of diverse microarray data, facilitating multiple profile analyses and group comparisons.

  7. On the Analysis of a Repeated Measure Design in Genome-Wide Association Analysis

    Directory of Open Access Journals (Sweden)

    Young Lee

    2014-11-01

    Full Text Available Longitudinal data enable detection of the effect of aging/time, and a repeated measures design is statistically more efficient than cross-sectional data if the correlations between repeated measurements are not large. In particular, when genotyping cost is higher than phenotyping cost, the collection of longitudinal data can be an efficient strategy for genetic association analysis. However, in spite of these advantages, genome-wide association studies (GWAS) with longitudinal data have rarely been analyzed taking this into account. In this report, we calculate the required sample size to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we analyzed the GWAS of eight phenotypes with three observations on each individual in the Korean Association Resource (KARE). A linear mixed model allowing for the correlations between observations for each individual was applied to analyze the longitudinal data, and linear regression was used to analyze the first observation on each individual as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci that were then confirmed in the Health Examination cohort, as well as some significant interactions between age/sex and SNPs.
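
    Why the gain from repeated measures shrinks as the within-person correlation grows can be seen from a standard variance formula. The sketch below is an illustration under stated assumptions, not the calculation used in the report: it assumes k equally correlated measurements per person (compound symmetry with correlation rho) and a SNP effect that acts on the person's mean trait level. The function name is hypothetical.

```python
def mean_variance_factor(k, rho):
    """Variance of the mean of k equicorrelated measurements,
    relative to the variance of a single measurement.

    Under compound symmetry this is (1 + (k - 1) * rho) / k.
    Values below 1 mean fewer individuals are needed for the
    same power than in a single-measurement design.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1 + (k - 1) * rho) / k


# Three observations per individual, as in the cohort described above;
# the correlation values themselves are hypothetical.
for rho in (0.0, 0.3, 0.6, 0.9):
    factor = mean_variance_factor(3, rho)
    print(f"rho = {rho:.1f}: relative sample size needed = {factor:.2f}")
```

    With uncorrelated measurements, three observations cut the required number of individuals to a third; as rho approaches 1 the factor approaches 1 and the extra observations add almost nothing.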

  8. Transcriptome and genome size analysis of the venus flytrap

    DEFF Research Database (Denmark)

    Jensen, Michael Krogh; Vogt, Josef Korbinian; Bressendorff, Simon

    2015-01-01

    The insectivorous Venus flytrap (Dionaea muscipula) is renowned from Darwin's studies of plant carnivory and the origins of species. To provide tools to analyze the evolution and functional genomics of D. muscipula, we sequenced a normalized cDNA library synthesized from mRNA isolated from D...

  9. Genomic analysis of the rainbow trout response to crowding

    Science.gov (United States)

    Genomic analyses have the potential to impact selective breeding programs by identifying markers as proxies for traits which are expensive or difficult to measure. One such set of traits is the physiological responses of rainbow trout to the stresses of the aquaculture environment. Typical stresso...

  10. Genomic Analysis of Secondary Metabolite Production by Pseudomonas fluorescens

    Science.gov (United States)

    Pseudomonas fluorescens is a diverse bacterial species known for its ubiquity in natural habitats and its production of secondary metabolites. The high degree of ecological and metabolic diversity represented in P. fluorescens is reflected in the genomic diversity displayed among strains. Certain st...

  11. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  12. Comprehensive DNA methylation analysis of the Aedes aegypti genome

    Science.gov (United States)

    Falckenhayn, Cassandra; Carneiro, Vitor Coutinho; de Mendonça Amarante, Anderson; Schmid, Katharina; Hanna, Katharina; Kang, Seokyoung; Helm, Mark; Dimopoulos, George; Fantappié, Marcelo Rosado; Lyko, Frank

    2016-01-01

    Aedes aegypti mosquitoes are important vectors of viral diseases. Mosquito host factors play key roles in virus control and it has been suggested that dengue virus replication is regulated by Dnmt2-mediated DNA methylation. However, recent studies have shown that Dnmt2 is a tRNA methyltransferase and that Dnmt2-dependent methylomes lack defined DNA methylation patterns, thus necessitating a systematic re-evaluation of the mosquito genome methylation status. We have now searched the Ae. aegypti genome for candidate DNA modification enzymes. This failed to reveal any known (cytosine-5) DNA methyltransferases, but identified homologues for the Dnmt2 tRNA methyltransferase, the Mettl4 (adenine-6) DNA methyltransferase, and the Tet DNA demethylase. All genes were expressed at variable levels throughout mosquito development. Mass spectrometry demonstrated that DNA methylation levels were several orders of magnitude below the levels that are usually detected in organisms with DNA methylation-dependent epigenetic regulation. Furthermore, whole-genome bisulfite sequencing failed to reveal any evidence of defined DNA methylation patterns. These results suggest that the Ae. aegypti genome is unmethylated. Interestingly, additional RNA bisulfite sequencing provided first evidence for Dnmt2-mediated tRNA methylation in mosquitoes. These findings have important implications for understanding the mechanism of Dnmt2-dependent virus regulation. PMID:27805064

  13. Analysis of the hybrid genomes of brewing yeasts

    NARCIS (Netherlands)

    Bolat, I.

    2016-01-01

    One of the best guarded secrets of brewers is represented by the brewing yeast employed in beer fermentation, due to its profound impact upon the specific flavour profile of the final product. The current research tackles the genome diversity of lager brewing strains as well as their impact on impor

  14. Gene hunting : molecular analysis of the chicken genome

    NARCIS (Netherlands)

    Crooijmans, R.P.M.A.

    2000-01-01

    This dissertation describes the development of molecular tools to identify genes that are involved in production and health traits in poultry. To unravel the chicken genome, fluorescent molecular markers (microsatellite markers) were developed and optimized to perform high throughput screening of re

  15. Molecular cytogenetic applications in analysis of the cancer genome.

    Science.gov (United States)

    Rao, Pulivarthi H; Nandula, Subhadra V; Murty, Vundavalli V

    2007-01-01

    Cancer cells exhibit nonrandom and complex chromosome abnormalities. The role of genomic changes in cancer is well established. However, the identification of complex and cryptic chromosomal changes is beyond the resolution of conventional banding methods. The fluorescence microscopy afforded by recently developed imaging technologies facilitates a precise identification of these chromosome alterations in cancer. The three most commonly utilized molecular cytogenetics methods, comparative genomic hybridization, spectral karyotyping, and fluorescence in situ hybridization, which have already become benchmark tools in cancer cytogenetics, are described in this chapter. Comparative genomic hybridization is a powerful tool for screening copy-number changes in tumor genomes without the need for preparation of metaphases from tumor cells. Multicolor spectral karyotyping permits visualization of all chromosomes in one experiment, permitting identification of precise chromosomal changes on metaphases derived from tumor cells. The uses of fluorescence in situ hybridization are diverse, including mapping of alterations in single copy genes, chromosomal regions, or entire chromosomes. The opportunities to detect genetic alterations in cancer cells continue to evolve with the use of these methodologies both in diagnosis and research.

  16. Is the societal approach wide enough to include relatives? Incorporating relatives' costs and effects in a cost-effectiveness analysis.

    Science.gov (United States)

    Davidson, Thomas; Levin, Lars-Ake

    2010-01-01

    It is important for economic evaluations in healthcare to cover all relevant information. However, many existing evaluations fall short of this goal, as they fail to include all the costs and effects for the relatives of a disabled or sick individual. The objective of this study was to analyse how relatives' costs and effects could be measured, valued and incorporated into a cost-effectiveness analysis. In this article, we discuss the theories underlying cost-effectiveness analyses in the healthcare arena; the general conclusion is that it is hard to find theoretical arguments for excluding relatives' costs and effects if a societal perspective is used. We argue that the cost of informal care should be calculated according to the opportunity cost method. To capture relatives' effects, we construct a new term, the R-QALY weight, which is defined as the effect on relatives' QALY weight of being related to a disabled or sick individual. We examine methods for measuring, valuing and incorporating the R-QALY weights. One suggested method is to estimate R-QALYs and incorporate them together with the patient's QALY in the analysis. However, there is no well established method as yet that can create R-QALY weights. One difficulty with measuring R-QALY weights using existing instruments is that these instruments are rarely focused on relative-related aspects. Even if generic quality-of-life instruments do cover some aspects relevant to relatives and caregivers, they may miss important aspects and potential altruistic preferences. A further development and validation of the existing caregiving instruments used for eliciting utility weights would therefore be beneficial for this area, as would further studies on the use of time trade-off or Standard Gamble methods for valuing R-QALY weights. Another potential method is to use the contingent valuation method to find a monetary value for all the relatives' costs and effects. Because cost-effectiveness analyses are used for

  17. Computational workflow for analysis of gain and loss of genes in distantly related genomes

    Directory of Open Access Journals (Sweden)

    Ptitsyn Andrey

    2012-09-01

    Full Text Available. Background: Early evolution of animals led to profound changes in body plan organization, symmetry and the rise of tissue complexity, including formation of muscular and nervous systems. This process was associated with massive restructuring of animal genomes as well as deletion, acquisition and rapid differentiation of genes from a common metazoan ancestor. Here, we present a simple but efficient workflow for elucidation of gene gain and gene loss within major branches of the animal kingdom. Methods: We have designed a pipeline of sequence comparison, clustering and functional annotation using 12 major phyla as illustrative examples. Specifically, for the input we used sets of ab initio predicted gene models from the genomes of six bilaterians, three basal metazoans (Cnidaria, Placozoa, Porifera), two unicellular eukaryotes (Monosiga and Capsaspora) and the green plant Arabidopsis as an out-group. Due to the large amounts of data, the software required a high-performance Linux cluster. The final results can be imported into standard spreadsheet analysis software and queried for the numbers and specific sets of genes absent in specific genomes, uniquely present or shared among different taxa. Results and conclusions: The developed software is open source and available free of charge. It allows the user to address a number of specific questions regarding gene gain and gene loss in particular genomes, and user-defined groups of genomes can be formulated in a type of logical expression. For example, our analysis of 12 sequenced genomes indicated that these genomes possess at least 90,000 unique genes and gene families, suggesting enormous diversity of the genome repertoire in the animal kingdom. Approximately 9% of these gene families are shared universally (homologous among all genomes), 53% are unique to specific taxa, and the rest are shared between two or more distantly related genomes.
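
    The kind of query described in this abstract (gene families absent from some genomes, unique to one, or shared among a user-defined group) can be illustrated with plain set operations. The sketch below is not the authors' software: the function names, family identifiers and genome labels are hypothetical, and the presence/absence table is a toy example rather than data from the study.

```python
def classify_families(families, all_genomes):
    """Split gene families into universal, genome-specific and partially shared.

    `families` maps a family id to the set of genomes in which it was found.
    """
    all_genomes = set(all_genomes)
    result = {"universal": [], "unique": [], "shared": []}
    for family, present_in in sorted(families.items()):
        if present_in == all_genomes:
            result["universal"].append(family)   # found in every genome
        elif len(present_in) == 1:
            result["unique"].append(family)      # found in exactly one genome
        else:
            result["shared"].append(family)      # found in some, not all
    return result


def select_families(families, present=(), absent=()):
    """Families found in every genome of `present` and in none of `absent`."""
    present, absent = set(present), set(absent)
    return sorted(
        family
        for family, genomes in families.items()
        if present <= genomes and not (absent & genomes)
    )


# Toy presence/absence table (hypothetical ids and labels).
GENOMES = ["bilaterian", "cnidarian", "sponge", "plant"]
TOY_FAMILIES = {
    "fam1": {"bilaterian", "cnidarian", "sponge", "plant"},
    "fam2": {"bilaterian"},
    "fam3": {"bilaterian", "cnidarian"},
    "fam4": {"sponge", "plant"},
}

summary = classify_families(TOY_FAMILIES, GENOMES)
animal_only = select_families(
    TOY_FAMILIES, present=["bilaterian", "cnidarian"], absent=["plant"]
)
```

    Here `select_families` plays the role of the "logical expression" over genome groups: a family qualifies when it is present in all required genomes and in none of the excluded ones.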

  18. Incorporating Human Movement Behavior into the Analysis of Spatially Distributed Infrastructure.

    Science.gov (United States)

    Wu, Lihua; Leung, Henry; Jiang, Hao; Zheng, Hong; Ma, Li

    2016-01-01

    For the first time in human history, the majority of the world's population resides in urban areas. Therefore, city managers are faced with new challenges related to the efficiency, equity and quality of the supply of resources, such as water, food and energy. Infrastructure in a city can be viewed as service points providing resources. These service points function together as a spatially collaborative system to serve an increasing population. To study the spatial collaboration among service points, we propose a shared network based on humans' collective movement and resource usage, built from data usage detail records (UDRs) from the cellular network in a city in western China. This network is shown not to be scale-free, but it exhibits an interesting triangular property governed by two types of nodes with very different link patterns. Surprisingly, this feature is consistent with the urban-rural dualistic context of the city. Another feature of the shared network is that it consists of several spatially separated communities that characterize local people's active zones but do not completely overlap with administrative areas. According to these features, we propose the incorporation of human movement into infrastructure classification. The presence of well-defined spatially separated clusters confirms the effectiveness of this approach. Our findings reveal the spatial structure inside a city, and the proposed approach provides a new perspective on integrating human movement into the study of a spatially distributed system.

  19. Genome sequence and comparative analysis of Avibacterium paragallinarum

    Science.gov (United States)

    Requena, David; Chumbe, Ana; Torres, Michael; Alzamora, Ofelia; Ramirez, Manuel; Valdivia-Olarte, Hugo; Gutierrez, Andres Hazaet; Izquierdo-Lara, Ray; Saravia, Luis Enrique; Zavaleta, Milagros; Tataje-Lavanda, Luis; Best, Ivan; Fernández-Sánchez, Manolo; Icochea, Eliana; Zimic, Mirko; Fernández-Díaz, Manolo

    2013-01-01

    Background: Avibacterium paragallinarum is the causative agent of infectious coryza, a highly contagious acute respiratory disease of poultry that affects commercial chickens, laying hens and broilers worldwide. Methodology: In this study, we performed whole genome sequencing, assembly and annotation of a Peruvian isolate of A. paragallinarum. The genome was sequenced on a 454 GS FLX Titanium system. De novo assembly was performed and annotation was completed with GS De Novo Assembler 2.6 using the H. influenzae str. F3031 gene model. Manual curation of the genome was performed with Artemis. Putative function of genes was predicted with Blast2GO. Virulence factors were identified by comparison with the Virulence Factor Database. Results: The genome obtained has a length of 2.47 Mb with 40.66% GC content. Seventy-five large contigs (>500 nt) were obtained, which comprised 1,204 predicted genes. All the contigs are available in GenBank [GenBank: PRJNA64665]. A total of 103 virulence factors reported in the Virulence Factor Database were found in A. paragallinarum. Forty-four of them are present in 7 species of Haemophilus and are related to pathogenesis, virulence and host immune system evasion. A tetracycline-resistance associated transposon (Tn10) was found in A. paragallinarum, possibly acting as a defense mechanism. Discussion and conclusion: The availability of the A. paragallinarum genome represents an important source of information for the development of diagnostic tests, genotyping, and novel antigens for potential vaccines against infectious coryza. Identification of virulence factors contributes to a better understanding of the pathogenesis, and to planning efforts for prevention and control of the disease. PMID:23861570

  20. Comparative genomic analysis of bacteriophages specific to the channel catfish pathogen Edwardsiella ictaluri

    Directory of Open Access Journals (Sweden)

    Mead David A

    2011-01-01

    Full Text Available. Background: The bacterial pathogen Edwardsiella ictaluri is a primary cause of mortality in channel catfish raised commercially in aquaculture farms. Additional treatment and diagnostic regimes are needed for this enteric pathogen, motivating the discovery and characterization of bacteriophages specific to E. ictaluri. Results: The genomes of three Edwardsiella ictaluri-specific bacteriophages isolated from geographically distant aquaculture ponds, at different times, were sequenced and analyzed. The genomes for phages eiAU, eiDWF, and eiMSLS are 42.80 kbp, 42.12 kbp, and 42.69 kbp, respectively, and are greater than 95% identical to each other at the nucleotide level. Nucleotide differences were mostly observed in non-coding regions and in structural proteins, with significant variability in the sequences of putative tail fiber proteins. The genome organization of these phages exhibits a pattern shared by other Siphoviridae. Conclusions: These E. ictaluri-specific phage genomes reveal considerable conservation of genomic architecture and sequence identity, even with considerable temporal and spatial divergence in their isolation. Their genomic homogeneity is similarly observed among E. ictaluri bacterial isolates. The genomic analysis of these phages supports the conclusion that these are virulent phages, lacking the capacity for lysogeny or expression of virulence genes. This study contributes to our knowledge of phage genomic diversity and facilitates studies on the diagnostic and therapeutic applications of these phages.

  1. Quantitative analysis of polycomb response elements (PREs) at identical genomic locations distinguishes contributions of PRE sequence and genomic environment

    Directory of Open Access Journals (Sweden)

    Okulski Helena

    2011-03-01

    Full Text Available. Background: Polycomb/Trithorax response elements (PREs) are cis-regulatory elements essential for the regulation of several hundred developmentally important genes. However, the precise sequence requirements for PRE function are not fully understood, and it is also unclear whether these elements all function in a similar manner. Drosophila PRE reporter assays typically rely on random integration by P-element insertion, but PREs are extremely sensitive to genomic position. Results: We adapted the ΦC31 site-specific integration tool to enable systematic quantitative comparison of PREs and sequence variants at identical genomic locations. In this adaptation, a miniwhite (mw) reporter in combination with eye-pigment analysis gives a quantitative readout of PRE function. We compared the Hox PRE Frontabdominal-7 (Fab-7) with a PRE from the vestigial (vg) gene at four landing sites. The analysis revealed that the Fab-7 and vg PREs have fundamentally different properties, both in terms of their interaction with the genomic environment at each site and their inherent silencing abilities. Furthermore, we used the ΦC31 tool to examine the effect of deletions and mutations in the vg PRE, identifying a 106 bp region containing a previously predicted motif (GTGT) that is essential for silencing. Conclusions: This analysis showed that different PREs have quantifiably different properties, and that changes in as few as four base pairs have profound effects on PRE function, thus illustrating the power and sensitivity of ΦC31 site-specific integration as a tool for the rapid and quantitative dissection of elements of PRE design.

  2. A Brief Review: The Z-curve Theory and its Application in Genome Analysis.

    Science.gov (United States)

    Zhang, Ren; Zhang, Chun-Ting

    2014-04-01

    In theoretical physics, there exist two basic mathematical approaches, algebraic and geometrical methods, which, in most cases, are complementary. In the area of genome sequence analysis, however, algebraic approaches have been widely used, while geometrical approaches have been less explored for a long time. The Z-curve theory is a geometrical approach to genome analysis. The Z-curve is a three-dimensional curve that represents a given DNA sequence in the sense that each can be uniquely reconstructed given the other. The Z-curve, therefore, contains all the information that the corresponding DNA sequence carries. The analysis of a DNA sequence can then be performed through studying the corresponding Z-curve. The Z-curve method has found applications in a wide range of areas in the past two decades, including the identification of protein-coding genes, replication origins, horizontally transferred genomic islands, promoters, translational start sites and isochores, as well as studies on phylogenetics, genome visualization and comparative genomics. Here, we review the progress of Z-curve studies from aspects of both theory and applications in genome analysis.
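
    The one-to-one correspondence between a DNA sequence and its Z-curve can be illustrated with a short sketch. It assumes the standard cumulative definition of the three coordinates (x: purine vs pyrimidine, y: amino vs keto, z: weak vs strong hydrogen bonding); the function names are illustrative and not taken from the review.

```python
def z_curve(seq):
    """Return the cumulative Z-curve coordinates (x_n, y_n, z_n) for n = 1..N."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in "ACGT":
            raise ValueError(f"unsupported base: {base!r}")
        x += 1 if base in "AG" else -1   # purine (A, G) vs pyrimidine (C, T)
        y += 1 if base in "AC" else -1   # amino (A, C) vs keto (G, T)
        z += 1 if base in "AT" else -1   # weak H-bond (A, T) vs strong (G, C)
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Invert z_curve: each unit step between points identifies one base."""
    step_to_base = {
        (1, 1, 1): "A",
        (-1, 1, -1): "C",
        (1, -1, -1): "G",
        (-1, -1, 1): "T",
    }
    bases = []
    previous = (0, 0, 0)
    for point in points:
        step = tuple(c - p for c, p in zip(point, previous))
        bases.append(step_to_base[step])
        previous = point
    return "".join(bases)
```

    Because every base maps to a distinct step vector, `sequence_from_z_curve(z_curve(s))` returns `s` for any sequence over A, C, G and T, which is the reconstruction property the abstract refers to.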

  3. Whole-genome single-nucleotide-polymorphism analysis for discrimination of Clostridium botulinum group I strains.

    Science.gov (United States)

    Gonzalez-Escalona, Narjol; Timme, Ruth; Raphael, Brian H; Zink, Donald; Sharma, Shashi K

    2014-04-01

    Clostridium botulinum is a genetically diverse Gram-positive bacterium producing extremely potent neurotoxins (botulinum neurotoxins A through G [BoNT/A-G]). The complete genome sequences of three strains harboring only the BoNT/A1 nucleotide sequence are publicly available. Although these strains contain a toxin cluster (HA(+) OrfX(-)) associated with hemagglutinin genes, little is known about the genomes of subtype A1 strains (termed HA(-) OrfX(+)) that lack hemagglutinin genes in the toxin gene cluster. We sequenced the genomes of three BoNT/A1-producing C. botulinum strains: two strains with the HA(+) OrfX(-) cluster (69A and 32A) and one strain with the HA(-) OrfX(+) cluster (CDC297). Whole-genome phylogenic single-nucleotide-polymorphism (SNP) analysis of these strains along with other publicly available C. botulinum group I strains revealed five distinct lineages. Strains 69A and 32A clustered with the C. botulinum type A1 Hall group, and strain CDC297 clustered with the C. botulinum type Ba4 strain 657. This study reports the use of whole-genome SNP sequence analysis for discrimination of C. botulinum group I strains and demonstrates the utility of this analysis in quickly differentiating C. botulinum strains harboring identical toxin gene subtypes. This analysis further supports previous work showing that strains CDC297 and 657 likely evolved from a common ancestor and independently acquired separate BoNT/A1 toxin gene clusters at distinct genomic locations.
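
    As a rough illustration of how SNP differences can separate closely related strains, the sketch below counts differing positions between pre-aligned sequences. It is a toy under stated assumptions, not the pipeline used in the study: the strain labels and sequences are hypothetical, the input is assumed to be already aligned, and positions with gaps or ambiguous bases are simply ignored.

```python
from itertools import combinations


def snp_positions(ref, other):
    """0-based alignment columns where both sequences have A/C/G/T and differ."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return [
        i
        for i, (a, b) in enumerate(zip(ref.upper(), other.upper()))
        if a in valid and b in valid and a != b
    ]


def pairwise_snp_distances(alignment):
    """Number of SNP differences for every pair of sequences in the alignment."""
    return {
        (s1, s2): len(snp_positions(alignment[s1], alignment[s2]))
        for s1, s2 in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments, for illustration only.
TOY_ALIGNMENT = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGA",
    "strain_C": "TCGTACGA",
}

distances = pairwise_snp_distances(TOY_ALIGNMENT)
```

    A distance table of this kind is the usual input to clustering or tree-building steps; which method groups the strains into lineages is left open here.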

  4. Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

    Science.gov (United States)

    Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

    2011-09-01

    The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.

  5. A Critical Review of Concepts and Methods Used in Classical Genome Analysis

    DEFF Research Database (Denmark)

    Seberg, Ole; Petersen, Gitte

    1998-01-01

    A short account of the development of classical genome analysis, the analysis of chromosome behaviour in metaphase I of meiosis, primarily in interspecific hybrids, is given. The application of the concept of homology to describe chromosome pairing between the respective chromosomes of a pair...... the fundamental premises, genome analysis is burdened by observational difficulties. Hence, chromosome pairing has been shown to be under genetic control and is also influenced by environmental conditions. Additionally, basic biological observations such as the distribution of meiotic configurations...... or the identity of the individual chromosomes are frequently neglected. Data from chromosome pairing are captured as pair-wise comparisons and are amenable only to phenetic analysis, and hence are not suited for phylogenetic inferences. As currently perceived, genome analysis may have a role to play in plant...

  6. [RAPD analysis of the intraspecific and interspecific variation and phylogenetic relationships of Aegilops L. species with the U genome].

    Science.gov (United States)

    Goriunova, S V; Chikida, N N; Kochieva, E Z

    2010-07-01

    RAPD analysis was used to study the genetic variation and phylogenetic relationships of polyploid Aegilops species with the U genome. In total, 115 DNA samples of eight polyploid species containing the U genome and the diploid species Ae. umbellulata (U) were examined. Substantial interspecific polymorphism was observed for the majority of the polyploid species with the U genome (interspecific differences, 0.01-0.2; proportion of polymorphic loci, 56.6-88.2%). Aegilops triuncialis was identified as the only alloploid species with low interspecific polymorphism (interspecific differences, 0-0.01, P = 50%) in the U-genome group. The U-genome Aegilops species proved to be separated from other species of the genus. The phylogenetic relationships were established for the U-genome species. The greatest separation within the U-genome group was observed for the US-genome species Ae. kotschyi and Ae. variabilis. The tetraploid species Ae. triaristata and Ae. columnaris, which had the UX genome, and the hexaploid species Ae. recta (UXN) were found to be related to each other and separate from the UM-genome species. A similarity was observed between the UM-genome species Ae. ovata and Ae. biuncialis and the ancestral diploid U-genome species Ae. umbellulata. The UC-genome species Ae. triuncialis was rather separate and slightly similar to the UX-genome species.

  7. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India

    Directory of Open Access Journals (Sweden)

    Sarwar Azam

    2016-01-01

    Full Text Available Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 Indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.

  8. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India

    Science.gov (United States)

    Azam, Sarwar; Rao, Sashi Bhushan; Jakka, Padmaja; NarasimhaRao, Veera; Bhargavi, Bindu; Gupta, Vivek Kumar

    2016-01-01

    Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 Indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis. PMID:27525259

  9. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India.

    Science.gov (United States)

    Azam, Sarwar; Rao, Sashi Bhushan; Jakka, Padmaja; NarasimhaRao, Veera; Bhargavi, Bindu; Gupta, Vivek Kumar; Radhakrishnan, Girish

    2016-01-01

    Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 Indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.

  10. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  11. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities.

  12. Comparative genomic analysis of two-component regulatory proteins in Pseudomonas syringae

    DEFF Research Database (Denmark)

    Lavin, J.L.; Kiil, Kristoffer; Resano, O.

    2007-01-01

    important differences in TCS proteins among the three P. syringae pathovars. Conclusion: In this article we present a thorough analysis of the identification and distribution of TCS proteins among the sequenced genomes of P. syringae. We have identified differences in TCS proteins among the three P...... requires a complex array of TCS proteins to cope with diverse plant hosts, host responses, and environmental conditions. Results: Based on the genomic data, pattern searches with Hidden Markov Model (HMM) profiles have been used to identify putative HKs and RRs. The genomes of Psy B728a, Pto DC3000 and Pph...... 1448A were found to contain a large number of genes encoding TCS proteins, and a core of complete TCS proteins were shared between these genomes: 30 putative TCS clusters, 11 orphan HKs, 33 orphan RRs, and 16 hybrid HKs. A close analysis of the distribution of genes encoding TCS proteins revealed...

  13. A Fully Automated and Robust Method to Incorporate Stamping Data in Crash, NVH and Durability Analysis

    Science.gov (United States)

    Palaniswamy, Hariharasudhan; Kanthadai, Narayan; Roy, Subir; Beauchesne, Erwan

    2011-08-01

    Crash, NVH (Noise, Vibration, Harshness), and durability analysis are commonly deployed in structural CAE analysis for mechanical design of components especially in the automotive industry. Components manufactured by stamping constitute a major portion of the automotive structure. In CAE analysis they are modeled at a nominal state with uniform thickness and no residual stresses and strains. However, in reality the stamped components have non-uniformly distributed thickness and residual stresses and strains resulting from stamping. It is essential to consider the stamping information in CAE analysis to accurately model the behavior of the sheet metal structures under different loading conditions. Especially with the current emphasis on weight reduction by replacing conventional steels with aluminum and advanced high strength steels it is imperative to avoid over design. Considering this growing need in industry, a highly automated and robust method has been integrated within Altair Hyperworks® to initialize sheet metal components in CAE models with stamping data. This paper demonstrates this new feature and the influence of stamping data for a full car frontal crash analysis.

  14. Analysis of Aspergillus nidulans metabolism at the genome-scale

    DEFF Research Database (Denmark)

    David, Helga; Ozcelik, İlknur Ş; Hofmann, Gerald

    2008-01-01

    a function. Results: In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways...... of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated......, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates...

  15. Genome analysis of the Anaerobic Thermohalophilic bacterium Halothermothrix orenii

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Ivanova, Natalia; Anderson, Iain; Lykidis, Athanasios; Hooper, Sean D.; Sun, Hui; Kunin, Victor; Lapidus, Alla; Hugenholtz, Philip; Patel, Bharat; Kyrpides, Nikos C.

    2008-11-03

    Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2,451 predicted genes. This is the first genome sequence of an organism belonging to the Halanaerobiales. Features of both Gram-positive and Gram-negative bacteria were identified, the most prominent being the presence of both a sporulating mechanism typical of Firmicutes and a characteristic Gram-negative lipopolysaccharide. Protein sequence analyses and metabolic reconstruction reveal a unique combination of strategies for thermophilic and halophilic adaptation. H. orenii can serve as a model organism for the study of the evolution of the Gram-negative phenotype as well as adaptation to thermohalophilic conditions and the development of biotechnological applications under conditions that require high temperatures and high salt concentrations.

  16. Complete genome analysis of Ketogulonigenium sp.WB0104

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Ketogulonigenium sp. can convert L-sorbose into 2-keto-L-gulonic acid, the vitamin C precursor. The genome of Ketogulonigenium sp. WB0104 consists of a circular 2,765,030 bp chromosome with 61.69% G+C content and two circular plasmids of 267,968 and 242,707 bp. The genome contains 2,727 open reading frames (ORFs). The systems of replication, transcription, translation, carbohydrate and energy metabolism are intact, but the repair system is incomplete. About 640 predicted ORFs have been found to encode transporter proteins, which account for about one fourth of the total predicted ORFs, a noticeably higher proportion than in other documented bacteria. This may be because WB0104 is adapted to the soil environment.

  17. Steady-State Kinetic Analysis of DNA Polymerase Single-Nucleotide Incorporation Products

    Science.gov (United States)

    O'Flaherty, Derek K.

    2014-01-01

    This unit describes the experimental procedures for the steady-state kinetic analysis of DNA synthesis across DNA nucleotides (native or modified) by DNA polymerases. In vitro primer extension experiments with a single nucleoside triphosphate species, followed by denaturing polyacrylamide gel electrophoresis of the extended products, are described. Data analysis procedures and fitting to steady-state kinetic models are presented to highlight the kinetic differences involved in the bypass of damaged versus undamaged DNA. Moreover, problems commonly encountered in these experiments are explained. This approach provides useful quantitative parameters for the processing of damaged DNA by DNA polymerases. PMID:25501593
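
    Fitting to a steady-state model can be sketched with the Michaelis-Menten equation, v = Vmax [dNTP] / (Km + [dNTP]). The example below is an assumption-laden illustration, not the unit's protocol: it uses a Hanes-Woolf linearization instead of whatever regression the unit prescribes, the function names are hypothetical, and the data points are synthetic values generated from the model itself.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (vmax, km) by least squares on the line s/v = s/vmax + km/vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals 1 / vmax
    intercept = mean_y - slope * mean_x   # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat / Km, with kcat = vmax / [enzyme] (all in consistent units)."""
    return (vmax / enzyme_conc) / km


# Synthetic, noise-free rates generated from the model (not measured data).
dntp = [1.0, 2.0, 5.0, 10.0, 50.0]
rates = [michaelis_menten(s, vmax=10.0, km=4.0) for s in dntp]
vmax_fit, km_fit = fit_hanes_woolf(dntp, rates)
```

    With real gel-quantified data the points carry noise, and direct nonlinear regression on the untransformed rates is generally preferred over a linearization; the linear form is used here only because it needs no external libraries. Comparing `catalytic_efficiency` for a damaged and an undamaged template is one way to express the kinetic difference the abstract mentions.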

  18. STINGRAY: system for integrated genomic resources and analysis

    OpenAIRE

    Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R.; Ocaña, Kary ACS; Ribeiro, Antonio CB; Vanessa E. Emmel; Probst, Christian M.; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria LM; Mattoso, Marta; Dávila, Alberto MR

    2014-01-01

    Background The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interfac...

  19. QTL Analysis and Functional Genomics of Animal Model

    DEFF Research Database (Denmark)

    Farajzadeh, Leila

    In recent years, the use of functional genomics and next-generation sequencing technologies has increased the probability of success in studies of complex properties. The integration of large data sets from association studies, DNA resequencing, gene expression profiles and phenotypic data......, for example, has enabled scientists to examine more complex interactions in connection with studies of properties and diseases. In her PhD project, Leila Farajzadeh integrated different organisational levels in biology, including genotype, phenotype, association studies, transcription profiles and genetic...

  20. Genomic analysis of the symbiotic marine crenarchaeon, Cenarchaeum symbiosum

    Energy Technology Data Exchange (ETDEWEB)

    Hallam, Steven J.; Konstantinidis, Konstantinos T.; Brochier, Celine; Putnam, Nik; Schleper, Christa; Watanabe, Yoh-ichi; Sugahara, Junichi; Preston, Christina; de la Torre, Jose; Richardson, Paul M.; DeLong, Edward F.

    2006-06-24

    Crenarchaea are ubiquitous and abundant microbial constituents of soils, sediments, lakes and ocean waters, yet relatively little is known about their fundamental evolutionary, ecological, and physiological properties. To better describe the ubiquitous nonthermophilic Crenarchaea, we analyzed the genome sequence of one representative, the uncultivated sponge symbiont, Cenarchaeum symbiosum. C. symbiosum genotypes coinhabiting the same host partitioned into two dominant populations, corresponding to previously described a- and b-type ribosomal RNA variants. Although syntenic, overlapping a- and b-type ribotypes harbored significant genetic variability. A single tiling path comprising the dominant a-type genotype was assembled, and used to explore the biological properties of C. symbiosum and its planktonic relatives. Out of a total of 2,066 predicted open reading frames, 36% were more highly conserved with other Archaea. The remainder partitioned between bacteria (18%), eukaryotes (1.5%) and viruses (0.1%). A total of 525 open reading frames were more highly conserved with sequences derived from marine environmental genomic surveys, most probably representing orthologous genes found in free-living planktonic Crenarchaea. The remaining genes partitioned between functional RNAs (2.4%), and hypotheticals (42%) with limited homology to known functional genes. The latter category likely contains genes specifically involved in mediating archaeal-sponge symbiosis. Phylogenetic analyses placed C. symbiosum as a basal crenarchaeon, sharing specific genomic features in common with either Crenarchaea, Euryarchaea, or both. The genome sequence of C. symbiosum reflects a unique and unusual evolutionary, physiological, and ecological history, one remarkably distinct from that of any other previously known microbial lineage.

  1. Comparative Genomic Analysis of Human Fungal Pathogens Causing Paracoccidioidomycosis

    OpenAIRE

    Desjardins, Christopher A; Champion, Mia D.; Holder, Jason W.; Muszewska, Anna; Goldberg, Jonathan; Bailao, Alexandre M.; Brigido, Marcelo de Macedo; Silva Ferreira, Marcia Eliana da; Garcia, Ana Maria; Grynberg, Marcin; Gujja, Sharvari; Heiman, David I.; Henn, Matthew R.; Kodira, Chinnappa D.; Leon-Narvaez, Henry

    2011-01-01

    Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasi...

  2. Structure-infectivity analysis of the human rhinovirus genomic RNA 3' non-coding region.

    OpenAIRE

    1996-01-01

    The specific recognition of genomic positive strand RNAs as templates for the synthesis of intermediate negative strands by the picornavirus replication machinery is presumably mediated by cis-acting sequences within the genomic RNA 3' non-coding region (NCR). A structure-infectivity analysis was conducted on the 44 nt human rhinovirus 14 (HRV14) 3' NCR to identify the primary sequence and/or secondary structure determinants required for viral replication. Using biochemical RNA secondary stru...

  3. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci

    OpenAIRE

    Stahl, Eli A; Raychaudhuri, Soumya; Remmers, Elaine F.; Xie, Gang; Eyre, Stephen; Thomson, Brian P.; Li, Yonghong; Kurreeman, Fina A. S.; Zhernakova, Alexandra; Hinks, Anne; Guiducci, Candace; Chen, Robert; Alfredsson, Lars; Amos, Christopher I.; Ardlie, Kristin G.

    2010-01-01

    To identify novel genetic risk factors for rheumatoid arthritis (RA), we conducted a genome-wide association study (GWAS) meta-analysis of 5,539 autoantibody positive RA cases and 20,169 controls of European descent, followed by replication in an independent set of 6,768 RA cases and 8,806 controls. Of 34 SNPs selected for replication, 7 novel RA risk alleles were identified at genome-wide significance (P

  4. SpeedSeq: Ultra-fast personal genome analysis and interpretation

    Science.gov (United States)

    Chiang, Colby; Layer, Ryan M.; Faust, Gregory G.; Lindberg, Michael R.; Rose, David B.; Garrison, Erik P.; Marth, Gabor T.; Quinlan, Aaron R.; Hall, Ira M.

    2015-01-01

    SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 hours on a low-cost server, alleviating a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers competitive or superior performance to current methods for detecting germline and somatic single nucleotide variants, indels, and structural variants, and includes novel functionality for streamlined interpretation. PMID:26258291

  5. A genome-wide 20 K citrus microarray for gene expression analysis

    OpenAIRE

    Martínez-Godoy, M. Ángeles; Mauri, Nuria; Juárez, José; Marqués, M. Carmen; Santiago, Julia; Forment, Javier; Gadea Vacas, José

    2008-01-01

    Background: Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results: We have designed and constructed a publicly available ...

  6. Chromosome region-specific libraries for human genome analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kao, Fa-Ten.

    1991-01-01

    We have made important progress since the beginning of the current grant year. We have further developed the microdissection and PCR-assisted microcloning techniques using the linker-adaptor method. We have critically evaluated the microdissection libraries constructed by this microtechnology and proved that they are of high quality. We further demonstrated that these microdissection clones are useful in identifying corresponding YAC clones for a thousand-fold expansion of the genomic coverage and for contig construction. We are also improving the technique of cloning the dissected fragments in a test tube by the TDT method. We are applying both of these PCR cloning techniques to human chromosomes 2 and 5 to construct region-specific libraries for physical mapping purposes of LLNL and LANL. Finally, we are exploring efficient procedures to use unique sequence microclones to isolate cDNA clones from defined chromosomal regions as valuable resources for identifying expressed gene sequences in the human genome. We believe that we are making important progress under the auspices of this DOE human genome program grant and we will continue to make significant contributions in the coming year. 4 refs., 4 figs.

  7. MIPS: analysis and annotation of genome information in 2007.

    Science.gov (United States)

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  8. Analysis of complete nucleotide sequences of 12 Gossypium chloroplast genomes: origin and evolution of allotetraploids.

    Directory of Open Access Journals (Sweden)

    Qin Xu

    Full Text Available BACKGROUND: Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. METHODOLOGY/PRINCIPAL FINDINGS: The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43-0.68 million years ago (MYA). CONCLUSION: Despite a high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1-3-bp indels are mainly attributed to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome

  9. Genomic and population genetic analysis of deep-sea vent chemoautotrophs

    Science.gov (United States)

    Nakagawa, S.; Shimamura, S.; Takaki, Y.; Mino, S.; Makita, H.; Sawabe, T.; Takai, K.

    2012-12-01

    Deep-sea vents are light-independent, highly productive ecosystems driven primarily by chemoautotrophs. Most of the invertebrates thrive there through their relationship with symbiotic chemoautotrophs. Chemoautotrophs are microorganisms that are able to fix inorganic carbon using chemical energy obtained through the oxidation of reduced compounds. Since the discovery of deep-sea vent ecosystems in 1977, it has become increasingly clear that deep-sea vent chemoautotrophs display remarkable physiological and phylogenetic diversity. Recent microbiological studies have led to an emerging view that the majority of deep-sea vent chemoautotrophs have the ability to derive energy from multiple redox couples other than the conventional sulfur-oxygen couple. Genomic, metagenomic and postgenomic studies have considerably accelerated the comprehensive understanding of molecular mechanisms of deep-sea vent chemoautotrophy, even in unculturable endosymbionts of vent fauna. For example, genomic analysis suggested that there were previously unrecognized evolutionary links between deep-sea vent chemoautotrophs and important human/animal pathogens. However, relatively little is known about the genome of horizontally transmitted endosymbionts. In this study, we sequenced whole genomes of the probably horizontally transmitted endosymbionts of two different gastropod species from a deep-sea hydrothermal field, in an effort to address questions about 1) the genome evolution of horizontally transmitted, facultative endosymbionts, 2) their genomic variability, and 3) genetic differences among symbionts of various deep-sea vent invertebrates. Both endosymbiont genomes display features consistent with ongoing genome reduction such as large proportions of pseudogenes and transposable elements. The genomes encode multiple functions for chemoautotrophic respirations, probably reflecting their adaptation to their niches with continuous changes in environmental conditions. 
When

  10. Analysis of Product Sampling for New Product Diffusion Incorporating Multiple-Unit Ownership

    Directory of Open Access Journals (Sweden)

    Zhineng Hu

    2014-01-01

    Full Text Available Multiple-unit ownership of nondurable products is an important component of sales in many product categories. Based on the Bass model, this paper develops a new model that treats multiple-unit adoptions as a diffusion process under the influence of product sampling. The analysis aims to determine the optimal dynamic sampling effort for a firm. The results demonstrate that experience sampling can accelerate the diffusion process and that the best time to send free samples is just before the product is launched. Multiple-unit purchasing behavior can increase sales and hence profit for a firm, and it requires more samples to make the product better known. The local sensitivity analysis shows that an increase in either the external or the internal coefficients has a negative influence on the sampling level, but the internal influence on the subsequent multiple-unit adoptions has little influence on the sampling. Using logistic regression along with linear regression, the global sensitivity analysis examines the interaction of all factors; it shows that the external influence and the multiunit purchase rate are the two most important factors influencing the sampling level and the net present value of the new product, and it presents a two-stage method to determine the sampling level.
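
    For orientation, the basic Bass diffusion model that this record builds on can be simulated in a few lines. The sketch below is the plain single-adoption Bass model in discrete time, not the paper's extended model with sampling and multiple-unit ownership; the function name and the parameter values are hypothetical.

```python
def bass_adoptions(p, q, m, periods):
    """Simulate new adopters per period under the basic Bass model.

    p: coefficient of external influence (innovation)
    q: coefficient of internal influence (imitation)
    m: market potential (total eventual adopters)
    Each period: new = (p + q * N / m) * (m - N), where N is cumulative adopters.
    """
    cumulative = 0.0
    new_per_period = []
    for _ in range(periods):
        new = (p + q * cumulative / m) * (m - cumulative)
        cumulative += new
        new_per_period.append(new)
    return new_per_period


# Hypothetical parameter values chosen only for illustration.
sales = bass_adoptions(p=0.03, q=0.38, m=1000.0, periods=20)
peak_period = sales.index(max(sales)) + 1
print(f"peak adoption in period {peak_period}")
```

    Raising p (for example through free samples acting as an external push) shifts adoptions earlier, which is the intuition behind sampling accelerating diffusion.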

  11. Protein Analysis Using Real-Time PCR Instrumentation: Incorporation in an Integrated, Inquiry-Based Project

    Science.gov (United States)

    Southard, Jonathan N.

    2014-01-01

    Instrumentation for real-time PCR is used primarily for amplification and quantitation of nucleic acids. The capability to measure fluorescence while controlling temperature in multiple samples can also be applied to the analysis of proteins. Conformational stability and changes in stability due to ligand binding are easily assessed. Protein…

  12. Incorporating Asymmetric Dependency Patterns in the Evaluation of IS/IT projects Using Real Option Analysis

    Science.gov (United States)

    Burke, John C.

    2012-01-01

    The objective of my dissertation is to create a general approach to evaluating IS/IT projects using Real Option Analysis (ROA). This is an important problem because an IT Project Portfolio (ITPP) can represent hundreds of projects, millions of dollars of investment and hundreds of thousands of employee hours. Therefore, any advance in the…

  13. Integration of the Rat Recombination and EST Maps in the Rat Genomic Sequence and Comparative Mapping Analysis With the Mouse Genome

    OpenAIRE

    Wilder, Steven P.; Bihoreau, Marie-Thérèse; Argoud, Karène; Watanabe, Takeshi K.; Lathrop, Mark; Gauguier, Dominique

    2004-01-01

    Inbred strains of the laboratory rat are widely used for identifying genetic regions involved in the control of complex quantitative phenotypes of biomedical importance. The draft genomic sequence of the rat now provides essential information for annotating rat quantitative trait locus (QTL) maps. Following the survey of unique rat microsatellite (11,585 including 1648 new markers) and EST (10,067) markers currently available, we have incorporated a selection of 7952 rat EST sequences in an i...

  14. Comparative Genomic Analysis of Clinical and Environmental Vibrio Vulnificus Isolates Revealed Biotype 3 Evolutionary Relationships

    Directory of Open Access Journals (Sweden)

    Yael Kotton

    2015-01-01

    Full Text Available In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59% and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 kbp to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C) and environmental (E); all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins) were present in all human pathogenic strains (both biotype 3 and non-biotype 3) and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS) proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and
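
    The core/accessory split described in this record can be pictured as simple set operations over per-strain gene sets. The sketch below assumes genes have already been clustered into shared identifiers; the function name, strain labels, and gene identifiers are hypothetical, and the study's actual pipeline is not described in the abstract.

```python
def partition_pan_genome(gene_sets):
    """Split a pan genome into core and accessory gene sets.

    gene_sets: dict mapping strain name -> set of gene (cluster) identifiers.
    core = genes present in every strain; accessory = everything else.
    """
    strains = list(gene_sets.values())
    pan = set().union(*strains)
    core = set.intersection(*strains)
    accessory = pan - core
    return core, accessory


# Toy input with made-up strain and gene names.
toy = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB"},
}
core, accessory = partition_pan_genome(toy)
for name, genes in toy.items():
    # Fraction of each strain's genes that belong to the core genome.
    print(name, round(len(core) / len(genes), 2))
```

    The per-strain core fraction printed at the end corresponds to the kind of figure the abstract reports (core genes as 59% to 78% of each genome).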

  15. Evaluation of a Two-Stage Approach in Trans-Ethnic Meta-Analysis in Genome-Wide Association Studies.

    Science.gov (United States)

    Hong, Jaeyoung; Lunetta, Kathryn L; Cupples, L Adrienne; Dupuis, Josée; Liu, Ching-Ti

    2016-05-01

    Meta-analysis of genome-wide association studies (GWAS) has achieved great success in detecting loci underlying human diseases. Incorporating GWAS results from diverse ethnic populations for meta-analysis, however, remains challenging because of the possible heterogeneity across studies. Conventional fixed-effects (FE) or random-effects (RE) methods may not be most suitable to aggregate multiethnic GWAS results because of violation of the homogeneous effect assumption across studies (FE) or low power to detect signals (RE). Three recently proposed methods, modified RE (RE-HE) model, binary-effects (BE) model and a Bayesian approach (Meta-analysis of Transethnic Association [MANTRA]), show increased power over FE and RE methods while incorporating heterogeneity of effects when meta-analyzing trans-ethnic GWAS results. We propose a two-stage approach to account for heterogeneity in trans-ethnic meta-analysis in which we clustered studies with cohort-specific ancestry information prior to meta-analysis. We compare this to a no-prior-clustering (crude) approach, evaluating type I error and power of these two strategies, in an extensive simulation study to investigate whether the two-stage approach offers any improvements over the crude approach. We find that the two-stage approach and the crude approach for all five methods (FE, RE, RE-HE, BE, MANTRA) provide well-controlled type I error. However, the two-stage approach shows increased power for BE and RE-HE, and similar power for MANTRA and FE compared to their corresponding crude approach, especially when there is heterogeneity across the multiethnic GWAS results. These results suggest that prior clustering in the two-stage approach can be an effective and efficient intermediate step in meta-analysis to account for the multiethnic heterogeneity.
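
    As background for the FE baseline named in this record, a conventional fixed-effects meta-analysis combines per-study effect estimates by inverse-variance weighting, and Cochran's Q summarizes between-study heterogeneity. The sketch below shows only that baseline, not the two-stage clustering approach or the RE-HE, BE, or MANTRA methods; the function name and the input numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, std_errors):
    """Inverse-variance fixed-effects meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    std_errors: per-study standard errors
    Returns (pooled_beta, pooled_se, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, cochran_q


# Made-up effect sizes for three cohorts of differing ancestry.
beta, se, q = fixed_effects_meta([0.10, 0.12, 0.35], [0.05, 0.04, 0.06])
print(f"pooled beta={beta:.3f}, se={se:.3f}, Q={q:.2f}")
```

    A large Q relative to its degrees of freedom (number of studies minus one) signals the kind of heterogeneity that motivates clustering cohorts by ancestry before pooling.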

  16. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.

    Science.gov (United States)

    Pappas, Derek J; Marin, Wesley; Hollenbach, Jill A; Mack, Steven J

    2016-03-01

    Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly-polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data require specialized methods of data analysis. BIGDAWG performs case-control data analyses of highly polymorphic genotype data characteristic of the HLA and KIR loci. BIGDAWG performs tests for Hardy-Weinberg equilibrium, calculates allele frequencies and bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG obviates the error-prone reformatting needed to traffic data between multiple programs, and streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org.
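
    The per-allele statistics listed in this record (2×2 chi-squared test, odds ratio, confidence interval, p-value) can be illustrated with a short calculation. The sketch below is plain Python, not the bigdawg R package or its API; the function name and the counts are hypothetical, and it assumes a Wald 95% interval and no continuity correction, which may differ from what BIGDAWG does.

```python
import math


def two_by_two_test(case_with, case_without, ctrl_with, ctrl_without):
    """Chi-squared test and odds ratio for one allele in a case-control table.

    Arguments are allele counts: cases carrying / not carrying the allele,
    controls carrying / not carrying the allele. All counts must be > 0.
    Returns (chi2, p_value, odds_ratio, ci_low, ci_high).
    """
    a, b, c, d = case_with, case_without, ctrl_with, ctrl_without
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, ci_low, ci_high


# Made-up allele counts for illustration only.
chi2, p, oratio, lo, hi = two_by_two_test(20, 80, 10, 90)
print(f"chi2={chi2:.3f} p={p:.4f} OR={oratio:.2f} CI=({lo:.2f}, {hi:.2f})")
```

    Binning of low-frequency alleles, as the abstract describes, would happen before this step so that no cell count is too small for the chi-squared approximation.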

  17. Genome sequencing and analysis of a granulovirus isolated from the Asiatic rice leafroller, Cnaphalocrocis medinalis

    Institute of Scientific and Technical Information of China (English)

    Shan Zhang; Zheng Zhu; Shifeng Sun; Qijin Chen; Fei Deng; Kai Yang

    2015-01-01

    The complete genome of Cnaphalocrocis medinalis granulovirus (CnmeGV) from a serious migratory rice pest, Cnaphalocrocis medinalis (Lepidoptera: Pyralidae), was sequenced using the Roche 454 Genome Sequencer FLX system (GS FLX) with shotgun strategy and assembled by Roche GS De Novo assembler software. Its circular double-stranded genome is 111,246 bp in size with a high A+T content of 64.8% and codes for 118 putative open reading frames (ORFs). It contains 37 conserved baculovirus core ORFs, 13 unique ORFs, 26 ORFs that were found in all Lepidoptera baculoviruses and 42 common ORFs. The analysis of nucleotide sequence repeats revealed that the CnmeGV genome differs from the rest of the sequenced GVs by 23 kb and 17 kb gene block inversions, and does not contain any typical homologous region (hr) except for a region of non-hr-like sequence. Chitinase and cathepsin genes, which are reported to have major roles in the liquefaction of the hosts, were not found in the CnmeGV genome, which explains why CnmeGV-infected insects do not show the phenotype of typical liquefaction. Phylogenetic analysis, based on the 37 core baculovirus genes, indicates that CnmeGV is closely related to Adoxophyes orana granulovirus. The genome analysis would contribute to the functional research of CnmeGV, and would benefit the utilization of CnmeGV as a pest control reagent for rice production.

  18. Genome sequencing and analysis of a granulovirus isolated from the Asiatic rice leafroller, Cnaphalocrocis medinalis.

    Science.gov (United States)

    Zhang, Shan; Zhu, Zheng; Sun, Shifeng; Chen, Qijin; Deng, Fei; Yang, Kai

    2015-12-01

    The complete genome of Cnaphalocrocis medinalis granulovirus (CnmeGV) from a serious migratory rice pest, Cnaphalocrocis medinalis (Lepidoptera: Pyralidae), was sequenced using the Roche 454 Genome Sequencer FLX system (GS FLX) with shotgun strategy and assembled by Roche GS De Novo assembler software. Its circular double-stranded genome is 111,246 bp in size with a high A+T content of 64.8% and codes for 118 putative open reading frames (ORFs). It contains 37 conserved baculovirus core ORFs, 13 unique ORFs, 26 ORFs that were found in all Lepidoptera baculoviruses and 42 common ORFs. The analysis of nucleotide sequence repeats revealed that the CnmeGV genome differs from the rest of the sequenced GVs by 23 kb and 17 kb gene block inversions, and does not contain any typical homologous region (hr) except for a region of non-hr-like sequence. Chitinase and cathepsin genes, which are reported to have major roles in the liquefaction of the hosts, were not found in the CnmeGV genome, which explains why CnmeGV-infected insects do not show the phenotype of typical liquefaction. Phylogenetic analysis, based on the 37 core baculovirus genes, indicates that CnmeGV is closely related to Adoxophyes orana granulovirus. The genome analysis would contribute to the functional research of CnmeGV, and would benefit the utilization of CnmeGV as a pest control reagent for rice production.

  19. IMG/M: integrated genome and metagenome comparative data analysis system.

    Science.gov (United States)

    Chen, I-Min A; Markowitz, Victor M; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N; Kyrpides, Nikos C

    2017-01-04

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  20. Comparative analysis of Salmonella genomes identifies a metabolic network for escalating growth in the inflamed gut.

    Science.gov (United States)

    Nuccio, Sean-Paul; Bäumler, Andreas J

    2014-03-18

    The Salmonella genus comprises a group of pathogens associated with illnesses ranging from gastroenteritis to typhoid fever. We performed an in silico analysis of comparatively reannotated Salmonella genomes to identify genomic signatures indicative of disease potential. By removing numerous annotation inconsistencies and inaccuracies, the process of reannotation identified a network of 469 genes involved in central anaerobic metabolism, which was intact in genomes of gastrointestinal pathogens but degrading in genomes of extraintestinal pathogens. This large network contained pathways that enable gastrointestinal pathogens to utilize inflammation-derived nutrients as well as many of the biochemical reactions used for the enrichment and biochemical discrimination of Salmonella serovars. Thus, comparative genome analysis identifies a metabolic network that provides clues about the strategies for nutrient acquisition and utilization that are characteristic of gastrointestinal pathogens. IMPORTANCE While some Salmonella serovars cause infections that remain localized to the gut, others disseminate throughout the body. Here, we compared Salmonella genomes to identify characteristics that distinguish gastrointestinal from extraintestinal pathogens. We identified a large metabolic network that is functional in gastrointestinal pathogens but decaying in extraintestinal pathogens. While taxonomists have used traits from this network empirically for many decades for the enrichment and biochemical discrimination of Salmonella serovars, our findings suggest that it is part of a "business plan" for growth in the inflamed gastrointestinal tract. By identifying a large metabolic network characteristic of Salmonella serovars associated with gastroenteritis, our in silico analysis provides a blueprint for potential strategies to utilize inflammation-derived nutrients and edge out competing gut microbes.

  1. Teaching For Art Criticism: Incorporating Feldman’s Critical Analysis Learning Model In Students’ Studio Practice

    Directory of Open Access Journals (Sweden)

    Maithreyi Subramaniam

    2016-01-01

    Full Text Available This study adopted 30 first year graphic design students’ artwork, with critical analysis using Feldman’s model of art criticism. Data were analyzed quantitatively; descriptive statistical techniques were employed. The scores were viewed in the form of mean score and frequencies to determine students’ performances in their critical ability. Pearson Correlation Coefficient was used to find out the correlation between students’ studio practice and art critical ability scores. The findings showed most students performed slightly better than average in the critical analyses and performed best in selecting analysis among the four dimensions assessed. In the context of the students’ studio practice and critical ability, findings showed there are some connections between the students’ art critical ability and studio practice.

  2. Multifractal detrended cross-correlation analysis of genome sequences using chaos-game representation

    Science.gov (United States)

    Pal, Mayukha; Kiran, V. Satya; Rao, P. Madhusudana; Manimaran, P.

    2016-08-01

    We characterized the multifractal nature and power law cross-correlation between any pair of genome sequences through an integrative approach combining 2D multifractal detrended cross-correlation analysis and chaos game representation. In this paper, we have analyzed genomes of some prokaryotes and calculated fractal spectra h(q) and f(α). From our analysis, we observed existence of multifractal nature and power law cross-correlation behavior between any pair of genome sequences. Cluster analysis was performed on the calculated scaling exponents to identify the class affiliation and the same is represented as a dendrogram. We suggest this approach may find applications in next generation sequence analysis, big data analytics etc.
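
    The chaos game representation (CGR) step mentioned in this record maps a nucleotide sequence to points in the unit square: each base pulls the current point halfway toward that base's corner. The sketch below shows only this mapping, not the 2D multifractal detrended cross-correlation analysis; the corner assignment is one common convention and is an assumption here, as are the function name and the example sequence.

```python
# Corner assignment is an assumed convention; papers differ on which base
# sits at which corner of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence):
    """Return the list of CGR points for a DNA sequence.

    Starts at the centre of the unit square and, for each base, moves
    halfway toward that base's corner. Non-ACGT symbols are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


for point in chaos_game("ACGT"):
    print(point)
```

    Binning these points on a 2^k by 2^k grid gives k-mer frequency images, which is one way a sequence becomes the 2D signal that a surface-based multifractal analysis can take as input.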

  3. Pre-steady-state Kinetic Analysis of a Family D DNA Polymerase from Thermococcus sp. 9°N Reveals Mechanisms for Archaeal Genomic Replication and Maintenance

    Science.gov (United States)

    Schermerhorn, Kelly M.; Gardner, Andrew F.

    2015-01-01

    Family D DNA polymerases (polDs) have been implicated as the major replicative polymerase in archaea, excluding the Crenarchaeota branch, and bear little sequence homology to other DNA polymerase families. Here we report a detailed kinetic analysis of nucleotide incorporation and exonuclease activity for a Family D DNA polymerase from Thermococcus sp. 9°N. Pre-steady-state single-turnover nucleotide incorporation assays were performed to obtain the kinetic parameters, kpol and Kd, for correct nucleotide incorporation, incorrect nucleotide incorporation, and ribonucleotide incorporation by exonuclease-deficient polD. Correct nucleotide incorporation kinetics revealed a relatively slow maximal rate of polymerization (kpol ∼2.5 s−1) and especially tight nucleotide binding (Kd(dNTP) ∼1.7 μM), compared with DNA polymerases from Families A, B, C, X, and Y. Furthermore, pre-steady-state nucleotide incorporation assays revealed that polD prevents the incorporation of incorrect nucleotides and ribonucleotides primarily through reduced nucleotide binding affinity. Pre-steady-state single-turnover assays on wild-type 9°N polD were used to examine 3′-5′ exonuclease hydrolysis activity in the presence of Mg2+ and Mn2+. Interestingly, substituting Mn2+ for Mg2+ accelerated hydrolysis rates >40-fold (kexo ≥110 s−1 versus ≥2.5 s−1). Preference for Mn2+ over Mg2+ in exonuclease hydrolysis activity is a property unique to the polD family. The kinetic assays performed in this work provide critical insight into the mechanisms that polD employs to accurately and efficiently replicate the archaeal genome. Furthermore, despite the unique properties of polD, this work suggests that a conserved polymerase kinetic pathway is present in all known DNA polymerase families. PMID: 26160179
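
To make the reported parameters concrete, the sketch below evaluates the hyperbolic rate law commonly used for single-turnover incorporation data, k_obs = kpol·[dNTP] / (Kd + [dNTP]). The abstract does not state the fitting equation, so that form is an assumption; the values of about 2.5 s−1 for kpol and about 1.7 micromolar for Kd are taken from the abstract, and the function name is hypothetical.

```python
def observed_rate(kpol, kd, dntp):
    """Hyperbolic dependence of the observed rate on nucleotide concentration.

    kpol: maximal polymerization rate (s^-1)
    kd:   apparent nucleotide dissociation constant (micromolar)
    dntp: nucleotide concentration (micromolar)
    """
    return kpol * dntp / (kd + dntp)


# Values reported in the abstract for correct nucleotide incorporation.
KPOL = 2.5  # s^-1
KD = 1.7    # micromolar

# Incorporation efficiency kpol/Kd, in micromolar^-1 s^-1.
efficiency = KPOL / KD

# At [dNTP] equal to Kd the observed rate is half of kpol.
half_maximal = observed_rate(KPOL, KD, KD)
```

Under this rate law a tight Kd means the polymerase approaches its maximal rate at low nucleotide concentrations, even though that maximal rate is relatively slow.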

  4. Pre-steady-state Kinetic Analysis of a Family D DNA Polymerase from Thermococcus sp. 9°N Reveals Mechanisms for Archaeal Genomic Replication and Maintenance.

    Science.gov (United States)

    Schermerhorn, Kelly M; Gardner, Andrew F

    2015-09-04

    Family D DNA polymerases (polDs) have been implicated as the major replicative polymerase in archaea, excluding the Crenarchaeota branch, and bear little sequence homology to other DNA polymerase families. Here we report a detailed kinetic analysis of nucleotide incorporation and exonuclease activity for a Family D DNA polymerase from Thermococcus sp. 9°N. Pre-steady-state single-turnover nucleotide incorporation assays were performed to obtain the kinetic parameters, kpol and Kd, for correct nucleotide incorporation, incorrect nucleotide incorporation, and ribonucleotide incorporation by exonuclease-deficient polD. Correct nucleotide incorporation kinetics revealed a relatively slow maximal rate of polymerization (kpol ∼2.5 s−1) and especially tight nucleotide binding (Kd(dNTP) ∼1.7 μM), compared with DNA polymerases from Families A, B, C, X, and Y. Furthermore, pre-steady-state nucleotide incorporation assays revealed that polD prevents the incorporation of incorrect nucleotides and ribonucleotides primarily through reduced nucleotide binding affinity. Pre-steady-state single-turnover assays on wild-type 9°N polD were used to examine 3'-5' exonuclease hydrolysis activity in the presence of Mg2+ and Mn2+. Interestingly, substituting Mn2+ for Mg2+ accelerated hydrolysis rates >40-fold (kexo ≥110 s−1 versus ≥2.5 s−1). Preference for Mn2+ over Mg2+ in exonuclease hydrolysis activity is a property unique to the polD family. The kinetic assays performed in this work provide critical insight into the mechanisms that polD employs to accurately and efficiently replicate the archaeal genome. Furthermore, despite the unique properties of polD, this work suggests that a conserved polymerase kinetic pathway is present in all known DNA polymerase families.

  5. Comparative analysis of A, B, C and D genomes in the genus Oryza with C0t-1 DNA of C genome

    Institute of Scientific and Technical Information of China (English)

    LAN Weizhen; QIN Rui; LI Gang; HE Guangcun

    2006-01-01

    Fluorescence in situ hybridization (FISH) was applied to somatic chromosome preparations of Oryza officinalis Wall. (CC), O. sativa L. (AA) × O. officinalis F1 hybrid (AC), backcross progenies BC1 (AAC and ACC), O. latifolia Desv. (CCDD), O. alta Swallen (CCDD) and O. punctata Kotschy (BBCC) with a labelled probe of C0t-1 DNA from O. officinalis. In O. officinalis, the homologous chromosomes showed similar signal bands when probed with C0t-1 DNA, and karyotype analysis was conducted based on the band patterns. Using no blocking DNA, the probe identified the chromosomes of the C genome clearly, but detected few signals on chromosomes of the A genome in the F1 hybrid and the two backcross progenies of BC1. It is obvious that the highly and moderately repetitive DNA sequences were considerably different between the C and A genomes. The chromosomes of the C genome were also discriminated from the chromosomes of the D and B genomes in the tetraploid species O. latifolia, O. alta and O. punctata by C0t-1 DNA-FISH. Comparison of the fluorescence intensity on the chromosomes of the B, C and D genomes in O. latifolia, O. alta, and O. punctata indicated that the differentiation between the C and D genomes is less than that between the C and B genomes. The relationship between the C and D genomes in O. alta is closer than that between the C and D genomes in O. latifolia. This may be one of the reasons why both genomes are of the same karyotype (CCDD) but belong to different species. The above results showed that the C0t-1 DNA had a high genome and species specificity. In this paper, the origin of allotetraploids in the genus Oryza is also discussed.

  6. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

    Directory of Open Access Journals (Sweden)

    Moreau Yves

    2005-05-01

    Full Text Available Abstract Background: The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment. Results: We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion: ArrayCGHbase is a web-based and platform-independent arrayCGH data analysis tool that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at http://medgen.ugent.be/arrayCGHbase/.

  7. Genomic Analysis of Caldithrix abyssi, the Thermophilic Anaerobic Bacterium of the Novel Bacterial Phylum Calditrichaeota

    Science.gov (United States)

    Kublanov, Ilya V.; Sigalova, Olga M.; Gavrilov, Sergey N.; Lebedinsky, Alexander V.; Rinke, Christian; Kovaleva, Olga; Chernyh, Nikolai A.; Ivanova, Natalia; Daum, Chris; Reddy, T.B.K.; Klenk, Hans-Peter; Spring, Stefan; Göker, Markus; Reva, Oleg N.; Miroshnichenko, Margarita L.; Kyrpides, Nikos C.; Woyke, Tanja; Gelfand, Mikhail S.; Bonch-Osmolovskaya, Elizaveta A.

    2017-01-01

    The genome of Caldithrix abyssi, the first cultivated representative of a phylum-level bacterial lineage, was sequenced within the framework of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project. The genomic analysis revealed mechanisms allowing this anaerobic bacterium to ferment peptides or to implement nitrate reduction with acetate or molecular hydrogen as electron donors. The genome encoded five different [NiFe]- and [FeFe]-hydrogenases, one of which, group 1 [NiFe]-hydrogenase, is presumably involved in lithoheterotrophic growth, three others produce H2 during fermentation, and one is apparently bidirectional. The ability to reduce nitrate is determined by a nitrate reductase of the Nap family, while nitrite reduction to ammonia is presumably catalyzed by an octaheme cytochrome c nitrite reductase εHao. The genome contained genes of respiratory polysulfide/thiosulfate reductase; however, elemental sulfur and thiosulfate were not used as the electron acceptors for anaerobic respiration with acetate or H2, probably due to the lack of the gene of the maturation protein. Nevertheless, elemental sulfur and thiosulfate stimulated growth on fermentable substrates (peptides), being reduced to sulfide, most probably through the action of the cytoplasmic sulfide dehydrogenase and/or NAD(P)-dependent [NiFe]-hydrogenase (sulfhydrogenase) encoded by the genome. Surprisingly, the genome of this anaerobic microorganism encoded all genes for cytochrome c oxidase; however, its maturation machinery seems to be non-operational due to genomic rearrangements of supplementary genes. Despite the fact that sugars were not among the substrates reported when C. abyssi was first described, our genomic analysis revealed multiple genes of glycoside hydrolases, and some of them were predicted to be secreted. This finding aided in bringing out four carbohydrates that supported the growth of C. abyssi: starch, cellobiose, glucomannan and xyloglucan. The genomic analysis

  8. Phylogenetic analysis of the genomes of two strains of human adenovirus type 3

    Institute of Scientific and Technical Information of China (English)

    RONG ZHOU; XIAO Bo SU; QI WEI ZIIANG; QI YI ZENG; BING ZHU; CHU Yu ZHANG; Hou Bo WU; ZAO HE WU; SI TANG GONG

    2007-01-01

    Human adenovirus type 3 (HAdV-3) is widely prevalent all over the world, especially in Asia. The objective of this study is to carry out complete genomic DNA sequencing and phylogenetic analysis for two strains (Guangzhou01 and Guangzhou02) of HAdV-3 wild virus isolated from South China. Nasopharyngeal secretion aspirate specimens of sick children were inoculated into HEp-2 and HeLa culture tubes, and the cultures were identified by neutralization assay with type-specific reference rabbit antiserum. Type-specific primers were also utilized to confirm the serotype. The restriction fragments of HAdV genome DNA were cloned into pBlueScript SK (+) vectors and sequenced, and the 5' and 3' ends of the linear HAdV-3 genome were directly sequenced with double purified genomic DNA as templates. General features of the HAdV-3 genome sequences were explored using several bio-software tools. Phylogenetic analysis was done with MEGA 3.0 software. The genomic sequences of Guangzhou01 and Guangzhou02 possess the same 4 early regions and 5 late regions and have 39 coding sequences and two RNA coding sequences. Other non-coding regions are conserved. Inverted repeats and palindromes were identified in the genome sequences. The genomes of group B human adenovirus as well as HAdV-3 have a close phylogenetic relationship with that of chimpanzee adenovirus type 21. The genomic lengths of these two isolated strains are 35 273 bp and 35 269 bp, respectively. The phylogenetic analysis showed that HAdV-B species has some relationship with certain types of chimpanzee adenovirus.

  9. Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species.

    Science.gov (United States)

    Jia, Dongmei

    2016-06-15

    Simple sequence repeats (SSRs) or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have attracted continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than 1/3 of the genome length in the three species. The microsatellites consisting only of the bases A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. The length of microsatellites was concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than those in any of the genomes or chromosomes of the three species. In addition, the distance between any two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further studying the roles of microsatellites in the origin, organization and evolution of the genomes of Candida species.
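
The quantities named in this abstract (microsatellites with 1-6 bp motifs, relative abundance per kbp, relative density in bp per kbp) can be illustrated with a small scanner. This is a sketch under stated assumptions, not the tool used in the study: the minimum repeat count is not given in the abstract, so `min_repeats` is a hypothetical parameter, and all function names are illustrative.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(sequence, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats of 1..max_motif bp."""
    seq = sequence.upper()
    hits = []
    for m in range(1, max_motif + 1):
        i = 0
        while i + m <= len(seq):
            motif = seq[i:i + m]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                repeats = 1
                # Extend the run while the next m bases repeat the motif.
                while seq[i + repeats * m:i + (repeats + 1) * m] == motif:
                    repeats += 1
                if repeats >= min_repeats:
                    hits.append((i, motif, repeats))
                    i += repeats * m  # jump past the run just recorded
                    continue
            i += 1
    return sorted(hits)


def abundance_and_density(sequence, hits):
    """Relative abundance (SSRs per kbp) and relative density (SSR bp per kbp)."""
    kbp = len(sequence) / 1000.0
    ssr_bp = sum(len(motif) * repeats for _, motif, repeats in hits)
    return len(hits) / kbp, ssr_bp / kbp
```

For example, `find_ssrs("GGATATATATCC")` reports a single (AT)4 repeat starting at position 2, and `abundance_and_density` then scales the hit count and the covered base pairs by the sequence length.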

  10. Microsatellite analysis in the genome of Acanthaceae: An in silico approach

    Directory of Open Access Journals (Sweden)

    Priyadharsini Kaliswamy

    2015-01-01

    Full Text Available Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites present in complete genome sequences play a direct role in genome organization, recombination, gene regulation, quantitative genetic variation, and the evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites and its inbuilt Primer3 module was used for primer design. Results: In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be rich in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful in the future for breeding and genetic studies of plants that belong to the Acanthaceae family.

  11. Analysis of microsatellite markers in the genome of the plant pathogen Ceratocystis fimbriata.

    Science.gov (United States)

    Simpson, Melissa C; Wilken, P Markus; Coetzee, Martin P A; Wingfield, Michael J; Wingfield, Brenda D

    2013-01-01

    Ceratocystis fimbriata sensu lato represents a complex of cryptic and commonly plant pathogenic species that are morphologically similar. Species in this complex have been described using morphological characteristics, intersterility tests and phylogenetics. Microsatellite markers have been useful to study the population structure and origin of some species in the complex. In this study we sequenced the genome of C. fimbriata. This provided an opportunity to mine the genome for microsatellites, to develop new microsatellite markers, and map previously developed markers onto the genome. Over 6000 microsatellites were identified in the genome and their abundance and distribution was determined. Ceratocystis fimbriata has a medium level of microsatellite density and slightly smaller genome when compared with other fungi for which similar microsatellite analyses have been performed. This is the first report of a microsatellite analysis conducted on a genome sequence of a fungal species in the order Microascales. Forty-seven microsatellite markers have been published for population genetic studies, of which 35 could be mapped onto the C. fimbriata genome sequence. We developed an additional ten microsatellite markers within putative genes to differentiate between species in the C. fimbriata s.l. complex. These markers were used to distinguish between 12 species in the complex.

  12. [Homologous simple sequence repeats (SSRs) analysis in tetraploid (AD1) and diploid (A₂, D₅) genomes of Gossypium].

    Science.gov (United States)

    Gaofei, Sun; Shoupu, He; Zhaoe, Pan; Xiongming, Du

    2015-02-01

    Simple sequence repeats (SSRs) are a class of repetitive DNA sequences that are commonly used for genome analysis. Comparison of the homologous SSRs among different genomes is helpful for understanding the evolutionary process in related species. In this study, SSR scanning was performed to investigate their distribution and length variation among the genomes of G. raimondii (D₅), G. arboreum (A₂) and G. hirsutum (AD₁). The results demonstrated that the distribution of SSRs in the A genome was very similar to that in the D genome, while the length variation of homologous SSRs between the A and AD genomes was more conserved than that between the D and AD genomes. Compared with SSRs in the AD genome, the number of SSRs with longer motif length in the A genome was about five times that of those with shorter motif length, while it was about three times in the D genome. This implied that the length variation rates of homologous SSRs between diploid cotton and tetraploid cotton were different during the parallel evolution due to the subgenome fusion, and the motif length of most SSRs in the tetraploid genome tended to become shorter than that of homologous SSRs in the diploid genomes during the process of evolution. This study comprehensively compared the SSRs in three cotton genomes and revealed the significant differences among them, providing a foundation for further evolutionary study of the Gossypium genome.

  13. [Incorporation of the Hazard Analysis and Critical Control Point system (HACCP) in food legislation].

    Science.gov (United States)

    Castellanos Rey, Liliana C; Villamil Jiménez, Luis C; Romero Prada, Jaime R

    2004-01-01

    The Hazard Analysis and Critical Control Point system (HACCP), recommended by different international organizations such as the Codex Alimentarius Commission, the World Trade Organization (WTO), the International Office of Epizootics (OIE) and the International Convention for Vegetables Protection (ICPV) amongst others, contributes to ensuring the safety of food along the agro-alimentary chain and requires Good Manufacturing Practices (GMP) for its implementation; GMPs are legislated in most countries. Since 1997, Colombia has set rules and legislation for application of the HACCP system in agreement with international standards. This paper discusses the potential and difficulties of enforcing the legislation and suggests some policy implications for food safety.

  14. Thermoeconomic analysis incorporating the concept of ecological efficiency; Analise termoeconomica incorporando o conceito de eficiencia ecologica

    Energy Technology Data Exchange (ETDEWEB)

    Villela, I.A.C. [University of Sao Paulo (EEL/USP), Lorena, SP (Brazil). Coll. of Engineering. Dept. of Environment Science ], Email: iraides@debas.eel.usp.br; Silveira, J.L. [Universidade Estadual Paulista (UNESP), Guaratingueta, SP (Brazil). Dept. of Energy], Email: joseluz@feg.unesp.br

    2009-07-01

    A comparative analysis is presented of the pollution resulting from natural gas combustion in a thermoelectric power plant (230 MW) using a combined cycle (CC) and a recovery boiler, without and with complementary fuel burning. Initially, the CO2, SO2, NOx and particulate matter emission levels are determined. Later, the environmental impact of the thermoelectric power plant is evaluated using a methodology based on the ecological efficiency (ε), a parameter that integrates in a single coefficient the aspects that define the intensity of the environmental impact, based on the fuel utilized, the combustion technology, the pollution index and the thermodynamic efficiency of the power plant. The objective is to apply the concept of ecological efficiency in a thermoeconomic analysis method which uses a function diagram and allows the estimation of the electricity production cost. It is concluded that the use of a system with no complementary burning is better than the one with complementary burning, from both the ecological and the economic points of view. (author)

  15. Stoichiometric network reconstruction and analysis of yeast sphingolipid metabolism incorporating different states of hydroxylation.

    Science.gov (United States)

    Kavun Ozbayraktar, Fatma Betul; Ulgen, Kutlu O

    2011-04-01

    The first elaborate metabolic model of Saccharomyces cerevisiae sphingolipid metabolism was reconstructed in silico. The model considers five different states of sphingolipid hydroxylation, rendering it unique among other models. It is aimed to clarify the significance of hydroxylation on sphingolipids and hence to interpret the preferences of the cell between different metabolic pathway branches under different stress conditions. The newly constructed model was validated by single, double and triple gene deletions with experimentally verified phenotypes. Calcium sensitivity and deletion mutations that may suppress calcium sensitivity were examined by CSG1 and CSG2 related deletions. The model enabled the analysis of complex sphingolipid content of the plasma membrane coupled with diacylglycerol and phosphatidic acid biosynthesis and ATP consumption in in silico cell. The flux data belonging to these critically important key metabolites are integrated with the fact of phytoceramide induced cell death to propose novel potential drug targets for cancer therapeutics. In conclusion, we propose that IPT1, GDA1, CSG and AUR1 gene deletions may be novel candidates of drug targets for cancer therapy according to the results of flux balance and variability analyses coupled with robustness analysis.

  16. Extending the Virtual Solar Observatory (VSO) to Incorporate Data Analysis Capabilities (III)

    Science.gov (United States)

    Csillaghy, A.; Etesi, L.; Dennis, B.; Zarro, D.; Schwartz, R.; Tolbert, K.

    2008-12-01

    We will present a progress report on our activities to extend the data analysis capabilities of the VSO. Our efforts to date have focused on three areas: 1. Extending the data retrieval capabilities by developing a centralized data processing server. The server is built with Java, IDL (Interactive Data Language), and the SSW (Solar SoftWare) package with all SSW-related instrument libraries and required calibration data. When a user requests VSO data that requires preprocessing, the data are transparently sent to the server, processed, and returned to the user's IDL session for viewing and analysis. It is possible to have any Java or IDL client connect to the server. An IDL prototype for preparing and calibrating SOHO/EIT data will be demonstrated. 2. Improving the solar data search in SHOW SYNOP, a graphical user tool connected to VSO in IDL. We introduce the Java-IDL interface that allows a flexible, dynamic, and extendable way of searching the VSO, where all communication with VSO is managed dynamically by standard Java tools. 3. Improving image overlay capability to support coregistration of solar disk observations obtained from different orbital view angles, position angles, and distances, such as from the twin STEREO spacecraft.

  17. Nonlinear Dynamic Analysis of Multi-component Mooring Lines Incorporating Line-seabed Interaction

    Directory of Open Access Journals (Sweden)

    V.J. Kurian

    2013-07-01

    Full Text Available In this study, a deterministic approach for the dynamic analysis of a multi-component mooring line was formulated. The floater motion responses were considered as the mooring line upper boundary conditions while the anchored point was considered as pinned. Lumped parameter approach was adopted for the mooring line modelling. The forces considered were the submerged weights of mooring/attachment, physical/added inertia, line tension, fluid/line relative drag forces and line/seabed reactive forces. The latter interactions were modelled assuming that the mooring line rested on an elastic dissipative foundation. An iterative procedure for the dynamic analysis was developed and results for various mooring lines partially lying on different soils were obtained and validated by conducting a comparative study against published results. Good agreement between numerical and published experimental results was achieved. The contribution of the soil characteristics of the seabed to the dynamic behaviour of mooring line was investigated for different types of soil and reported.
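
The line-seabed interaction described in this abstract, a mooring line resting on an elastic dissipative foundation, can be illustrated per lumped node with a spring-damper reaction. This is only a sketch under assumptions: the abstract gives no formulas, so the linear spring and linear damper, the no-tension clipping, and all names and parameters below are hypothetical.

```python
def seabed_reaction(penetration, penetration_velocity, stiffness, damping):
    """Upward reaction force on one lumped node of the mooring line.

    penetration:          depth of the node below the seabed surface (m), positive when in the soil
    penetration_velocity: rate of change of penetration (m/s), positive when moving deeper
    stiffness:            assumed linear soil stiffness (N/m)
    damping:              assumed linear soil damping (N s/m)
    """
    if penetration <= 0.0:
        return 0.0  # node is above the seabed: no contact force
    force = stiffness * penetration + damping * penetration_velocity
    # The soil can push but not pull, so the reaction is never negative.
    return max(force, 0.0)
```

In a lumped-parameter time-stepping scheme this force would be added to the weight, inertia, tension and drag terms at each node that touches the seabed; stiffer or more dissipative soils change the stiffness and damping values only.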

  18. Invited review: Genomic analysis of data from physiological studies.

    Science.gov (United States)

    Garrick, D J; Baumgard, L H; Neibergs, H L

    2012-02-01

    Physiology deals with the functions of living organisms and their systems, and its scientific endeavors can be viewed as having temporally occurred in 3 phases. The first phase of physiology studies focused on determining the functions of particular organs and tissues and their functional differences according to physiological status. The second phase of studies focused on characterizing differences in these functions according to the environment, or productivity. The third phase of studies focuses on determining the physiological causes of differences in productivity. Distinguishing cause from effect in physiological systems of inter-related processes is problematic, such that science has struggled to identify the root physiological mechanisms and their role in the network of genes leading to differences in productivity. Genomics is the study of the entire genome and provides powerful new tools that will accelerate third-phase discoveries of causal physiological processes. That research exploits information on DNA polymorphisms known as markers, complete DNA sequence, RNA sequence, and RNA expression in particular tissues at specific life stages. Physiologists can determine the genetic cause of mutant animals, identify genetic differences between cases and controls, and identify genes responsible for differences in performance between average and above-average animals. In some species, physiologists can leverage genomic data being used to predict genetic merit in elite seedstock populations, as a starting point to identify genes that will then motivate detailed physiological studies in the organs or tissues and stages of life in which those genes are expressed. Such work will increase our knowledge of biology and may lead to novel approaches to manipulate animal performance.

  19. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications.

    Science.gov (United States)

    Goremykin, Vadim V; Holland, Barbara; Hirsch-Ernst, Karen I; Hellwig, Frank H

    2005-09-01

    Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.

  20. The complexity of Rhipicephalus (Boophilus) microplus genome characterised through detailed analysis of two BAC clones

    Directory of Open Access Journals (Sweden)

    Valle Manuel

    2011-07-01

    Full Text Available Abstract Background: Rhipicephalus (Boophilus) microplus (Rmi), a major cattle ectoparasite and tick borne disease vector, impacts on animal welfare and industry productivity. In arthropod research there is an absence of a complete Chelicerate genome, which includes ticks, mites, spiders, scorpions and crustaceans. Model arthropod genomes such as Drosophila and Anopheles are too taxonomically distant for a reference in tick genomic sequence analysis. This study focuses on the de-novo assembly of two R. microplus BAC sequences from the understudied R. microplus genome. Based on available R. microplus sequenced resources and comparative analysis, tick genomic structure and functional predictions identify complex gene structures and genomic targets expressed during tick-cattle interaction. Results: In our BAC analyses we have assembled, using the correct positioning of BAC end sequences and transcript sequences, two challenging genomic regions. Cot DNA fractions compared to the BAC sequences confirmed a highly repetitive BAC sequence BM-012-E08 and a low repetitive BAC sequence BM-005-G14 which was gene rich and contained short interspersed elements (SINEs). Based directly on the BAC and Cot data comparisons, the genome wide frequency of the SINE Ruka element was estimated. Using a conservative approach to the assembly of the highly repetitive BM-012-E08, the sequence was de-convoluted into three repeat units, each unit containing an 18S, 5.8S and 28S ribosomal RNA (rRNA) encoding gene sequence (rDNA), related internal transcribed spacer and complex intergenic region. In the low repetitive BM-005-G14, a novel gene complex was found between 2 genes on the same strand. Nested in the second intron of a large 9 Kb papilin gene was a helicase gene. This helicase overlapped in two exonic regions with the papilin. Both these genes were shown to be expressed in different tick life stages important in ectoparasite interaction with the host. Tick specific sequence

  1. Connecting Anxiety and Genomic Copy Number Variation: A Genome-Wide Analysis in CD-1 Mice.

    Directory of Open Access Journals (Sweden)

    Julia Brenndörfer

    Full Text Available Genomic copy number variants (CNVs) have been implicated in multiple psychiatric disorders, but not much is known about their influence on anxiety disorders specifically. Using next-generation sequencing (NGS) and two additional array-based genotyping approaches, we detected CNVs in a mouse model consisting of two inbred mouse lines showing high (HAB) and low (LAB) anxiety-related behavior, respectively. An influence of CNVs on gene expression in the central (CeA) and basolateral (BLA) amygdala, paraventricular nucleus (PVN), and cingulate cortex (Cg) was shown by a two-proportion Z-test (p = 1.6 × 10^-31), with a positive correlation in the CeA (p = 0.0062), PVN (p = 0.0046) and Cg (p = 0.0114), indicating a contribution of CNVs to the genetic predisposition to trait anxiety in the specific context of HAB/LAB mice. In order to confirm anxiety-relevant CNVs and corresponding genes in a second mouse model, we further examined CD-1 outbred mice. We revealed the distribution of CNVs by genotyping 64 CD-1 individuals using a high-density genotyping array (Jackson Laboratory). 78 genes within those CNVs were identified to show nominally significant association (48 genes), or a statistical trend in their association (30 genes), with the time animals spent on the open arms of the elevated plus-maze (EPM). Fifteen of them were considered promising candidate genes of anxiety-related behavior as we could show a significant overlap (permutation test, p = 0.0051) with genes within HAB/LAB CNVs. Thus, here we provide what is to our knowledge the first extensive catalogue of CNVs in CD-1 mice and potential corresponding candidate genes linked to anxiety-related behavior in mice.
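
    The two-proportion Z-test named in this abstract can be illustrated with a minimal sketch. It assumes the usual pooled-variance form of the test and a two-sided p-value; the function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error.

    Illustrative only: the counts passed in are hypothetical, not study data.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis p_a == p_b.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical example: 60 of 100 genes inside CNVs vs 40 of 100 outside.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
```

    The pooled form is one common choice; an unpooled standard error is also used in practice, and the abstract does not say which variant the authors applied.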

  2. Incorporation of Socio-Economic Features' Ranking in Multicriteria Analysis Based on Ecosystem Services for Marine Protected Area Planning.

    Directory of Open Access Journals (Sweden)

    Michelle E Portman

    Full Text Available Developed decades ago for spatial choice problems related to zoning in the urban planning field, multicriteria analysis (MCA) has more recently been applied to environmental conflicts and presented in several documented cases for the creation of protected area management plans. Its application is considered here for the development of zoning as part of a proposed marine protected area management plan. The case study incorporates spatially-explicit conservation features while considering stakeholder preferences, expert opinion and characteristics of data quality. It involves the weighting of criteria using a modified analytical hierarchy process. Experts ranked physical attributes which include socio-economically valued physical features. The parameters used for the ranking of (physical) attributes important for socio-economic reasons are derived from the field of ecosystem services assessment. Inclusion of these feature values results in protection that emphasizes those areas closest to shore, most likely because of accessibility and familiarity parameters and because of data biases. Therefore, other spatial conservation prioritization methods should be considered to supplement the MCA and efforts should be made to improve data about ecosystem service values farther from shore. Otherwise, the MCA method allows incorporation of expert and stakeholder preferences and ecosystem services values while maintaining the advantages of simplicity and clarity.
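
    As a rough illustration of criteria weighting with an analytical hierarchy process, the sketch below derives weights from a pairwise comparison matrix using the row geometric-mean approximation. This is the textbook form, not the modified process used in the study, and the matrix values and criteria labels are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via row geometric means.

    pairwise[i][j] states how strongly criterion i is preferred over
    criterion j; a reciprocal matrix has pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparison of three criteria (e.g. accessibility,
# familiarity, habitat value); the judgments here are perfectly consistent.
matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(matrix)  # approximately [4/7, 2/7, 1/7]
```

    For a perfectly consistent matrix the geometric-mean weights coincide with the principal-eigenvector weights; for inconsistent expert judgments the two can differ slightly.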

  3. Incorporating natural capital into economy-wide impact analysis: a case study from Alberta.

    Science.gov (United States)

    Patriquin, Mike N; Alavalapati, Janaki R R; Adamowicz, Wiktor L; White, William A

    2003-01-01

    Traditionally, decision-makers have relied on economic impact estimates derived from conventional economy-wide models. Conventional models lack the environmental linkages necessary for examining environmental stewardship and economic sustainability, and in particular the ability to assess the impact of policies on natural capital. This study investigates environmentally extended economic impact estimation on a regional scale using a case study region in the province of Alberta known as the Foothills Model Forest (FMF). Conventional economic impact models are environmentally extended in pursuit of enhancing policy analysis and local decision-making. It is found that the flexibility of the computable general equilibrium (CGE) modeling approach offers potential for environmental extension, with a solid grounding in economic theory. The CGE approach may be the tool of the future for more complete integrated environment and economic impact assessment.

  4. New methodology developed for the differential scanning calorimetry analysis of polymeric matrixes incorporating phase change materials

    Science.gov (United States)

    Barreneche, Camila; Solé, Aran; Miró, Laia; Martorell, Ingrid; Inés Fernández, A.; Cabeza, Luisa F.

    2012-08-01

    Nowadays, thermal comfort needs in buildings have led to an increase in energy consumption of the residential and service sectors. For this reason, thermal energy storage is shown as an alternative to achieve reduction of this high consumption. Phase change materials (PCM) have been studied to store energy due to their high storage capacity. A polymeric material capable of macroencapsulating PCM was developed by the authors of this paper. However, difficulties were found while measuring the thermal properties of these materials by differential scanning calorimetry (DSC). The polymeric matrix interferes in the detection of PCM properties by DSC. To remove this interfering effect, a new methodology which replaces the conventional empty crucible used as a reference in the DSC analysis by crucibles composed of the polymeric matrix was developed. Thus, a clear signal from the PCM is obtained by subtracting the new full crucible signal from the sample signal.

  5. Genome-Scale Analysis of Cell-Specific Regulatory Codes Using Nuclear Enzymes.

    Science.gov (United States)

    Baek, Songjoon; Sung, Myong-Hee

    2016-01-01

    High-throughput sequencing technologies have made it possible for biologists to generate genome-wide profiles of chromatin features at the nucleotide resolution. Enzymes such as nucleases or transposases have been instrumental as chromatin-probing agents due to their ability to target accessible chromatin for cleavage or insertion. On the scale of a few hundred base pairs, preferential action of the nuclear enzymes on accessible chromatin allows mapping of cell state-specific accessibility in vivo. Such accessible regions contain functionally important regulatory sites, including promoters and enhancers, which undergo active remodeling for cells adapting in a dynamic environment. DNase-seq and the more recent ATAC-seq are two assays that are gaining popularity. Deep sequencing of DNA libraries from these assays, termed genomic footprinting, has been proposed to enable the comprehensive construction of protein occupancy profiles over the genome at the nucleotide level. Recent studies have discovered limitations of genomic footprinting which reduce the scope of detectable proteins. In addition, the identification of putative factors that bind to the observed footprints remains challenging. Despite these caveats, the methodology still presents significant advantages over alternative techniques such as ChIP-seq or FAIRE-seq. Here we describe computational approaches and tools for analysis of chromatin accessibility and genomic footprinting. Proper experimental design and assay-specific data analysis ensure the detection sensitivity and maximize retrievable information. The enzyme-based chromatin profiling approaches represent a powerful and evolving methodology which facilitates our understanding of how the genome is regulated.

  6. FadE: whole genome methylation analysis for multiple sequencing platforms.

    Science.gov (United States)

    Souaiaia, Tade; Zhang, Zheng; Chen, Ting

    2013-01-01

    DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of 'C'-'C' alignments covering reference cytosines. Di-base color reads produced by lifetech's SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors affect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporate them into a model which uses Newton-Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of human fibroblast cell-line bisulfite-converted fragment library with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes, which compares favorably to the results of an investigation on the same cell-line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads. http://code.google.com/p/fade/.
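
    The idea of folding background error rates into a Newton-Raphson estimate of the methylation rate can be sketched with a deliberately simplified model. The sketch below is not FadE's actual model: it assumes a plain binomial likelihood at a single reference cytosine, with two hypothetical error parameters (`miss_rate`, the chance a methylated cytosine is not read as 'C', and `false_c_rate`, the chance an unmethylated cytosine is still read as 'C').

```python
def estimate_methylation(c_reads, total_reads, miss_rate, false_c_rate,
                         start=0.5, tol=1e-12, max_iter=50):
    """Newton-Raphson MLE of methylation rate m under a toy binomial model.

    Assumed model (an illustration, not FadE's):
        P(read shows C) = m * (1 - miss_rate) + (1 - m) * false_c_rate
    """
    slope = 1.0 - miss_rate - false_c_rate  # dP/dm, assumed positive
    eps = 1e-9
    m = start
    for _ in range(max_iter):
        p = m * slope + false_c_rate
        # First and second derivatives of the binomial log-likelihood in m.
        grad = (c_reads / p - (total_reads - c_reads) / (1.0 - p)) * slope
        hess = -(c_reads / p ** 2
                 + (total_reads - c_reads) / (1.0 - p) ** 2) * slope ** 2
        new_m = m - grad / hess
        # Keep the estimate strictly inside (0, 1) so p stays valid.
        new_m = min(max(new_m, eps), 1.0 - eps)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


# Hypothetical site: 70 of 100 reads show C, with small background errors.
rate = estimate_methylation(70, 100, miss_rate=0.02, false_c_rate=0.01)
```

    In this toy model the maximum has a closed form, (c_reads / total_reads - false_c_rate) / slope, so the iteration is shown only to make the optimization step concrete; a credible interval, as described in the abstract, would require a full posterior and is not attempted here.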

  7. DivStat: a user-friendly tool for single nucleotide polymorphism analysis of genomic diversity.

    Directory of Open Access Journals (Sweden)

    Inês Soares

    Full Text Available Recent developments have led to an enormous increase of publicly available large genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes, and allowing for a myriad of large scale studies on human genetic variation. However, the tools currently available are insufficient when the goal concerns some analyses of data sets encompassing more than hundreds of base pairs and when considering haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and potent tool to deal with large data sets allowing the computation of a variety of summary statistics of population genetic data, increasing the speed of data analysis.

  8. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals.

    Science.gov (United States)

    Su, Fei; Xu, Ping

    2014-01-29

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species.

  9. Comparative Analysis on Genomes from Oryza alta and Oryza latifolia by C0t-1 DNA

    Institute of Scientific and Technical Information of China (English)

    WANG De-bin; WANG Yang; WU Qi; ZHAO Hou-ming; LI Gang; QIN Rui; WANG Chun-tai; LIU Hong

    2010-01-01

    In order to reveal the origin and evolutionary relationship between two CCDD genome species, Oryza alta and Oryza latifolia, fluorescence in situ hybridization (FISH) was adopted to analyze the genomes of the two species with C0t-1 DNA from O. alta as a probe. Karyotype was also comparatively analyzed between O. alta and O. latifolia based on their similar band patterns of the hybridization signals. There was a high homology and close relationship between O. alta and O. latifolia; however, the distinction between the hybridization signals was also clear. C0t-1 DNA was proved to be species- and genome type-specific. It is suggested that C0t-1 DNA-FISH could be more efficient to analyze the genomic relationship between different species. According to the comparative analysis of highly and moderately repetitive DNA sequences between the two allotetraploid species, O. alta and O. latifolia, the possible origin and evolutionary mechanism of allotetraploidy of Oryza were discussed.

  10. Genome Analysis of Streptococcus pyogenes Associated with Pharyngitis and Skin Infections

    Science.gov (United States)

    Ibrahim, Joe; Eisen, Jonathan A.; Jospin, Guillaume; Coil, David A.; Khazen, Georges

    2016-01-01

    Streptococcus pyogenes is a very important human pathogen, commonly associated with skin or throat infections but can also cause life-threatening situations including sepsis, streptococcal toxic shock syndrome, and necrotizing fasciitis. Various studies involving typing and molecular characterization of S. pyogenes have been published to date; however next-generation sequencing (NGS) studies provide a comprehensive collection of an organism’s genetic variation. In this study, the genomes of nine S. pyogenes isolates associated with pharyngitis and skin infection were sequenced and studied for the presence of virulence genes, resistance elements, prophages, genomic recombination, and other genomic features. Additionally, a comparative phylogenetic analysis of the isolates with global clones highlighted their possible evolutionary lineage and their site of infection. The genomes were found to also house a multitude of features including gene regulation systems, virulence factors and antimicrobial resistance mechanisms. PMID:27977735

  11. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  12. Comparative genomic analysis of phylogenetically closely related Hydrogenobaculum sp. isolates from Yellowstone National Park.

    Science.gov (United States)

    Romano, Christine; D'Imperio, Seth; Woyke, Tanja; Mavromatis, Konstantinos; Lasken, Roger; Shock, Everett L; McDermott, Timothy R

    2013-05-01

    We describe the complete genome sequences of four closely related Hydrogenobaculum sp. isolates (≥ 99.7% 16S rRNA gene identity) that were isolated from the outflow channel of Dragon Spring (DS), Norris Geyser Basin, in Yellowstone National Park (YNP), WY. The genomes range in size from 1,552,607 to 1,552,931 bp, contain 1,667 to 1,676 predicted genes, and are highly syntenic. There are subtle differences among the DS isolates, which as a group are different from Hydrogenobaculum sp. strain Y04AAS1 that was previously isolated from a geographically distinct YNP geothermal feature. Genes unique to the DS genomes encode arsenite [As(III)] oxidation, NADH-ubiquinone-plastoquinone (complex I), NADH-ubiquinone oxidoreductase chain, a DNA photolyase, and elements of a type II secretion system. Functions unique to strain Y04AAS1 include thiosulfate metabolism, nitrate respiration, and mercury resistance determinants. DS genomes contain seven CRISPR loci that are almost identical but are different from the single CRISPR locus in strain Y04AAS1. Other differences between the DS and Y04AAS1 genomes include average nucleotide identity (94.764%) and percentage conserved DNA (80.552%). Approximately half of the genes unique to Y04AAS1 are predicted to have been acquired via horizontal gene transfer. Fragment recruitment analysis and marker gene searches demonstrated that the DS metagenome was more similar to the DS genomes than to the Y04AAS1 genome, but that the DS community is likely comprised of a continuum of Hydrogenobaculum genotypes that span from the DS genomes described here to an Y04AAS1-like organism, which appears to represent a distinct ecotype relative to the DS genomes characterized.

  13. The Revolution in Viral Genomics as Exemplified by the Bioinformatic Analysis of Human Adenoviruses

    Directory of Open Access Journals (Sweden)

    Sarah Torres

    2010-06-01

    Full Text Available Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods; initially with the use of restriction enzymes and more currently with the use of the GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984 when nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV), HAdV-C2 and HAdV-C5, were compared and analyzed. It was determined that there were three different zones (1-393, 394-1410, 1411-2910) within the hexon gene, of which HAdV-C2 and HAdV-C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA, in order to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, that was typed and named by the use of whole genome sequencing and phylogenetics as opposed to traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, how novel HAdVs are typed has changed. Bioinformatic analysis has become the revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics.

  14. Analysis of individual differences in radiosensitivity using genome editing.

    Science.gov (United States)

    Matsuura, S; Royba, E; Akutsu, S N; Yanagihara, H; Ochiai, H; Kudo, Y; Tashiro, S; Miyamoto, T

    2016-06-01

    Current standards for radiological protection of the public have been uniformly established. However, individual differences in radiosensitivity are suggested to exist in human populations, which could be caused by nucleotide variants of DNA repair genes. In order to verify if such genetic variants are responsible for individual differences in radiosensitivity, they could be introduced into cultured human cells for evaluation. This strategy would make it possible to analyse the effect of candidate nucleotide variants on individual radiosensitivity, independent of the diverse genetic background. However, efficient gene targeting in cultured human cells is difficult due to the low frequency of homologous recombination (HR) repair. The development of artificial nucleases has enabled efficient HR-mediated genome editing to be performed in cultured human cells. A novel genome editing strategy, 'transcription activator-like effector nuclease (TALEN)-mediated two-step single base pair editing', has been developed, and this was used to introduce a nucleotide variant associated with a chromosomal instability syndrome bi-allelically into cultured human cells to demonstrate that it is the causative mutation. It is proposed that this editing technique will be useful to investigate individual radiosensitivity.

  15. Comparative Genomic Analysis Reveals Ecological Differentiation in the Genus Carnobacterium

    Science.gov (United States)

    Iskandar, Christelle F.; Borges, Frédéric; Taminiau, Bernard; Daube, Georges; Zagorec, Monique; Remenant, Benoît; Leisner, Jørgen J.; Hansen, Martin A.; Sørensen, Søren J.; Mangavel, Cécile; Cailliez-Grimal, Catherine; Revol-Junelles, Anne-Marie

    2017-01-01

    Lactic acid bacteria (LAB) differ in their ability to colonize food and animal-associated habitats: while some species are specialized and colonize a limited number of habitats, others are generalists and are able to colonize multiple animal-linked habitats. In the current study, Carnobacterium was used as a model genus to elucidate the genetic basis of these colonization differences. Analyses of 16S rRNA gene meta-barcoding data showed that C. maltaromaticum followed by C. divergens are the most prevalent species in foods derived from animals (meat, fish, dairy products), and in the gut. According to phylogenetic analyses, these two animal-adapted species belong to one of two deeply branched lineages. The second lineage contains species isolated from habitats where contact with animals is rare. Genome analyses revealed that members of the animal-adapted lineage harbor a larger secretome than members of the other lineage. The predicted cell-surface proteome is highly diversified in C. maltaromaticum and C. divergens with genes involved in adaptation to the animal milieu such as those encoding biopolymer hydrolytic enzymes, a heme uptake system, and biopolymer-binding adhesins. These species also exhibit genes for gut adaptation and respiration. In contrast, Carnobacterium species belonging to the second lineage encode a poorly diversified cell-surface proteome, lack genes for gut adaptation and are unable to respire. These results shed light on the important genomic traits required for adaptation to animal-linked habitats in generalist Carnobacterium. PMID:28337181

  16. Genomic analysis of extra-intestinal pathogenic Escherichia coli urosepsis.

    Science.gov (United States)

    McNally, A; Alhashash, F; Collins, M; Alqasim, A; Paszckiewicz, K; Weston, V; Diggle, M

    2013-08-01

    Urosepsis is a bacteraemia infection caused by an organism previously causing an infection in the urinary tract of a patient, a diagnosis which has been classically confirmed by culture of the same species of bacteria from both blood and urine samples. Given the new insights afforded by sequencing technologies into the complicated population structures of infectious agents affecting humans, we sought to investigate urosepsis by comparing the genome sequences of blood and urine isolates of Escherichia coli from five patients with urosepsis. The results confirm the classical urosepsis hypothesis in four of the five cases, but also show the complex nature of extra-intestinal E. coli infection in the fifth case, where three distinct strains caused two distinct infections. Additionally, we show there is little to no variation in the bacterial genome as it progressed from urine to blood, and also present a minimal set of virulence genes required for bacteraemia in E. coli based on gene association. These suggest that most E. coli have the genetic propensity to cause bacteraemia.

  17. Genome-wide identification and phylogenetic analysis of the ERF gene family in cucumbers

    Directory of Open Access Journals (Sweden)

    Lifang Hu

    2011-01-01

    Full Text Available Members of the ERF transcription-factor family participate in a number of biological processes, viz., responses to hormones, adaptation to biotic and abiotic stress, metabolism regulation, beneficial symbiotic interactions, cell differentiation and developmental processes. So far, no tissue-expression profile of any cucumber ERF protein has been reported in detail. Recent completion of the cucumber full-genome sequence has come to facilitate, not only genome-wide analysis of ERF family members in cucumbers themselves, but also a comparative analysis with those in Arabidopsis and rice. In this study, 103 hypothetical ERF family genes in the cucumber genome were identified, phylogenetic analysis indicating their classification into 10 groups, designated I to X. Motif analysis further indicated that most of the conserved motifs outside the AP2/ERF domain, are selectively distributed among the specific clades in the phylogenetic tree. From chromosomal localization and genome distribution analysis, it appears that tandem-duplication may have contributed to CsERF gene expansion. Intron/exon structure analysis indicated that a few CsERFs still conserved the former intron-position patterns existent in the common ancestor of monocots and eudicots. Expression analysis revealed the widespread distribution of the cucumber ERF gene family within plant tissues, thereby implying the probability of their performing various roles therein. Furthermore, members of some groups presented mutually similar expression patterns that might be related to their phylogenetic groups.

  18. Chromosomal imbalances in nasopharyngeal carcinoma: a meta-analysis of comparative genomic hybridization results

    Directory of Open Access Journals (Sweden)

    Jin Ping

    2006-01-01

    Full Text Available Abstract Nasopharyngeal carcinoma (NPC) is a highly prevalent disease in Southeast Asia and its prevalence is clearly affected by genetic background. Various theories have been suggested for its high incidence in this geographical region but to this day no conclusive explanation has been identified. Chromosomal imbalances identifiable through comparative genomic hybridization may shed some light on common genetic alterations that may be of relevance to the onset and progression of NPC. Review of the literature, however, reveals contradictory results among reported findings possibly related to factors associated with patient selection, stage of disease, differences in methodological details etc. To increase the power of the analysis and attempt to identify commonalities among the reported findings, we performed a meta-analysis of results described in NPC tissues based on chromosomal comparative genomic hybridization (CGH). This meta-analysis revealed consistent patterns in chromosomal abnormalities that appeared to cluster in specific "hot spots" along the genome following a stage-dependent progression.

  19. Genome-wide analysis of the synonymous codon usage patterns in apple

    Institute of Scientific and Technical Information of China (English)

    LI Ning; SUN Mei-hong; JIANG Ze-sheng; SHU Huai-rui; ZHANG Shi-zhong

    2016-01-01

    Apple (Malus × domestica) has been proposed as an important woody plant and is a major cultivated fruit tree in temperate regions. Apple whole genome sequencing has been completed, which provided an excellent opportunity for genome-wide analysis of the synonymous codon usage patterns. In this study, a multivariate bioinformatics analysis was performed to reveal the characteristics of synonymous codon usage and the main factors affecting codon bias in apple. The neutrality, correspondence, and correlation analyses were performed by CodonW and SPSS (Statistical Product and Service Solutions) programs, indicating that the apple genome codon usage patterns were affected by mutational pressure and selective constraint. Meanwhile, coding sequence length and the hydrophobicity of proteins could also influence the codon usage patterns. In short, codon usage pattern analysis and determination of optimal codons have laid an important theoretical basis for genetic engineering, gene prediction and molecular evolution studies in apple.
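
    Synonymous codon usage of the kind analysed here is commonly summarised as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The abstract does not state which indices were computed, so the sketch below is only an illustration of RSCU under the standard genetic code; the function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Return {codon: RSCU} for amino acids observed in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families (stop codons excluded).
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence: four alanine codons, three GCT and one GCC.
usage = rscu("GCTGCTGCTGCC")
```

    An RSCU above 1 marks a codon used more often than expected under equal usage; values for single-codon amino acids (Met, Trp) are always 1 and carry no information.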

  20. The diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms

    Directory of Open Access Journals (Sweden)

    Beck Christian

    2012-02-01

    Full Text Available Abstract Background Cyanobacteria are among the most abundant organisms on Earth and represent one of the oldest and most widespread clades known in modern phylogenetics. As the only known prokaryotes capable of oxygenic photosynthesis, cyanobacteria are considered to be a promising resource for renewable fuels and natural products. Our efforts to harness the sun's energy using cyanobacteria would greatly benefit from an increased understanding of the genomic diversity across multiple cyanobacterial strains. In this respect, the advent of novel sequencing techniques and the availability of several cyanobacterial genomes offers new opportunities for understanding microbial diversity and metabolic organization and evolution in diverse environments. Results Here, we report a whole genome comparison of multiple phototrophic cyanobacteria. We describe genetic diversity found within cyanobacterial genomes, specifically with respect to metabolic functionality. Our results are based on pair-wise comparison of protein sequences and concomitant construction of clusters of likely ortholog genes. We differentiate between core, shared and unique genes and show that the majority of genes are associated with a single genome. In contrast, genes with metabolic function are strongly overrepresented within the core genome that is common to all considered strains. The analysis of metabolic diversity within core carbon metabolism reveals parts of the metabolic networks that are highly conserved, as well as highly fragmented pathways. Conclusions Our results have direct implications for resource allocation and further sequencing projects. It can be extrapolated that the number of newly identified genes still significantly increases with increasing number of new sequenced genomes. Furthermore, genome analysis of multiple phototrophic strains allows us to obtain a detailed picture of metabolic diversity that can serve as a starting point for biotechnological

  1. An innovative land use regression model incorporating meteorology for exposure analysis.

    Science.gov (United States)

    Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael

    2008-02-15

    The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However, spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO2) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO2 measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.

  2. Incorporating Fuzzy Systems Modeling and Possibility Theory in Hydrogeological Uncertainty Analysis

    Science.gov (United States)

    Faybishenko, B.

    2008-12-01

    Hydrogeological predictions are subject to numerous uncertainties, including the development of conceptual, mathematical, and numerical models, as well as determination of their parameters. Stochastic simulations of hydrogeological systems and the associated uncertainty analysis are usually based on the assumption that the data characterizing spatial and temporal variations of hydrogeological processes are random, and the output uncertainty is quantified using a probability distribution. However, hydrogeological systems are often characterized by imprecise, vague, inconsistent, incomplete or subjective information. One of the modern approaches to modeling and uncertainty quantification of such systems is based on using a combination of statistical and fuzzy-logic uncertainty analyses. The aims of this presentation are to: (1) present evidence of fuzziness in developing conceptual hydrogeological models, and (2) give examples of the integration of the statistical and fuzzy-logic analyses in modeling and assessing both aleatoric uncertainties (e.g., caused by vagueness in assessing the subsurface system heterogeneities of fractured-porous media) and epistemic uncertainties (e.g., caused by the selection of different simulation models) involved in hydrogeological modeling. The author will discuss several case studies illustrating the application of fuzzy modeling for assessing the water balance and water travel time in unsaturated-saturated media. These examples will include the evaluation of associated uncertainties using the main concepts of possibility theory, a comparison between the uncertainty evaluation using probability and possibility theories, and a transformation of probability distributions into possibility distributions (and vice versa) for modeling hydrogeological processes.
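
    The abstract does not say which probability-to-possibility transformation is used. As one possible illustration, the sketch below applies a transformation commonly attributed to Dubois and Prade for discrete distributions, in which the possibility of an outcome is the sum of the probabilities of all outcomes that are no more probable than it. The function name and the example probabilities are assumptions.

```python
def probability_to_possibility(probs):
    """Map a discrete probability distribution to a possibility distribution.

    For each outcome i, the possibility is the sum of all p_j with
    p_j <= p_i, so the most probable outcome receives possibility 1.
    """
    return [sum(q for q in probs if q <= p) for p in probs]

# Hypothetical probabilities for three alternative conceptual models.
probs = [0.5, 0.3, 0.2]
poss = probability_to_possibility(probs)
print(poss)
```

    With this choice, each possibility value is at least as large as the corresponding probability, which is the usual consistency requirement between the two representations.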

  3. Parametric study on single shot peening by dimensional analysis method incorporated with finite element method

    Institute of Scientific and Technical Information of China (English)

    Xian-Qian Wu; Xi Wang; Yan-Peng Wei; Hong-Wei Song; Chen-Guang Huang

    2012-01-01

    Shot peening is a widely used surface treatment method that generates compressive residual stress near the surface of metallic materials to increase fatigue life and resistance to corrosion fatigue, cracking, etc. Compressive residual stress and dent profile are important factors in evaluating the effectiveness of the shot peening process. In this paper, the influence of dimensionless parameters on the maximum compressive residual stress and the maximum depth of the dent was investigated. Firstly, dimensionless relations of the processing parameters that affect the maximum compressive residual stress and the maximum depth of the dent were deduced by the dimensional analysis method. Secondly, the influence of each dimensionless parameter on the dimensionless variables was investigated by the finite element method. Furthermore, related empirical formulas were given for each dimensionless parameter based on the simulation results. Finally, a comparison was made and good agreement was found between the simulation results and the empirical formulas, which shows that this paper provides a useful approach for analyzing the influence of each individual parameter.

  4. Analysis of Factors for Incorporating User Preferences in Air Traffic Management: A System Perspective

    Science.gov (United States)

    Sheth, Kapil S.; Gutierrez-Nolasco, Sebastian

    2010-01-01

    This paper presents an analysis of factors that impact user flight schedules during air traffic congestion. In pre-departure flight planning, users file one route per flight, which often leads to increased delays, inefficient airspace utilization, and exclusion of user flight preferences. In this paper, the idea of filing alternate routes and assigning a priority to each of those routes is first introduced. Then, the impact of varying the planning interval and the system-imposed departure delay increment is discussed. The metrics of total delay and equity are used for analyzing the impact of these factors on increased traffic and on different users. The results are shown for four cases, with and without the optional routes and priority assignments. Results demonstrate that adding priorities to optional routes further improves system performance compared to filing one route per flight and using a first-come, first-served scheme. It was also observed that a two-hour planning interval with a five-minute system-imposed departure delay increment results in the highest delay reduction. The trend holds for a scenario with increased traffic.

  5. Incorporating the Uncertainties of Nodal-Plane Orientation in the Seismo-Lineament Analysis Method (SLAM)

    Science.gov (United States)

    Cronin, V.; Sverdrup, K. A.

    2013-05-01

    The process of delineating a seismo-lineament has evolved since the first description of the Seismo-Lineament Analysis Method (SLAM) by Cronin et al. (2008, Env & Eng Geol 14(3) 199-219). SLAM is a reconnaissance tool to find the trace of the fault that produced a shallow-focus earthquake by projecting the corresponding nodal planes (NP) upward to their intersections with the ground surface, as represented by a DEM or topographic map. A seismo-lineament is formed by the intersection of the uncertainty volume associated with a given NP and the ground surface. The ground-surface trace of the fault that produced the earthquake is likely to be within one of the two seismo-lineaments associated with the two NPs derived from the earthquake's focal mechanism solution. When no uncertainty estimate has been reported for the NP orientation, the uncertainty volume associated with a given NP is bounded by parallel planes that are [1] tangent to the ellipsoidal uncertainty volume around the focus and [2] parallel to the NP. If the ground surface is planar, the resulting seismo-lineament is bounded by parallel lines. When an uncertainty is reported for the NP orientation, the seismo-lineament resembles a bow tie, with the epicenter located adjacent to or within the "knot." Some published lists of focal mechanisms include only one NP with associated uncertainties. The NP orientation uncertainties in strike azimuth (+/- gamma), dip angle (+/- epsilon) and rake that are output from an FPFIT analysis (Reasenberg and Oppenheimer, 1985, USGS OFR 85-739) are taken to be the same for both NPs (Oppenheimer, 2013, pers. comm.). The boundaries of the NP uncertainty volume are each composed of planes that are tangent to the focal uncertainty ellipsoid. One boundary, whose nearest horizontal distance from the epicenter is greater than or equal to that of the other boundary, is formed by the set of all planes with strike azimuths equal to the reported NP strike azimuth +/- gamma, and dip angle
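
    The simplest case described above (no reported orientation uncertainty, planar ground surface) can be sketched geometrically. The sketch below makes further simplifying assumptions that are not in the abstract: the ground is horizontal and the focal uncertainty volume is treated as a sphere rather than an ellipsoid. The function and parameter names are hypothetical, and this is not the SLAM implementation.

```python
import math

def planar_seismo_lineament(depth_km, dip_deg, radius_km):
    """Locate a seismo-lineament on flat, horizontal ground.

    Returns (center_offset_km, half_width_km), both measured
    horizontally and perpendicular to the nodal-plane strike.
    center_offset_km is the up-dip distance from the epicenter to the
    surface trace of the nodal plane; half_width_km is the distance
    from that trace to each bounding line.
    """
    if not 0.0 < dip_deg <= 90.0:
        raise ValueError("dip must be in the interval (0, 90] degrees")
    dip = math.radians(dip_deg)
    # A plane through the focus reaches the surface depth / tan(dip)
    # away from the epicenter.
    center_offset_km = depth_km / math.tan(dip)
    # Planes tangent to a sphere of the given radius are shifted by that
    # radius along the plane normal, i.e. radius / sin(dip) horizontally.
    half_width_km = radius_km / math.sin(dip)
    return center_offset_km, half_width_km

# Hypothetical event: 10 km deep, nodal plane dipping 45 degrees,
# 1 km location uncertainty.
center, half_width = planar_seismo_lineament(10.0, 45.0, 1.0)
print(center, half_width)
```

    For a vertical nodal plane (dip of 90 degrees), the trace passes through the epicenter and the half-width equals the uncertainty radius, as expected.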

  6. Meta-Analysis of Studies Incorporating the Interests of Young Children with Autism Spectrum Disorders into Early Intervention Practices

    Directory of Open Access Journals (Sweden)

    Carl J. Dunst

    2012-01-01

    Full Text Available Incorporating the interests and preferences of young children with autism spectrum disorders into interventions to promote prosocial behavior and decrease behavior excesses has emerged as a promising practice for addressing the core features of autism. The efficacy of interest-based early intervention practices was examined in a meta-analysis of 24 studies including 78 children 2 to 6 years of age diagnosed with autism spectrum disorders. Effect size analyses of intervention versus nonintervention conditions and high-interest versus low-interest contrasts indicated that interest-based intervention practices were effective in terms of increasing prosocial and decreasing aberrant child behavior. Additionally, interest-based interventions that focused on two of the three core features of autism spectrum disorders (poor communication, poor interpersonal relationships) were found most effective in influencing child outcomes. Implications for very early intervention are discussed in terms of addressing the behavior markers of autism spectrum disorders before they become firmly established.

  7. Heavy ion ToF analysis of oxygen incorporation in MgB₂ thin films

    Energy Technology Data Exchange (ETDEWEB)

    Ionescu, M. [Australian Nuclear Science and Technology Organization, Lucas Heights, Building 53, New South Wales 2234 (Australia)], E-mail: Mihail.Ionescu@ansto.gov.au; Zhao, Y. [Institute for Superconduction and Electronic Materials, University of Wollongong, NSW 2522 (Australia); Siegele, R.; Cohen, D.D.; Stelcer, E.; Prior, M. [Australian Nuclear Science and Technology Organization, Lucas Heights, Building 53, New South Wales 2234 (Australia)

    2008-04-15

    Oxygen incorporation in MgB₂ thin films during their fabrication process has a strong influence on the future properties of the films, and was studied by Elastic Recoil Detection Analysis (ERDA) with heavy ions and time-of-flight detection. A series of MgB₂ thin film samples were analyzed, including films produced in situ on Al₂O₃-C and Si (0 0 1) substrates (with higher T_c and lower T_c) with an 'on-axis' geometry, and films produced in situ with an 'off-axis' geometry. The amount of oxygen detected in these films appears to be correlated with the T_c of the films: the higher the T_c, the lower the oxygen content. The superconducting properties of the examined thin films are discussed in the context of the ERDA results.

  8. Incorporating Climate Change Projections into a Hydrologic Hazard Analysis for Friant Dam

    Science.gov (United States)

    Holman, K. D.; Novembre, N.; Sankovich-Bahls, V.; England, J. F.

    2015-12-01

    The Bureau of Reclamation's Dam Safety Office has initiated a series of pilot studies focused on exploring potential impacts of climate change on hydrologic hazards at specific dam locations across the Western US. Friant Dam, located in Fresno, California, was chosen for study because the site had recently undergone a high-level hydrologic hazard analysis using the Stochastic Event Flood Model (SEFM). SEFM is a deterministic flood-event model that treats input parameters as variables, rather than fixed values. Monte Carlo sampling allows the hydrometeorological input parameters to vary according to observed relationships. In this study, we explore the potential impacts of climate change on the hydrologic hazard at Friant Dam using historical and climate-adjusted hydrometeorological inputs to the SEFM. Historical magnitude-frequency relationships of peak inflow and reservoir elevation were developed at Friant Dam for the baseline study using observed temperature and precipitation data between 1966 and 2011. Historical air temperatures, antecedent precipitation, mean annual precipitation, and the precipitation-frequency curve were adjusted for the climate change study using the delta method to create climate-adjusted hydrometeorological inputs. Historical and future climate projections are based on the Bias-Corrected Spatially-Disaggregated CMIP5 dataset (BCSD-CMIP5). The SEFM model was run thousands of times to produce magnitude-frequency relationships of peak reservoir inflow, inflow volume, and reservoir elevation, based on historical and climate-adjusted inputs. Results suggest that peak reservoir inflow and peak reservoir elevation increase (decrease) for all return periods under mean increases (decreases) in precipitation, independently of changes in surface air temperature.
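
    The delta method mentioned above is commonly applied by shifting observed temperatures by a modeled change and scaling observed precipitation by a modeled ratio. The sketch below illustrates that general idea only; the function names and all numbers are hypothetical, and the abstract does not state exactly how the deltas were computed or applied in the Friant Dam study.

```python
def mean(values):
    return sum(values) / len(values)

def additive_delta(model_hist, model_future):
    """Change in the model mean, typically used for temperature."""
    return mean(model_future) - mean(model_hist)

def multiplicative_delta(model_hist, model_future):
    """Ratio of model means, typically used for precipitation."""
    return mean(model_future) / mean(model_hist)

# Hypothetical observed record (not data from the study).
obs_temp_c = [10.0, 12.0, 14.0]
obs_precip_mm = [100.0, 50.0, 0.0]

# Hypothetical climate-model output for historical and future periods.
temp_delta = additive_delta([11.0, 13.0], [13.0, 15.0])
precip_ratio = multiplicative_delta([80.0, 120.0], [90.0, 130.0])

# Climate-adjusted inputs: shift temperature, scale precipitation.
adj_temp_c = [t + temp_delta for t in obs_temp_c]
adj_precip_mm = [p * precip_ratio for p in obs_precip_mm]
print(adj_temp_c, adj_precip_mm)
```

    Scaling precipitation rather than shifting it keeps dry periods dry and avoids negative values, which is one reason a multiplicative factor is often preferred for that variable.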