WorldWideScience

Sample records for evolution sequence analysis

  1. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    Science.gov (United States)

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  2. Molecular evolution and diversification of snake toxin genes, revealed by analysis of intron sequences.

    Science.gov (United States)

    Fujimi, T J; Nakajyo, T; Nishimura, E; Ogura, E; Tsuchiya, T; Tamiya, T

    2003-08-14

    The genes encoding erabutoxin (short chain neurotoxin) isoforms (Ea, Eb, and Ec), LsIII (long chain neurotoxin) and a novel long chain neurotoxin pseudogene were cloned from a Laticauda semifasciata genomic library. Short and long chain neurotoxin genes were also cloned from the genome of Laticauda laticaudata, a closely related species of L. semifasciata, by PCR. A putative matrix attached region (MAR) sequence was found in the intron I of the LsIII gene. Comparative analysis of 11 structurally relevant snake toxin genes (three-finger-structure toxins) revealed the molecular evolution of these toxins. Three-finger-structure toxin genes diverged from a common ancestor through two types of evolutionary pathways (long and short types), early in the course of evolution. At a later stage of evolution in each gene, the accumulation of mutations in the exons, especially exon II, by accelerated evolution may have caused the increased diversification in their functions. It was also revealed that the putative MAR sequence found in the LsIII gene was integrated into the gene after the species-level divergence.

  3. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  4. System risk evolution analysis and risk critical event identification based on event sequence diagram

    International Nuclear Information System (INIS)

    Luo, Pengcheng; Hu, Yang

    2013-01-01

    During system operation, the environmental, operational and usage conditions are time-varying, which causes the fluctuations of the system state variables (SSVs). These fluctuations change the accidents’ probabilities and then result in the system risk evolution (SRE). This inherent relation makes it feasible to realize risk control by monitoring the SSVs in real time, herein, the quantitative analysis of SRE is essential. Besides, some events in the process of SRE are critical to system risk, because they act like the “demarcative points” of safety and accident, and this characteristic makes each of them a key point of risk control. Therefore, analysis of SRE and identification of risk critical events (RCEs) are remarkably meaningful to ensure the system to operate safely. In this context, an event sequence diagram (ESD) based method of SRE analysis and the related Monte Carlo solution are presented; RCE and risk sensitive variable (RSV) are defined, and the corresponding identification methods are also proposed. Finally, the proposed approaches are exemplified with an accident scenario of an aircraft getting into the icing region

  5. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    Science.gov (United States)

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  6. MBEToolbox: a Matlab toolbox for sequence data analysis in molecular biology and evolution

    Directory of Open Access Journals (Sweden)

    Xia Xuhua

    2005-03-01

    Full Text Available Abstract Background MATLAB is a high-performance language for technical computing, integrating computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. However, few functions are freely available in MATLAB to perform the sequence data analyses specifically required for molecular biology and evolution. Results We have developed a MATLAB toolbox, called MBEToolbox, aimed at filling this gap by offering efficient implementations of the most needed functions in molecular biology and evolution. It can be used to manipulate aligned sequences, calculate evolutionary distances, estimate synonymous and nonsynonymous substitution rates, and infer phylogenetic trees. Moreover, it provides an extensible, functional framework for users with more specialized requirements to explore and analyze aligned nucleotide or protein sequences from an evolutionary perspective. The full functions in the toolbox are accessible through the command-line for seasoned MATLAB users. A graphical user interface, that may be especially useful for non-specialist end users, is also provided. Conclusion MBEToolbox is a useful tool that can aid in the exploration, interpretation and visualization of data in molecular biology and evolution. The software is publicly available at http://web.hku.hk/~jamescai/mbetoolbox/ and http://bioinformatics.org/project/?group_id=454.

  7. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human, lagomorphs (rabbit, rodents (rat and mouse, artiodactyls (cow, carnivorans (dog and proboscideans (elephant. METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 mega bases from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  8. Complete Chloroplast Genome Sequence of Aquilaria sinensis (Lour.) Gilg and Evolution Analysis within the Malvales Order.

    Science.gov (United States)

    Wang, Ying; Zhan, Di-Feng; Jia, Xian; Mei, Wen-Li; Dai, Hao-Fu; Chen, Xiong-Ting; Peng, Shi-Qing

    2016-01-01

    Aquilaria sinensis (Lour.) Gilg is an important medicinal woody plant producing agarwood, which is widely used in traditional Chinese medicine. High-throughput sequencing of chloroplast (cp) genomes enhanced the understanding about evolutionary relationships within plant families. In this study, we determined the complete cp genome sequences for A. sinensis. The size of the A. sinensis cp genome was 159,565 bp. This genome included a large single-copy region of 87,482 bp, a small single-copy region of 19,857 bp, and a pair of inverted repeats (IRa and IRb) of 26,113 bp each. The GC content of the genome was 37.11%. The A. sinensis cp genome encoded 113 functional genes, including 82 protein-coding genes, 27 tRNA genes, and 4 rRNA genes. Seven genes were duplicated in the protein-coding genes, whereas 11 genes were duplicated in the RNA genes. A total of 45 polymorphic simple-sequence repeat loci and 60 pairs of large repeats were identified. Most simple-sequence repeats were located in the noncoding sections of the large single-copy/small single-copy region and exhibited high A/T content. Moreover, 33 pairs of large repeat sequences were located in the protein-coding genes, whereas 27 pairs were located in the intergenic regions. Aquilaria sinensis cp genome bias ended with A/T on the basis of codon usage. The distribution of codon usage in A. sinensis cp genome was most similar to that in the Gonystylus bancanus cp genome. Comparative results of 82 protein-coding genes from 29 species of cp genomes demonstrated that A. sinensis was a sister species to G. bancanus within the Malvales order. Aquilaria sinensis cp genome presented the highest sequence similarity of >90% with the G. bancanus cp genome by using CGView Comparison Tool. This finding strongly supports the placement of A. sinensis as a sister to G. bancanus within the Malvales order. The complete A. sinensis cp genome information will be highly beneficial for further studies on this traditional medicinal

  9. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  10. Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

    KAUST Repository

    Thind, Anupriya Kaur

    2018-02-08

    Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

  11. Population genetic and evolution analysis of controversial genus Edwardsiella by multilocus sequence typing.

    Science.gov (United States)

    Buján, Noemí; Balboa, Sabela; L Romalde, Jesús; E Toranzo, Alicia; Magariños, Beatriz

    2018-05-08

    At present, the genus Edwardsiella compiles five species: E. tarda, E. hoshinae, E. ictaluri, E. piscicida and E. anguillarum. Some species of this genus such us E. ictaluri and E. piscicida are important pathogens of numerous fish species. With the description of the two latter species, the phylogeny of Edwardsiella became more complicated. With the aim to clarify the relationships among all species in the genus, a multilocus sequence typing (MLST) approach was developed and applied to characterize 56 isolates and 6 reference strains belonging to the five Edwardsiella species. Moreover, several analyses based on the MLST scheme were performed to investigate the evolution within the genus, as well as the influence of recombination and mutation in the speciation. Edwardsiella isolates presented a high genetic variability reflected in the fourteen sequence types (ST) represented by a single isolates out of eighteen total ST. Mutation events were considerably more frequent than recombination, although both approximately equal influenced the genetic diversification. However, the speciation among species occurred mostly by recombination. Edwardsiella genus displays a non-clonal population structure with some degree of geographical isolation followed by a population expansion of E. piscicida. A database from this study was created and hosted on pubmlst.org (http://pubmlst.org/edwardsiella/). Copyright © 2018 Elsevier Inc. All rights reserved.

  12. New insights into Trypanosoma cruzi evolution, genotyping and molecular diagnostics from satellite DNA sequence analysis.

    Directory of Open Access Journals (Sweden)

    Juan C Ramírez

    2017-12-01

    Full Text Available Trypanosoma cruzi has been subdivided into seven Discrete Typing Units (DTUs, TcI-TcVI and Tcbat. Two major evolutionary models have been proposed to explain the origin of hybrid lineages, but while it is widely accepted that TcV and TcVI are the result of genetic exchange between TcII and TcIII strains, the origin of TcIII and TcIV is still a matter of debate. T. cruzi satellite DNA (SatDNA, comprised of 195 bp units organized in tandem repeats, from both TcV and TcVI stocks were found to have SatDNA copies type TcI and TcII; whereas contradictory results were observed for TcIII stocks and no TcIV sequence has been analyzed yet. Herein, we have gone deeper into this matter analyzing 335 distinct SatDNA sequences from 19 T. cruzi stocks representative of DTUs TcI-TcVI for phylogenetic inference. Bayesian phylogenetic tree showed that all sequences were grouped in three major clusters, which corresponded to sequences from DTUs TcI/III, TcII and TcIV; whereas TcV and TcVI stocks had two sets of sequences distributed into TcI/III and TcII clusters. As expected, the lowest genetic distances were found between TcI and TcIII, and between TcV and TcVI sequences; whereas the highest ones were observed between TcII and TcI/III, and among TcIV sequences and those from the remaining DTUs. In addition, signature patterns associated to specific T. cruzi lineages were identified and new primers that improved SatDNA-based qPCR sensitivity were designed. Our findings support the theory that TcIII is not the result of a hybridization event between TcI and TcII, and that TcIV had an independent origin from the other DTUs, contributing to clarifying the evolutionary history of T. cruzi lineages. Moreover, this work opens the possibility of typing samples from Chagas disease patients with low parasitic loads and improving molecular diagnostic methods of T. cruzi infection based on SatDNA sequence amplification.

  13. New insights into Trypanosoma cruzi evolution, genotyping and molecular diagnostics from satellite DNA sequence analysis.

    Science.gov (United States)

    Ramírez, Juan C; Torres, Carolina; Curto, María de Los A; Schijman, Alejandro G

    2017-12-01

    Trypanosoma cruzi has been subdivided into seven Discrete Typing Units (DTUs), TcI-TcVI and Tcbat. Two major evolutionary models have been proposed to explain the origin of hybrid lineages, but while it is widely accepted that TcV and TcVI are the result of genetic exchange between TcII and TcIII strains, the origin of TcIII and TcIV is still a matter of debate. T. cruzi satellite DNA (SatDNA), comprised of 195 bp units organized in tandem repeats, from both TcV and TcVI stocks were found to have SatDNA copies type TcI and TcII; whereas contradictory results were observed for TcIII stocks and no TcIV sequence has been analyzed yet. Herein, we have gone deeper into this matter analyzing 335 distinct SatDNA sequences from 19 T. cruzi stocks representative of DTUs TcI-TcVI for phylogenetic inference. Bayesian phylogenetic tree showed that all sequences were grouped in three major clusters, which corresponded to sequences from DTUs TcI/III, TcII and TcIV; whereas TcV and TcVI stocks had two sets of sequences distributed into TcI/III and TcII clusters. As expected, the lowest genetic distances were found between TcI and TcIII, and between TcV and TcVI sequences; whereas the highest ones were observed between TcII and TcI/III, and among TcIV sequences and those from the remaining DTUs. In addition, signature patterns associated to specific T. cruzi lineages were identified and new primers that improved SatDNA-based qPCR sensitivity were designed. Our findings support the theory that TcIII is not the result of a hybridization event between TcI and TcII, and that TcIV had an independent origin from the other DTUs, contributing to clarifying the evolutionary history of T. cruzi lineages. Moreover, this work opens the possibility of typing samples from Chagas disease patients with low parasitic loads and improving molecular diagnostic methods of T. cruzi infection based on SatDNA sequence amplification.

  14. DNA sequence analysis tells the truth of the origin, propagation, and evolution of chili (red pepper

    Directory of Open Access Journals (Sweden)

    Hye Jeong Yang

    2017-09-01

    The theory that Korean chili (kochu comes from Japan is the biggest culprit responsible for distorting the history and value of Korean chili and kimchi. It has also spawned a number of other idle theories. This paper aims to correct this misconception through scientific analysis and ultimately restore the truth about the history and culture of Korean fermented foods.

  15. Comparative analysis of function and interaction of transcription factors in nematodes: Extensive conservation of orthology coupled to rapid sequence evolution

    Directory of Open Access Journals (Sweden)

    Singh Rama S

    2008-08-01

    Full Text Available Abstract Background Much of the morphological diversity in eukaryotes results from differential regulation of gene expression in which transcription factors (TFs play a central role. The nematode Caenorhabditis elegans is an established model organism for the study of the roles of TFs in controlling the spatiotemporal pattern of gene expression. Using the fully sequenced genomes of three Caenorhabditid nematode species as well as genome information from additional more distantly related organisms (fruit fly, mouse, and human we sought to identify orthologous TFs and characterized their patterns of evolution. Results We identified 988 TF genes in C. elegans, and inferred corresponding sets in C. briggsae and C. remanei, containing 995 and 1093 TF genes, respectively. Analysis of the three gene sets revealed 652 3-way reciprocal 'best hit' orthologs (nematode TF set, approximately half of which are zinc finger (ZF-C2H2 and ZF-C4/NHR types and HOX family members. Examination of the TF genes in C. elegans and C. briggsae identified the presence of significant tandem clustering on chromosome V, the majority of which belong to ZF-C4/NHR family. We also found evidence for lineage-specific duplications and rapid evolution of many of the TF genes in the two species. A search of the TFs conserved among nematodes in Drosophila melanogaster, Mus musculus and Homo sapiens revealed 150 reciprocal orthologs, many of which are associated with important biological processes and human diseases. Finally, a comparison of the sequence, gene interactions and function indicates that nematode TFs conserved across phyla exhibit significantly more interactions and are enriched in genes with annotated mutant phenotypes compared to those that lack orthologs in other species. Conclusion Our study represents the first comprehensive genome-wide analysis of TFs across three nematode species and other organisms. The findings indicate substantial conservation of transcription

  16. A Topical Trajectory on Survival: An Analysis of Link-Making in a Sequence of Lessons on Evolution

    Science.gov (United States)

    Rocksén, Miranda; Olander, Clas

    2017-01-01

    This study explores the concept of link-making in relation to communicative strategies applied in the teaching and studying of biological evolution. The analysis focused on video recordings of 11 lessons on biological evolution conducted in a Swedish 9th grade class of students aged 15 years. It reveals how the teacher and students connected…

  17. Myocardial tagging by Cardiovascular Magnetic Resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications

    Directory of Open Access Journals (Sweden)

    Ibrahim El-Sayed H

    2011-07-01

    Full Text Available Abstract Cardiovascular magnetic resonance (CMR tagging has been established as an essential technique for measuring regional myocardial function. It allows quantification of local intramyocardial motion measures, e.g. strain and strain rate. The invention of CMR tagging came in the late eighties, where the technique allowed for the first time for visualizing transmural myocardial movement without having to implant physical markers. This new idea opened the door for a series of developments and improvements that continue up to the present time. Different tagging techniques are currently available that are more extensive, improved, and sophisticated than they were twenty years ago. Each of these techniques has different versions for improved resolution, signal-to-noise ratio (SNR, scan time, anatomical coverage, three-dimensional capability, and image quality. The tagging techniques covered in this article can be broadly divided into two main categories: 1 Basic techniques, which include magnetization saturation, spatial modulation of magnetization (SPAMM, delay alternating with nutations for tailored excitation (DANTE, and complementary SPAMM (CSPAMM; and 2 Advanced techniques, which include harmonic phase (HARP, displacement encoding with stimulated echoes (DENSE, and strain encoding (SENC. Although most of these techniques were developed by separate groups and evolved from different backgrounds, they are in fact closely related to each other, and they can be interpreted from more than one perspective. Some of these techniques even followed parallel paths of developments, as illustrated in the article. As each technique has its own advantages, some efforts have been made to combine different techniques together for improved image quality or composite information acquisition. In this review, different developments in pulse sequences and related image processing techniques are described along with the necessities that led to their invention

  18. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    Directory of Open Access Journals (Sweden)

    John P. Jakupciak

    2013-01-01

    Full Text Available Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  19. An ITS phylogeny of Leccinum and an analysis of the evolution of minisatellite-like sequences within ITS1

    NARCIS (Netherlands)

    Bakker, den H.C.; Gravendeel, B.; Kuyper, T.W.

    2004-01-01

    Phylogenetic relationships of the European species of Leccinum (Boletales, Boletaceae) were investigated by maximum parsimony, Bayesian and likelihood analyses of nrITS1-5.8S-ITS2 and 28S sequences. The separate gene trees inferred were largely concordant, and their combined analysis indicates that

  20. Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence

    Directory of Open Access Journals (Sweden)

    Rands Chris M

    2013-02-01

    Full Text Available Abstract Background A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin’s (Galápagos finches (Thraupidae, Passeriformes. Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris. Results 13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1 have been implicated in beak morphology changes in Darwin’s finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins. Conclusions These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin’s finches.

  1. Differential evolution-simulated annealing for multiple sequence alignment

    Science.gov (United States)

    Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

    2017-10-01

    Multiple sequence alignments (MSA) are used in the analysis of molecular evolution and sequence structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA) is applied in optimizing multiple sequence alignments (MSAs) based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem of the hybrid evolutionary algorithm, DESA. Thus, we name the algorithm as DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions particularly for the MSA problem evaluated based on the three objectives.

  2. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to infer reliably homology, and thus a shared proteinfold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can both be used to improve the selectivity of a search and to evaluate the significance of a match. They them examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, the discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  3. The Role of the Y-Chromosome in the Establishment of Murine Hybrid Dysgenesis and in the Analysis of the Nucleotide Sequence Organization, Genetic Transmission and Evolution of Repeated Sequences.

    Science.gov (United States)

    Nallaseth, Ferez Soli

    The Y-chromosome presents a unique cytogenetic framework for the evolution of nucleotide sequences. Alignment of nine Y-chromosomal fragments in their increasing Y-specific/non Y-specific (male/female) sequence divergence ratios was directly and inversely related to their interspersion on these two respective genomic fractions. Sequence analysis confirmed a direct relationship between divergence ratios and the Alu, LINE-1, Satellite and their derivative oligonucleotide contents. Thus their relocation on the Y-chromosome is followed by sequence divergence rather than the well documented concerted evolution of these non-coding progenitor repeated sequences. Five of the nine Y-chromosomal fragments are non-pseudoautosomal and transcribed into heterogeneous PolyA^+ RNA and thus can be retrotransposed. Evolutionary and computer analysis identified homologous oligonucleotide tracts in several human loci suggesting common and random mechanistic origins. Dysgenic genomes represent the accelerated evolution driving sequence divergence (McClintock, 1984). Sex reversal and sterility characterizing dysgenesis occurs in C57BL/6JY ^{rm Pos} but not in 129/SvY^{rm Pos} derivative strains. High frequency, random, multi-locus deletion products of the feral Y^{ rm Pos}-chromosome are generated in the germlines of F1(C57BL/6J X 129/SvY^{ rm Pos})(male) and C57BL/6JY ^{rm Pos}(male) but not in 129/SvY^{rm Pos}(male). Equal, 10^{-1}, 10^ {-2}, and 0 copies (relative to males) of Y^{rm Pos}-specific deletion products respectively characterize C57BL/6JY ^{rm Pos} (HC), (LC), (T) and (F) females. The testes determining loci of inactive Y^{rm Pos}-chromosomes in C57BL/6JY^{rm Pos} HC females are the preferentially deleted/rearranged Y ^{rm Pos}-sequences. Disruption of regulation of plasma testosterone and hepatic MUP-A mRNA levels, TRD of a 4.7 Kbp EcoR1 fragment suggest disruption of autosomal/X-chromosomal sequences. These data and the highly repeated progenitor (Alu, GATA, LINE-1

  4. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  5. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  6. Insights on the evolution of metabolic networks of unicellular translationally biased organisms from transcriptomic data and sequence analysis.

    Science.gov (United States)

    Carbone, Alessandra; Madden, Richard

    2005-10-01

    Codon bias is related to metabolic functions in translationally biased organisms, and two facts are argued about. First, genes with high codon bias describe in meaningful ways the metabolic characteristics of the organism; important metabolic pathways corresponding to crucial characteristics of the lifestyle of an organism, such as photosynthesis, nitrification, anaerobic versus aerobic respiration, sulfate reduction, methanogenesis, and others, happen to involve especially biased genes. Second, gene transcriptional levels of sets of experiments representing a significant variation of biological conditions strikingly confirm, in the case of Saccharomyces cerevisiae, that metabolic preferences are detectable by purely statistical analysis: the high metabolic activity of yeast during fermentation is encoded in the high bias of enzymes involved in the associated pathways, suggesting that this genome was affected by a strong evolutionary pressure that favored a predominantly fermentative metabolism of yeast in the wild. The ensemble of metabolic pathways involving enzymes with high codon bias is rather well defined and remains consistent across many species, even those that have not been considered as translationally biased, such as Helicobacter pylori, for instance, reveal some weak form of translational bias for this genome. We provide numerical evidence, supported by experimental data, of these facts and conclude that the metabolic networks of translationally biased genomes, observable today as projections of eons of evolutionary pressure, can be analyzed numerically and predictions of the role of specific pathways during evolution can be derived. The new concepts of Comparative Pathway Index, used to compare organisms with respect to their metabolic networks, and Evolutionary Pathway Index, used to detect evolutionarily meaningful bias in the genetic code from transcriptional data, are introduced.

  7. Learning about evolution from sequence data

    Science.gov (United States)

    Dayarian, Adel; Shraiman, Boris

    2012-02-01

    Recent advances in sequencing and in laboratory evolution experiments have made possible to obtain quantitative data on genetic diversity of populations and on the dynamics of evolution. This dynamics is shaped by the interplay between selection acting on beneficial and deleterious mutations and recombination which reshuffles genotypes. Mounting evidence suggests that natural populations harbor extensive fitness diversity, yet most of the currently available tools for analyzing polymorphism data are based on the neutral theory. Our aim is to develop methods to analyze genomic data for populations in the presence of the above-mentioned factors. We consider different evolutionary regimes - Muller's ratchet, mutation-recombination-selection balance and positive adaption rate - and revisit a number of observables considered in the nearly-neutral theory of evolution. In particular, we examine the coalescent structure in the presence of recombination and calculate quantities such as the distribution of the coalescent times along the genome, the distribution of haplotype block sizes and the correlation between ancestors of different loci along the genome. In addition, we characterize the probability and time of fixation of mutations as a function of their fitness effect.

  8. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  9. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance,and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view)to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  10. Insights into hominid evolution from the gorilla genome sequence

    Science.gov (United States)

    Scally, Aylwyn; Dutheil, Julien Y.; Hillier, LaDeana W.; Jordan, Greg E.; Goodhead, Ian; Herrero, Javier; Hobolth, Asger; Lappalainen, Tuuli; Mailund, Thomas; Marques-Bonet, Tomas; McCarthy, Shane; Montgomery, Stephen H.; Schwalie, Petra C.; Tang, Y. Amy; Ward, Michelle C.; Xue, Yali; Yngvadottir, Bryndis; Alkan, Can; Andersen, Lars N.; Ayub, Qasim; Ball, Edward V.; Beal, Kathryn; Bradley, Brenda J.; Chen, Yuan; Clee, Chris M.; Fitzgerald, Stephen; Graves, Tina A.; Gu, Yong; Heath, Paul; Heger, Andreas; Karakoc, Emre; Kolb-Kokocinski, Anja; Laird, Gavin K.; Lunter, Gerton; Meader, Stephen; Mort, Matthew; Mullikin, James C.; Munch, Kasper; O’Connor, Timothy D.; Phillips, Andrew D.; Prado-Martinez, Javier; Rogers, Anthony S.; Sajjadian, Saba; Schmidt, Dominic; Shaw, Katy; Simpson, Jared T.; Stenson, Peter D.; Turner, Daniel J.; Vigilant, Linda; Vilella, Albert J.; Whitener, Weldon; Zhu, Baoli; Cooper, David N.; de Jong, Pieter; Dermitzakis, Emmanouil T.; Eichler, Evan E.; Flicek, Paul; Goldman, Nick; Mundy, Nicholas I.; Ning, Zemin; Odom, Duncan T.; Ponting, Chris P.; Quail, Michael A.; Ryder, Oliver A.; Searle, Stephen M.; Warren, Wesley C.; Wilson, Richard K.; Schierup, Mikkel H.; Rogers, Jane; Tyler-Smith, Chris; Durbin, Richard

    2012-01-01

    Summary Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago (Mya). In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution. PMID:22398555

  11. Understanding Cancer Genome and Its Evolution by Next Generation Sequencing

    DEFF Research Database (Denmark)

    Hou, Yong

    Cancer will cause 13 million deaths by the year of 2030, ranking the second leading cause of death worldwide. Previous studies indicate that most of the cancers originate from cells that acquired somatic mutations and evolved as Darwin Theory. Ten biological insights of cancer have been summarized...... recently. Cutting-age technologies like next generation sequencing (NGS) enable exploring cancer genome and evolution much more efficiently. However, integrated cancer genome sequencing studies showed great inter-/intra-tumoral heterogeneity (ITH) and complex evolution patterns beyond the cancer biological...... knowledge we previously know. There is very limited knowledge of East Asia lung cancer genome except enrichment of EGFR mutations and lack of KRAS mutations. We carried out integrated genomic, transcriptomic and methylomic analysis of 335 primary Chinese lung adenocarcinomas (LUAD) and 35 corresponding...

  12. A Markov chain Monte Carlo Expectation Maximization Algorithm for Statistical Analysis of DNA Sequence Evolution with Neighbor-Dependent Substitution Rates

    DEFF Research Database (Denmark)

    Hobolth, Asger

    2008-01-01

    The evolution of DNA sequences can be described by discrete state continuous time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models where the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor...

  13. New insights into flavivirus evolution, taxonomy and biogeographic history, extended by analysis of canonical and alternative coding sequences.

    Directory of Open Access Journals (Sweden)

    Gregory Moureau

    Full Text Available To generate the most diverse phylogenetic dataset for the flaviviruses to date, we determined the genomic sequences and phylogenetic relationships of 14 flaviviruses, of which 10 are primarily associated with Culex spp. mosquitoes. We analyze these data, in conjunction with a comprehensive collection of flavivirus genomes, to characterize flavivirus evolutionary and biogeographic history in unprecedented detail and breadth. Based on the presumed introduction of yellow fever virus into the Americas via the transatlantic slave trade, we extrapolated a timescale for a relevant subset of flaviviruses whose evolutionary history, shows that different Culex-spp. associated flaviviruses have been introduced from the Old World to the New World on at least five separate occasions, with 2 different sets of factors likely to have contributed to the dispersal of the different viruses. We also discuss the significance of programmed ribosomal frameshifting in a central region of the polyprotein open reading frame in some mosquito-associated flaviviruses.

  14. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applica­ tions including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Anal­ ysis and Machine Intelligence was published in November 1980. The IEEE Com­ puter magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  15. A model of proteostatic energy cost and its use in analysis of proteome trends and sequence evolution.

    Directory of Open Access Journals (Sweden)

    Kasper P Kepp

    Full Text Available A model of proteome-associated chemical energetic costs of cells is derived from protein-turnover kinetics and protein folding. Minimization of the proteostatic maintenance cost can explain a range of trends of proteomes and combines both protein function, stability, size, proteostatic cost, temperature, resource availability, and turnover rates in one simple framework. We then explore the ansatz that the chemical energy remaining after proteostatic maintenance is available for reproduction (or cell division and thus, proportional to organism fitness. Selection for lower proteostatic costs is then shown to be significant vs. typical effective population sizes of yeast. The model explains and quantifies evolutionary conservation of highly abundant proteins as arising both from functional mutations and from changes in other properties such as stability, cost, or turnover rates. We show that typical hypomorphic mutations can be selected against due to increased cost of compensatory protein expression (both in the mutated gene and in related genes, i.e. epistasis rather than compromised function itself, although this compensation depends on the protein's importance. Such mutations exhibit larger selective disadvantage in abundant, large, synthetically costly, and/or short-lived proteins. Selection against increased turnover costs of less stable proteins rather than misfolding toxicity per se can explain equilibrium protein stability distributions, in agreement with recent findings in E. coli. The proteostatic selection pressure is stronger at low metabolic rates (i.e. scarce environments and in hot habitats, explaining proteome adaptations towards rough environments as a question of energy. The model may also explain several trade-offs observed in protein evolution and suggests how protein properties can coevolve to maintain low proteostatic cost.

  16. Evolution of developmental sequences in lepidosaurs

    Directory of Open Access Journals (Sweden)

    Tomasz Skawiński

    2017-04-01

    Full Text Available Background Lepidosaurs, a group including rhynchocephalians and squamates, are one of the major clades of extant vertebrates. Although there has been extensive phylogenetic work on this clade, its interrelationships are a matter of debate. Morphological and molecular data suggest very different relationships within squamates. Despite this, relatively few studies have assessed the utility of other types of data for inferring squamate phylogeny. Methods We used developmental sequences of 20 events in 29 species of lepidosaurs. These sequences were analysed using event-pairing and continuous analysis. They were transformed into cladistic characters and analysed in TNT. Ancestral state reconstructions were performed on two main phylogenetic hypotheses of squamates (morphological and molecular. Results Cladistic analyses conducted using characters generated by these methods do not resemble any previously published phylogeny. Ancestral state reconstructions are equally consistent with both morphological and molecular hypotheses of squamate phylogeny. Only several inferred heterochronic events are common to all methods and phylogenies. Discussion Results of the cladistic analyses, and the fact that reconstructions of heterochronic events show more similarities between certain methods rather than phylogenetic hypotheses, suggest that phylogenetic signal is at best weak in the studied developmental events. Possibly the developmental sequences analysed here evolve too quickly to recover deep divergences within Squamata.

  17. Chloroplast DNA analysis of Tunisian cork oak populations (Quercus suber L.): sequence variations and molecular evolution of the trnL (UAA)-trnF (GAA) region.

    Science.gov (United States)

    Abdessamad, A; Baraket, G; Sakka, H; Ammari, Y; Ksontini, M; Hannachi, A Salhi

    2016-10-24

    Sequences of the trnL-trnF spacer and combined trnL-trnF region in chloroplast DNA of cork oak (Quercus suber L.) were analyzed to detect polymorphisms and to elucidate molecular evolution and demographic history. The aligned sequences varied in length and nucleotide composition. The overall ratio of transition/transversion (ti/tv) of 0.724 for the intergenic spacer and 0.258 for the pooled sequences were estimated, and indicated that transversions are more frequent than transitions. The molecular evolution and demographic history of Q. suber were investigated. Neutrality tests (Tajima's D and Fu and Li) ruled out the null hypothesis of a strictly neutral model, and Fu's Fs and Ramos-Onsins and Rozas' R2 confirmed the recent expansion of cork oak trees, validating its persistency in North Africa since the last glaciation during the Quaternary. The observed uni-modal mismatch distribution and the Harpending's raggedness index confirmed the demographic history model for cork oak. A phylogenetic dendrogram showed that the distribution of Q. suber trees occurs independently of geographical origin, the relief of the population site, and the bioclimatic stages. The molecular history and cytoplasmic diversity suggest that in situ and ex situ conservation strategies can be recommended for preserving landscape value and facing predictable future climatic changes.

  18. Exploring the correlations between sequence evolution rate and ...

    Indian Academy of Sciences (India)

    2012-10-15

    Oct 15, 2012 ... The vast functional divergence within mammalian lineages that ... Keywords. Phylogenetics; molecular clock; sequence evolutionary rate; phenotypic evolution; morphology; genomics .... entire lineages during periods with ecosystem-level commu- ... increases from fish to amphibians to birds to mammals.

  19. The evolution of coronal activity in main sequence cool stars

    International Nuclear Information System (INIS)

    Stern, R.A.

    1984-01-01

    Stars spend most of their lifetime and show the least amount of nuclear evolution on the main sequence. However, the x-ray luminosities of cool star coronas change by orders of magnitude as a function of main sequence age. Such coronal evolution is discussed in relation to our knowledge of the solar corona, solar and stellar flares, stellar rotation and binarity. The relevance of X-ray observations to current speculations on stellar dynamos is also considered

  20. Mio-Pliocene to Pleistocene paleotopographic evolution of Brittany (France) from a sequence stratigraphic analysis: relative influence of tectonics and climate

    Science.gov (United States)

    Brault, N.; Bourquin, S.; Guillocheau, F.; Dabard, M.-P.; Bonnet, S.; Courville, P.; Estéoule-Choux, J.; Stepanoff, F.

    2004-01-01

    The Mio-Pliocene in Western Europe is a period of major climatic and tectonic change with important topographic consequences. The aim of this paper is to reconstruct these topographic changes (based on sedimentological analysis and sequence stratigraphy) for the Armorican Massif (western France) and to discuss their significance. The Mio-Pliocene sands of the Armorican Massif (Red Sands) are mainly preserved in paleovalleys and are characterized by extensive fluvial sheetflood deposits with low-preservation and by-pass facies. This sedimentological study shows that the Red Sands correspond to three main sedimentary environments: fluvial (alluvial fan, low-sinuosity rivers and braided rivers), estuarine and some rare open marine deposits (marine bioclastic sands: "faluns" of French authors). Two orders of sequences have been correlated across Brittany with one or two minor A/ S cycles comprised within the retrogradational trend of a major cycle. The unconformity at the base of the lower cycle is more marked than the unconformity observed at the top, which corresponds to a re-incision of the paleovalley network. A comparison of the results of the sequence stratigraphy analysis with eustatic variations and tectonic events during the Mio-Pliocene allows (1) to discuss their influence on the evolution of the Armorican Massif and (2) to compare the stratigraphic record with other west-European basins. The unconformity observed at the base of the first minor cycle may be attributed to Serravallian-Tortonian tectonic activity and/or eustatic fall, and the unconformity of the second minor cycle may be attributed to Late Tortonian-Early Messinian tectonic activity. The earlier unconformity is coeval with the development of a "smooth" paleovalley network compared to the jagged present-day relief. A single episode of Mio-Pliocene deformation recorded in Brittany may be dated as Zanclean, thus explaining the lack of the maximum flooding surface except in isolated areas. From

  1. Sequence analysis of two alleles reveals that intra-and intergenic recombination played a role in the evolution of the radish fertility restorer (Rfo

    Directory of Open Access Journals (Sweden)

    Budar Françoise

    2010-02-01

    Full Text Available Abstract Background Land plant genomes contain multiple members of a eukaryote-specific gene family encoding proteins with pentatricopeptide repeat (PPR motifs. Some PPR proteins were shown to participate in post-transcriptional events involved in organellar gene expression, and this type of function is now thought to be their main biological role. Among PPR genes, restorers of fertility (Rf of cytoplasmic male sterility systems constitute a peculiar subgroup that is thought to evolve in response to the presence of mitochondrial sterility-inducing genes. Rf genes encoding PPR proteins are associated with very close relatives on complex loci. Results We sequenced a non-restoring allele (L7rfo of the Rfo radish locus whose restoring allele (D81Rfo was previously described, and compared the two alleles and their PPR genes. We identified a ca 13 kb long fragment, likely originating from another part of the radish genome, inserted into the L7rfo sequence. The L7rfo allele carries two genes (PPR-1 and PPR-2 closely related to the three previously described PPR genes of the restorer D81Rfo allele (PPR-A, PPR-B, and PPR-C. Our results indicate that alleles of the Rfo locus have experienced complex evolutionary events, including recombination and insertion of extra-locus sequences, since they diverged. Our analyses strongly suggest that present coding sequences of Rfo PPR genes result from intragenic recombination. We found that the 10 C-terminal PPR repeats in Rfo PPR gene encoded proteins result from the tandem duplication of a 5 PPR repeat block. Conclusions The Rfo locus appears to experience more complex evolution than its flanking sequences. The Rfo locus and PPR genes therein are likely to evolve as a result of intergenic and intragenic recombination. It is therefore not possible to determine which genes on the two alleles are direct orthologs. Our observations recall some previously reported data on pathogen resistance complex loci.

  2. Post-main-sequence planetary system evolution

    Science.gov (United States)

    Veras, Dimitri

    2016-01-01

    The fates of planetary systems provide unassailable insights into their formation and represent rich cross-disciplinary dynamical laboratories. Mounting observations of post-main-sequence planetary systems necessitate a complementary level of theoretical scrutiny. Here, I review the diverse dynamical processes which affect planets, asteroids, comets and pebbles as their parent stars evolve into giant branch, white dwarf and neutron stars. This reference provides a foundation for the interpretation and modelling of currently known systems and upcoming discoveries. PMID:26998326

  3. Morphological character evolution of Amorphophallus (Araceae) based on a combined phylogenetic analysis of trnL, rbcL, and LEAFY second intron sequences

    NARCIS (Netherlands)

    Sedayu, A.; Eurlings, M.C.M.; Gravendeel, B.; Hetterscheid, W.L.A.

    2010-01-01

    Sequences of three different genes in 69 taxa of Amorphophallus were combined to reconstruct the molecular phylogeny of this species-rich Aroid genus. The data set was analyzed by three different methods, Maximum Parsimony, Maximum Likelihood and Bayesian analysis, producing slightly different tree

  4. Mitochondrial DNA sequence evolution in the Arctoidea.

    OpenAIRE

    Zhang, Y P; Ryder, O A

    1993-01-01

    Some taxa in the superfamily Arctoidea, such as the giant panda and the lesser panda, have presented puzzles to taxonomists. In the present study, approximately 397 bases of the cytochrome b gene, 364 bases of the 12S rRNA gene, and 74 bases of the tRNA(Thr) and tRNA(Pro) genes from the giant panda, lesser panda, kinkajou, raccoon, coatimundi, and all species of the Ursidae were sequenced. The high transition/transversion ratios in cytochrome b and RNA genes prior to saturation suggest that t...

  5. A Markov chain Monte Carlo Expectation Maximization Algorithm for Statistical Analysis of DNA Sequence Evolution with Neighbor-Dependent Substitution Rates

    DEFF Research Database (Denmark)

    Hobolth, Asger

    2008-01-01

    -dimensional integrals required in the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous time Markov process, conditional on the beginning and ending states and the paths of the neighboring sites. An exact path sampling algorithm is developed......The evolution of DNA sequences can be described by discrete state continuous time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models where the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor......-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference of neighbor-dependent models using a Markov chain Monte Carlo expectation maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high...

  6. MODELING THE RED SEQUENCE: HIERARCHICAL GROWTH YET SLOW LUMINOSITY EVOLUTION

    International Nuclear Information System (INIS)

    Skelton, Rosalind E.; Bell, Eric F.; Somerville, Rachel S.

    2012-01-01

    We explore the effects of mergers on the evolution of massive early-type galaxies by modeling the evolution of their stellar populations in a hierarchical context. We investigate how a realistic red sequence population set up by z ∼ 1 evolves under different assumptions for the merger and star formation histories, comparing changes in color, luminosity, and mass. The purely passive fading of existing red sequence galaxies, with no further mergers or star formation, results in dramatic changes at the bright end of the luminosity function and color-magnitude relation. Without mergers there is too much evolution in luminosity at a fixed space density compared to observations. The change in color and magnitude at a fixed mass resembles that of a passively evolving population that formed relatively recently, at z ∼ 2. Mergers among the red sequence population ('dry mergers') occurring after z = 1 build up mass, counteracting the fading of the existing stellar populations to give smaller changes in both color and luminosity for massive galaxies. By allowing some galaxies to migrate from the blue cloud onto the red sequence after z = 1 through gas-rich mergers, younger stellar populations are added to the red sequence. This manifestation of the progenitor bias increases the scatter in age and results in even smaller changes in color and luminosity between z = 1 and z = 0 at a fixed mass. The resultant evolution appears much slower, resembling the passive evolution of a population that formed at high redshift (z ∼ 3-5), and is in closer agreement with observations. We conclude that measurements of the luminosity and color evolution alone are not sufficient to distinguish between the purely passive evolution of an old population and cosmologically motivated hierarchical growth, although these scenarios have very different implications for the mass growth of early-type galaxies over the last half of cosmic history.

  7. Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites

    KAUST Repository

    Hunt, Paul

    2010-09-16

    Background: Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum.Results: A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (IlluminaSolexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme.Conclusions: This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations. 2010 Hunt et al; licensee BioMed Central Ltd.

  8. Evolution of electricity markets: Does sequencing matter?

    International Nuclear Information System (INIS)

    Neuhoff, K.; Newbery, D.

    2005-01-01

    This paper addresses three questions that are relevant to integrating different regional transmission areas. Market integration normally increases the number of competitors and should therefore reduce prices but the first section shows that prices could rise when the number of generators initially increases. Regulatory effort will also be affected by market integration. If the number of generators in either market is low, then our analysis suggests that the outcome depends on whether the regulators act independently or coordinate. Finally, if markets are gradually combined into larger units, the choice of transmission allocation (coordinated auctions or market coupling) will affect the prospects of making further gains and hence could lead to incomplete reform. (author)

  9. Analysis of the overdispersed clock in the short-term evolution of hepatitis C virus: Using the E1/E2 gene sequences to infer infection dates in a single source outbreak.

    Science.gov (United States)

    Wróbel, Borys; Torres-Puente, Manuela; Jiménez, Nuria; Bracho, María Alma; García-Robles, Inmaculada; Moya, Andrés; González-Candelas, Fernando

    2006-06-01

    The assumption of a molecular clock for dating events from sequence information is often frustrated by the presence of heterogeneity among evolutionary rates due, among other factors, to positively selected sites. In this work, our goal is to explore methods to estimate infection dates from sequence analysis. One such method, based on site stripping for clock detection, was proposed to unravel the clocklike molecular evolution in sequences showing high variability of evolutionary rates and in the presence of positive selection. Other alternatives imply accommodating heterogeneity in evolutionary rates at various levels, without eliminating any information from the data. Here we present the analysis of a data set of hepatitis C virus (HCV) sequences from 24 patients infected by a single individual with known dates of infection. We first used a simple criterion of relative substitution rate for site removal prior to a regression analysis. Time was regressed on maximum likelihood pairwise evolutionary distances between the sequences sampled from the source individual and infected patients. We show that it is indeed the fastest evolving sites that disturb the molecular clock and that these sites correspond to positively selected codons. The high computational efficiency of the regression analysis allowed us to compare the site-stripping scheme with random removal of sites. We demonstrate that removing the fast-evolving sites significantly increases the accuracy of estimation of infection times based on a single substitution rate. However, the time-of-infection estimations improved substantially when a more sophisticated and computationally demanding Bayesian method was used. This method was used with the same data set but keeping all the sequence positions in the analysis. Consequently, despite the distortion introduced by positive selection on evolutionary rates, it is possible to obtain quite accurate estimates of infection dates, a result of especial relevance for

  10. Three aspects of stellar evolution near the main sequence

    International Nuclear Information System (INIS)

    Morgan, J.C.

    1979-05-01

    Three problems of stellar evolution are considered: the gap in the HR diagram of M67, the evolutionary status of RS CVn binaries and the solar neutrino problem. The physical basis of the Eggleton stellar evolution computer program is described. The program was used to calculate a grid of evolutionary tracks for models with masses between 0.7 and 1.29 solar masses. The more massive stars considered here have expanding convective cores during their main sequence evolution. The isochrone of the old galactic cluster M67 has a gap at the top of its main sequence because of the rapid evolution of stars at hydrogen exhaustion. RS CVn binaries present a complex collection of observational phenomena although they appear to be detached binaries. Their evolutionary status has remained controversial because of their high space density. Here it is shown that a post main sequence interpretation is satisfactory. Models of the Sun with metal poor interiors have been proposed in an attempt to resolve the solar neutrino problem. Here the evolution of two such models is calculated in detail, including a gradual contamination of the surface convection zone to produce the observed metal abundance, giving fully consistent models of the Sun as it is observed. (author)

  11. Whole genome sequence analysis of the first Australian OXA-48-producing outbreak-associated Klebsiella pneumoniae isolates: the resistome and in vivo evolution.

    Directory of Open Access Journals (Sweden)

    Björn A Espedido

    Full Text Available Whole genome sequencing was used to characterize the resistome of intensive care unit (ICU outbreak-associated carbapenem-resistant K. pneumoniae isolates. Importantly, and of particular concern, the carbapenem-hydrolyzing β-lactamase gene bla(OXA-48 and the extended-spectrum β-lactamase gene bla(CTX-M-14, were identified on a single broad host-range conjugative plasmid. This represents the first report of bla(OXA-48 in Australia and highlights the importance of resistance gene surveillance, as such plasmids can silently spread amongst enterobacterial populations and have the potential to drastically limit treatment options. Furthermore, the in vivo evolution of these isolates was also examined after 18 months of intra-abdominal carriage in a patient that transited through the ICU during the outbreak period. Reflecting the clonality of K. pneumoniae, only 11 single nucleotide polymorphisms (SNPs were accumulated during this time-period and many of these were associated with genes involved in tolerance/resistance to antibiotics, metals or organic solvents, and transcriptional regulation. Collectively, these SNPs are likely to be associated with changes in virulence (at least to some extent that have refined the in vivo colonization capacity of the original outbreak isolate.

  12. Whole Genome Sequence Analysis of the First Australian OXA-48-Producing Outbreak-Associated Klebsiella pneumoniae Isolates: The Resistome and In Vivo Evolution

    Science.gov (United States)

    Espedido, Björn A.; Steen, Jason A.; Ziochos, Helen; Grimmond, Sean M.; Cooper, Matthew A.; Gosbell, Iain B.; van Hal, Sebastiaan J.; Jensen, Slade O.

    2013-01-01

    Whole genome sequencing was used to characterize the resistome of intensive care unit (ICU) outbreak-associated carbapenem-resistant K. pneumoniae isolates. Importantly, and of particular concern, the carbapenem-hydrolyzing β-lactamase gene bla OXA-48 and the extended-spectrum β-lactamase gene bla CTX-M-14, were identified on a single broad host-range conjugative plasmid. This represents the first report of bla OXA-48 in Australia and highlights the importance of resistance gene surveillance, as such plasmids can silently spread amongst enterobacterial populations and have the potential to drastically limit treatment options. Furthermore, the in vivo evolution of these isolates was also examined after 18 months of intra-abdominal carriage in a patient that transited through the ICU during the outbreak period. Reflecting the clonality of K. pneumoniae, only 11 single nucleotide polymorphisms (SNPs) were accumulated during this time-period and many of these were associated with genes involved in tolerance/resistance to antibiotics, metals or organic solvents, and transcriptional regulation. Collectively, these SNPs are likely to be associated with changes in virulence (at least to some extent) that have refined the in vivo colonization capacity of the original outbreak isolate. PMID:23555831

  13. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32 P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T 1 , T 2 , and S 1 ) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions

  14. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subprojet 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  15. Study of canine parvovirus evolution: comparative analysis of full-length VP2 gene sequences from Argentina and international field strains.

    Science.gov (United States)

    Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose

    2012-02-01

    The continuous emergence of new strains of canine parvovirus (CPV), poorly protected by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Therefore, the understanding of the genetic variation in emerging CPV strains is crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding for the main capsid protein (VP2) of 11 canine parvovirus type 2 (CPV-2) Argentine representative field strains, selected from a total of 75 positive samples studied in our laboratory in the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at nucleotide and amino acid levels, respectively). However, the appearance of a new substitution in the 440 position (T440A) in four CPV-2c Argentine strains obtained after the year 2009 gives support to the variability observed for this position located within the VP2, three-fold spike. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America and shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.

  16. Mutation of miRNA target sequences during human evolution

    DEFF Research Database (Denmark)

    Gardner, Paul P; Vinther, Jeppe

    2008-01-01

    It has long-been hypothesized that changes in non-protein-coding genes and the regulatory sequences controlling expression could undergo positive selection. Here we identify 402 putative microRNA (miRNA) target sequences that have been mutated specifically in the human lineage and show that genes...... containing such deletions are more highly expressed than their mouse orthologs. Our findings indicate that some miRNA target mutations are fixed by positive selection and might have been involved in the evolution of human-specific traits....

  17. Structural Approaches to Sequence Evolution Molecules, Networks, Populations

    CERN Document Server

    Bastolla, Ugo; Roman, H. Eduardo; Vendruscolo, Michele

    2007-01-01

    Structural requirements constrain the evolution of biological entities at all levels, from macromolecules to their networks, right up to populations of biological organisms. Classical models of molecular evolution, however, are focused at the level of the symbols - the biological sequence - rather than that of their resulting structure. Now recent advances in understanding the thermodynamics of macromolecules, the topological properties of gene networks, the organization and mutation capabilities of genomes, and the structure of populations make it possible to incorporate these key elements into a broader and deeply interdisciplinary view of molecular evolution. This book gives an account of such a new approach, through clear tutorial contributions by leading scientists specializing in the different fields involved.

  18. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Full Text Available Abstract Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  19. Effects of mass loss on the evolution of massive stars. I. Main-sequence evolution

    International Nuclear Information System (INIS)

    Dearborn, D.S.P.; Blake, J.B.; Hainebach, K.L.; Schramm, D.N.

    1978-01-01

    The effect of mass loss on the evolution and surface composition of massive stars during main-sequence evolution are examined. While some details of the evolutionary track depend on the formula used for the mass loss, the results appear most sensitive to the total mass removed during the main-sequence lifetime. It was found that low mass-loss rates have very little effect on the evolution of a star; the track is slightly subluminous, but the lifetime is almost unaffected. High rates of mass loss lead to a hot, high-luminosity stellar model with a helium core surrounded by a hydrogen-deficient (Xapprox.0.1) envelope. The main-sequence lifetime is extended by a factor of 2--3. These models may be identified with Wolf-Rayet stars. Between these mass-loss extremes are intermediate models which appear as OBN stars on the main sequence. The mass-loss rates required for significant observable effects range from 8 x 10 -7 to 10 -5 M/sub sun/ yr -1 , depending on the initial stellar mass. It is found that observationally consistent mass-loss rates for stars with M> or =30 M/sub sun/ may be sufficiently high that these stars lose mass on a time scale more rapidly than their main-sequence core evolution time. This result implies that the helium cores resulting from the main-sequence evolution of these massive stars may all be very similar to that of a star of Mapprox.30 M/sub sun/ regardless of the zero-age mass

  20. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  1. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subprojet 3 `integrated sequence analysis` (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term `methodology` denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  2. STATISTICAL ANALYSIS OF STELLAR EVOLUTION

    OpenAIRE

    van Dyk, DA; DeGennaro, S; Stein, N; Jefferys, WH; von Hippel, T

    2009-01-01

    Color-Magnitude Diagrams (CMDs) are plots that compare the magnitudes (luminosities) of stars in different wavelengths of light (colors). High nonlinear correlations among the mass, color, and surface temperature of newly formed stars induce a long narrow curved point cloud in a CMD known as the main sequence. Aging stars form new CMD groups of red giants and white dwarfs. The physical processes that govern this evolution can be described with mathematical models and explored using complex co...

  3. Comparative genome sequencing of drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    Energy Technology Data Exchange (ETDEWEB)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

  4. A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences.

    Science.gov (United States)

    Groussin, M; Boussau, B; Gouy, M

    2013-07-01

    Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.

  5. Sequence diversity and evolution of antimicrobial peptides in invertebrates.

    Science.gov (United States)

    Tassanakajon, Anchalee; Somboonwiwat, Kunlaya; Amparyup, Piti

    2015-02-01

    Antimicrobial peptides (AMPs) are evolutionarily ancient molecules that act as the key components in the invertebrate innate immunity against invading pathogens. Several AMPs have been identified and characterized in invertebrates, and found to display considerable diversity in their amino acid sequence, structure and biological activity. AMP genes appear to have rapidly evolved, which might have arisen from the co-evolutionary arms race between host and pathogens, and enabled organisms to survive in different microbial environments. Here, the sequence diversity of invertebrate AMPs (defensins, cecropins, crustins and anti-lipopolysaccharide factors) are presented to provide a better understanding of the evolution pattern of these peptides that play a major role in host defense mechanisms. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  7. Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites

    KAUST Repository

    Hunt, Paul; Martinelli, Axel; Modrzynska, Katarzyna; Borges, Sofia; Creasey, Alison; Rodrigues, Louise; Beraldi, Dario; Loewe, Laurence; Fawcett, Richard; Kumar, Sujai; Thomson, Marian; Trivedi, Urmi; Otto, Thomas D; Pain, Arnab; Blaxter, Mark; Cravo, Pedro

    2010-01-01

    was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (IlluminaSolexa) defined the point mutations, insertions, deletions and copy-number variations arising

  8. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  9. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  10. Pre-main-sequence evolution of the sun

    International Nuclear Information System (INIS)

    Gough, D.

    1980-01-01

    The phase of solar evolution after the dynamical collapse is considered. The physics of the Kelvin-Helmholtz phase of gravitational collapse is described, attention being given to the early stages of the star when it was completely convective. It is noted that subsequently, a radiative core developed and evolution was controlled by the rate at which heat can diffuse through it by radiative transfer. Since the study of the Kelvin-Helmholtz contraction alone does not give enough information regarding the state of the sun when it first settled down to approximate hydrostatic equilibrium, other stars are studied, and information on the sun is obtained by analogy. Many young solar-type stars, such as the T Tauri stars, are not in the completely convective Hayashi (1961) phase hence it is proposed that the sun was completely mixed soon after its formation, which has some bearing on the sun's chemical structure. It is suggested that the surface of the sun was very nonuniform compared with the photosphere of today. The simple solar evolution model presented gives a good guide to the general way in which the sun contracted to the main sequence

  11. The evolution and utility of ribosomal ITS sequences in Bambusinae ...

    Indian Academy of Sciences (India)

    2012-08-09

    Aug 9, 2012 ... both direct sequencing and cloning of PCR products were used for ITS analysis, with only one clone ... woody bamboos belonging to this subtribe. The recent availability of molecular data enabled tax- ..... Gigantochloa species occur mainly in Malaysia, and they can be distinguished upon flowering by the ...

  12. Genome sequence diversity and clues to the evolution of variola (smallpox) virus.

    Science.gov (United States)

    Esposito, Joseph J; Sammons, Scott A; Frace, A Michael; Osborne, John D; Olsen-Rasmussen, Melissa; Zhang, Ming; Govil, Dhwani; Damon, Inger K; Kline, Richard; Laker, Miriam; Li, Yu; Smith, Geoffrey L; Meyer, Hermann; Leduc, James W; Wohlhueter, Robert M

    2006-08-11

    Comparative genomics of 45 epidemiologically varied variola virus isolates from the past 30 years of the smallpox era indicate low sequence diversity, suggesting that there is probably little difference in the isolates' functional gene content. Phylogenetic clustering inferred three clades coincident with their geographical origin and case-fatality rate; the latter implicated putative proteins that mediate viral virulence differences. Analysis of the viral linear DNA genome suggests that its evolution involved direct descent and DNA end-region recombination events. Knowing the sequences will help understand the viral proteome and improve diagnostic test precision, therapeutics, and systems for their assessment.

  13. Analysis of the complete DNA sequence of the temperate bacteriophage TP901-1: Evolution, structure, and genome organization of lactococcal bacteriophages

    DEFF Research Database (Denmark)

    Brøndsted, Lone; Østergaard, Solvej; Pedersen, Margit

    2001-01-01

    A complete analysis of the entire genome of the temperate lactococcal bacteriophage TP901-1 has been performed and the function of 21 of 56 TP901-1-encoded ORFs has been assigned. This knowledge has been used to propose 10 functional modules each responsible for specific functions during...

  14. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    OpenAIRE

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  15. Accelerated Evolution of Conserved Noncoding Sequences in theHuman Genome

    Energy Technology Data Exchange (ETDEWEB)

    Prambhakar, Shyam; Noonan, James P.; Paabo, Svante; Rubin, EdwardM.

    2006-07-06

    Genomic comparisons between human and distant, non-primatemammals are commonly used to identify cis-regulatory elements based onconstrained sequence evolution. However, these methods fail to detect"cryptic" functional elements, which are too weakly conserved amongmammals to distinguish from nonfunctional DNA. To address this problem,we explored the potential of deep intra-primate sequence comparisons. Wesequenced the orthologs of 558 kb of human genomic sequence, coveringmultiple loci involved in cholesterol homeostasis, in 6 nonhumanprimates. Our analysis identified 6 noncoding DNA elements displayingsignificant conservation among primates, but undetectable in more distantcomparisons. In vitro and in vivo tests revealed that at least three ofthese 6 elements have regulatory function. Notably, the mouse orthologsof these three functional human sequences had regulatory activity despitetheir lack of significant sequence conservation, indicating that they arecryptic ancestral cis-regulatory elements. These regulatory elementscould still be detected in a smaller set of three primate speciesincluding human, rhesus and marmoset. Since the human and rhesus genomesequences are already available, and the marmoset genome is activelybeing sequenced, the primate-specific conservation analysis describedhere can be applied in the near future on a whole-genome scale, tocomplement the annotation provided by more distant speciescomparisons.

  16. On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

    Science.gov (United States)

    Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein

    2017-12-01

    One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.

  17. Evolution of sequence-defined highly functionalized nucleic acid polymers

    Science.gov (United States)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  18. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  19. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  20. Universal sequence replication, reversible polymerization and early functional biopolymers: a model for the initiation of prebiotic sequence evolution.

    Directory of Open Access Journals (Sweden)

    Sara Imari Walker

    Full Text Available Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for

  1. Tracing early stellar evolution with asteroseismology: pre-main sequence stars in NGC 2264

    Directory of Open Access Journals (Sweden)

    Zwintz Konstanze

    2015-01-01

    Full Text Available Asteroseismology has been proven to be a successful tool to unravel details of the internal structure for different types of stars in various stages of their main sequence and post-main sequence evolution. Recently, we found a relation between the detected pulsation properties in a sample of 34 pre-main sequence (pre-MS δ Scuti stars and the relative phase in their pre-MS evolution. With this we are able to demonstrate that asteroseismology is similarly powerful if applied to stars in the earliest stages of evolution before the onset of hydrogen core burning.

  2. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    DEFF Research Database (Denmark)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.

    2005-01-01

    years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences......We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each...... between the species-but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...

  3. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plant is commonly employed in the heating, ventilating and air-conditioning system to increase operational feasibility and energy-efficiency under part load condition. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency while not sacrifices the cooling sufficiency for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control fail to achieve the expected control and/or energy performance; and (ii) in current literature few studies have systematically addressed this issue, this paper therefore presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  4. Cyclotide Evolution: Insights from the Analyses of Their Precursor Sequences, Structures and Distribution in Violets (Viola

    Directory of Open Access Journals (Sweden)

    Sungkyu Park

    2017-12-01

    Full Text Available Cyclotides are a family of plant proteins that are characterized by a cyclic backbone and a knotted disulfide topology. Their cyclic cystine knot (CCK motif makes them exceptionally resistant to thermal, chemical, and enzymatic degradation. By disrupting cell membranes, the cyclotides function as host defense peptides by exhibiting insecticidal, anthelmintic, antifouling, and molluscicidal activities. In this work, we provide the first insight into the evolution of this family of plant proteins by studying the Violaceae, in particular species of the genus Viola. We discovered 157 novel precursor sequences by the transcriptomic analysis of six Viola species: V. albida var. takahashii, V. mandshurica, V. orientalis, V. verecunda, V. acuminata, and V. canadensis. By combining these precursor sequences with the phylogenetic classification of Viola, we infer the distribution of cyclotides across 63% of the species in the genus (i.e., ~380 species. Using full precursor sequences from transcriptomes, we show an evolutionary link to the structural diversity of the cyclotides, and further classify the cyclotides by sequence signatures from the non-cyclotide domain. Also, transcriptomes were compared to cyclotide expression on a peptide level determined using liquid chromatography-mass spectrometry. Furthermore, the novel cyclotides discovered were associated with the emergence of new biological functions.

  5. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways

    OpenAIRE

    Mi, Huaiyu; Guo, Nan; Kejariwal, Anish; Thomas, Paul D.

    2006-01-01

    PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles. Since 2005, there have been three main improvements to PANTHER. First, the sequences used to create evolutionary trees are carefully selected to provide coverage of phylogenetic as well as functional information. Second, PANTHER is now a member of the InterPro Consortium, and the PANTHER hidden markov Models (HMMs) are distri...

  6. Probabilistic accident sequence recovery analysis

    International Nuclear Information System (INIS)

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be skeptically viewed. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  7. Evolution of source term definition and analysis

    International Nuclear Information System (INIS)

    Lutz, R.J. Jr.

    2004-01-01

    The objective of this presentation was to provide an overview of the evolution of accident fission product release analysis methodology and the obtained results; and to provide an overview of the source term implementation analysis in regulatory decisions

  8. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2002-01-01

    This thesis described a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene trasnfer, even to the extent that we can use the gene content to make genome phylogenies.

  9. Lithium evolution in metal-poor stars: from Pre-Main Sequence to the Spite plateau

    OpenAIRE

    Fu, Xiaoting; Bressan, Alessandro; Molaro, Paolo; Marigo, Paola

    2015-01-01

    Lithium abundance derived in metal-poor main sequence stars is about three times lower than the value of primordial Li predicted by the standard Big Bang nucleosynthesis when the baryon density is taken from the CMB or the deuterium measurements. This disagreement is generally referred as the lithium problem. We here reconsider the stellar Li evolution from the pre-main sequence to the end of the main sequence phase by introducing the effects of convective overshooting and residual mass accre...

  10. ACCELERATED EVOLUTION OF LAND SNAILS MANDARINA IN THE OCEANIC BONIN ISLANDS: EVIDENCE FROM MITOCHONDRIAL DNA SEQUENCES.

    Science.gov (United States)

    Chiba, Satoshi

    1999-04-01

    An endemic land snail genus Mandarina of the oceanic Bonin (Ogasawara) Islands shows exceptionally rapid evolution not only of morphological and ecological traits, but of DNA sequence. A phylogenetic relationship based on mitochondrial DNA (mtDNA) sequences suggests that morphological differences equivalent to the differences between families were produced between Mandarina and its ancestor during the Pleistocene. The inferred phylogeny shows that species with similar morphologies and life habitats appeared repeatedly and independently in different lineages and islands at different times. Sequential adaptive radiations occurred in different islands of the Bonin Islands and species occupying arboreal, semiarboreal, and terrestrial habitat arose independently in each island. Because of a close relationship between shell morphology and life habitat, independent evolution of the same life habitat in different islands created species possesing the same shell morphology in different islands and lineages. This rapid evolution produced some incongruences between phylogenetic relationship and species taxonomy. Levels of sequence divergence of mtDNA among the species of Mandarina is extremely high. The maximum level of sequence divergence at 16S and 12S ribosomal RNA sequence within Mandarina are 18.7% and 17.7%, respectively, and this suggests that evolution of mtDNA of Mandarina is extremely rapid, more than 20 times faster than the standard rate in other animals. The present examination reveals that evolution of morphological and ecological traits occurs at extremely high rates in the time of adaptive radiation, especially in fragmented environments. © 1999 The Society for the Study of Evolution.

  11. Differential evolution optimization combined with chaotic sequences for image contrast enhancement

    Energy Technology Data Exchange (ETDEWEB)

    Santos Coelho, Leandro dos [Industrial and Systems Engineering Graduate Program, LAS/PPGEPS, Pontifical Catholic University of Parana, PUCPR Imaculada Conceicao, 1155, 80215-901 Curitiba, Parana (Brazil)], E-mail: leandro.coelho@pucpr.br; Sauer, Joao Guilherme [Industrial and Systems Engineering Graduate Program, LAS/PPGEPS, Pontifical Catholic University of Parana, PUCPR Imaculada Conceicao, 1155, 80215-901 Curitiba, Parana (Brazil)], E-mail: joao.sauer@gmail.com; Rudek, Marcelo [Industrial and Systems Engineering Graduate Program, LAS/PPGEPS, Pontifical Catholic University of Parana, PUCPR Imaculada Conceicao, 1155, 80215-901 Curitiba, Parana (Brazil)], E-mail: marcelo.rudek@pucpr.br

    2009-10-15

    Evolutionary Algorithms (EAs) are stochastic and robust meta-heuristics of evolutionary computation field useful to solve optimization problems in image processing applications. Recently, as special mechanism to avoid being trapped in local minimum, the ergodicity property of chaotic sequences has been used in various designs of EAs. Three differential evolution approaches based on chaotic sequences using logistic equation for image enhancement process are proposed in this paper. Differential evolution is a simple yet powerful evolutionary optimization algorithm that has been successfully used in solving continuous problems. The proposed chaotic differential evolution schemes have fast convergence rate but also maintain the diversity of the population so as to escape from local optima. In this paper, the image contrast enhancement is approached as a constrained nonlinear optimization problem. The objective of the proposed chaotic differential evolution schemes is to maximize the fitness criterion in order to enhance the contrast and detail in the image by adapting the parameters using a contrast enhancement technique. The proposed chaotic differential evolution schemes are compared with classical differential evolution to two testing images. Simulation results on three images show that the application of chaotic sequences instead of random sequences is a possible strategy to improve the performance of classical differential evolution optimization algorithm.

  12. Rapid evolution of the sequences and gene repertoires of secreted proteins in bacteria.

    Directory of Open Access Journals (Sweden)

    Teresa Nogueira

    Full Text Available Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.

  13. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.

  14. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    Science.gov (United States)

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  15. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using sequence tree method was developed to perform Safety Related digital I and C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique, the analysis results strongly depend on the experts' subjective opinions. The analysis quality cannot be appropriately controlled. Thereby, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I and C system, such as RPS. The second phase uses sequence tree to recognize what I and C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I and C systems, include digital system and the analog back-up systems are allocated in their specific echelon. By this system centric sequence tree based analysis, not only preliminary hazard can be identified systematically, the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  16. Isolation of Hox cluster genes from insects reveals an accelerated sequence evolution rate.

    Directory of Open Access Journals (Sweden)

    Heike Hadrys

    Full Text Available Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera. We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution.

  17. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    Science.gov (United States)

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for

  18. PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

    Directory of Open Access Journals (Sweden)

    Massingham Tim

    2011-04-01

    Full Text Available Abstract Background The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. Results PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Conclusions Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.

  19. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    IntroductionTodd R. ReedCONTENT-BASED IMAGE SEQUENCE REPRESENTATIONPedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, andCharnchai PluempitiwiriyawejTHE COMPUTATION OF MOTIONChristoph Stiller, Sören Kammel, Jan Horn, and Thao DangMOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAINLuca Lucchese and Guido Maria CortelazzoQUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONSGaetano GiuntaERROR CONCEALMENT IN DIGITAL VIDEOFrancesco G.B. De NataleIMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVEAnil KokaramVIDEO SUMMARIZATIONCuneyt M. Taskiran and Edward

  20. Mathematical Analysis of Evolution, Information, and Complexity

    CERN Document Server

    Arendt, Wolfgang

    2009-01-01

    Mathematical Analysis of Evolution, Information, and Complexity deals with the analysis of evolution, information and complexity. The time evolution of systems or processes is a central question in science, this text covers a broad range of problems including diffusion processes, neuronal networks, quantum theory and cosmology. Bringing together a wide collection of research in mathematics, information theory, physics and other scientific and technical areas, this new title offers elementary and thus easily accessible introductions to the various fields of research addressed in the book.

  1. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  2. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  3. Network analysis of metabolic enzyme evolution in Escherichia coli

    Directory of Open Access Journals (Sweden)

    Kraulis Per

    2004-02-01

    Full Text Available Abstract Background The two most common models for the evolution of metabolism are the patchwork evolution model, where enzymes are thought to diverge from broad to narrow substrate specificity, and the retrograde evolution model, according to which enzymes evolve in response to substrate depletion. Analysis of the distribution of homologous enzyme pairs in the metabolic network can shed light on the respective importance of the two models. We here investigate the evolution of the metabolism in E. coli viewed as a single network using EcoCyc. Results Sequence comparison between all enzyme pairs was performed and the minimal path length (MPL between all enzyme pairs was determined. We find a strong over-representation of homologous enzymes at MPL 1. We show that the functionally similar and functionally undetermined enzyme pairs are responsible for most of the over-representation of homologous enzyme pairs at MPL 1. Conclusions The retrograde evolution model predicts that homologous enzymes pairs are at short metabolic distances from each other. In general agreement with previous studies we find that homologous enzymes occur close to each other in the network more often than expected by chance, which lends some support to the retrograde evolution model. However, we show that the homologous enzyme pairs which may have evolved through retrograde evolution, namely the pairs that are functionally dissimilar, show a weaker over-representation at MPL 1 than the functionally similar enzyme pairs. Our study indicates that, while the retrograde evolution model may have played a small part, the patchwork evolution model is the predominant process of metabolic enzyme evolution.

  4. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences are clustered with sequences reported from Japan. This is the first phylogenetic analysis of HCV core gene from Pakistani population. Our sequences and sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and ...

  5. Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution

    Science.gov (United States)

    2017-08-01

    AWARD NUMBER: W81XWH-15-1-0243 TITLE: Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution PRINCIPAL...SUBTITLE 5a. CONTRACT NUMBER Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution 5b. GRANT NUMBER 5c. PROGRAM...derive a prognostic classifier. 15. SUBJECT TERMS NSCLC; tumor evolution ; whole exome sequencing 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF

  6. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, with many contigs and scaffolds obtained. There are many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the end of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes to perform PCR amplification, adjustment of PCR reaction conditions, and combined with clone construction to sequence, all the gaps were finished. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  7. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs. Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  8. The evolution and population structure of Lactobacillus fermentum from different naturally fermented products as determined by multilocus sequence typing (MLST).

    Science.gov (United States)

    Dan, Tong; Liu, Wenjun; Song, Yuqin; Xu, Haiyan; Menghe, Bilige; Zhang, Heping; Sun, Zhihong

    2015-05-20

    Lactobacillus fermentum is economically important in the production and preservation of fermented foods. A repeatable and discriminative typing method was devised to characterize L. fermentum at the molecular level. The multilocus sequence typing (MLST) scheme developed was based on analysis of the internal sequence of 11 housekeeping gene fragments (clpX, dnaA, dnaK, groEL, murC, murE, pepX, pyrG, recA, rpoB, and uvrC). MLST analysis of 203 isolates of L. fermentum from Mongolia and seven provinces/ autonomous regions in China identified 57 sequence types (ST), 27 of which were represented by only a single isolate, indicating high genetic diversity. Phylogenetic analyses based on the sequence of the 11 housekeeping gene fragments indicated that the L. fermentum isolates analyzed belonged to two major groups. A standardized index of association (I A (S)) indicated a weak clonal population structure in L. fermentum. Split decomposition analysis indicated that recombination played an important role in generating the genetic diversity observed in L. fermentum. The results from the minimum spanning tree strongly suggested that evolution of L. fermentum STs was not correlated with geography or food-type. The MLST scheme developed will be valuable for further studies on the evolution and population structure of L. fermentum isolates used in food products.

  9. Evolution of Sphingomonad Gene Clusters Related to Pesticide Catabolism Revealed by Genome Sequence and Mobilomics of Sphingobium herbicidovorans MH.

    Science.gov (United States)

    Nielsen, Tue Kjærgaard; Rasmussen, Morten; Demanèche, Sandrine; Cecillon, Sébastien; Vogel, Timothy M; Hansen, Lars Hestbjerg

    2017-09-01

    Bacterial degraders of chlorophenoxy herbicides have been isolated from various ecosystems, including pristine environments. Among these degraders, the sphingomonads constitute a prominent group that displays versatile xenobiotic-degradation capabilities. Four separate sequencing strategies were required to provide the complete sequence of the complex and plastic genome of the canonical chlorophenoxy herbicide-degrading Sphingobium herbicidovorans MH. The genome has an intricate organization of the chlorophenoxy-herbicide catabolic genes sdpA, rdpA, and cadABCD that encode the (R)- and (S)-enantiomer-specific 2,4-dichlorophenoxypropionate dioxygenases and four subunits of a Rieske non-heme iron oxygenase involved in 2-methyl-chlorophenoxyacetic acid degradation, respectively. Several major genomic rearrangements are proposed to help understand the evolution and mobility of these important genes and their genetic context. Single-strain mobilomic sequence analysis uncovered plasmids and insertion sequence-associated circular intermediates in this environmentally important bacterium and enabled the description of evolutionary models for pesticide degradation in strain MH and related organisms. The mobilome presented a complex mosaic of mobile genetic elements including four plasmids and several circular intermediate DNA molecules of insertion-sequence elements and transposons that are central to the evolution of xenobiotics degradation. Furthermore, two individual chromosomally integrated prophages were shown to excise and form free circular DNA molecules. This approach holds great potential for improving the understanding of genome plasticity, evolution, and microbial ecology. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Full Text Available Many organizations apply information technologies to support their business processes. Using the information technologies, the actual events are recorded and utilized to conform with predefined model. Conformance checking is an approach to measure the fitness and appropriateness between process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach unfit to result such measures. This study attempts to develop a sequence matching analysis. Considering conformance checking as the basis of this approach, this proposed approach utilizes the current control flow technique in process mining domain. A case study in the field of educational process has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it results some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  11. Neutral evolution of proteins: The superfunnel in sequence space and its relation to mutational robustness

    Science.gov (United States)

    Noirel, Josselin; Simonson, Thomas

    2008-11-01

    Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases. The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular

  12. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  13. Perspective on sequence evolution of microsatellite locus (CCGn in Rv0050 gene from Mycobacterium tuberculosis

    Directory of Open Access Journals (Sweden)

    Jin Ruiliang

    2011-08-01

    Full Text Available Abstract Background The mycobacterial genome is inclined to polymerase slippage and a high mutation rate in microsatellite regions due to high GC content and absence of a mismatch repair system. However, the exact molecular mechanisms underlying microsatellite variation have not been fully elucidated. Here, we investigated mutation events in the hyper-variable trinucleotide microsatellite locus MML0050 located in the Rv0050 gene of W-Beijing and non-W-Beijing Mycobacterium tuberculosis strains in order to gain insight into the genomic structure and activity of repeated regions. Results Size analysis indicated the presence of five alleles that differed in length by three base pairs. Moreover, nucleotide gains occurred more frequently than loses in this trinucleotide microsatellite. Mutation frequency was not completely related with the total length, though the relative frequency in the longest allele was remarkably higher than that in the shortest. Sequence analysis was able to detect seven alleles and revealed that point mutations enhanced the level of locus variation. Introduction of an interruptive motif correlated with the total allele length and genetic lineage, rather than the length of the longest stretch of perfect repeats. Finally, the level of locus variation was drastically different between the two genetic lineages. Conclusion The Rv0050 locus encodes the bifunctional penicillin-binding protein ponA1 and is essential to mycobacterial survival. Our investigations of this particularly dynamic genomic region provide insights into the overall mode of microsatellite evolution. Specifically, replication slippage was implicated in the mutational process of this microsatellite and a sequence-based genetic analysis was necessary to determine that point mutation events acted to maintain microsatellite size integrity while providing genomic diversity.

  14. The genome sequence of taurine cattle: A window to ruminant biology and evolution

    Science.gov (United States)

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (ma...

  15. Mathematical Analysis of Genomic Evolution

    Directory of Open Access Journals (Sweden)

    Cedric Green

    2011-01-01

    Full Text Available Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the amount of advantageous mutations it takes to get human (Homo sapiens DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the amount of mutations it takes to transform yeast (Saccharomyces cerevisiae DNA to the DNA of a human. We calculate the typical number of mutations occurring annually through the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in the chromosomal structure and phenomena like horizontal gene transfer.

  16. Volcanic rises on Venus: Geology, formation, and sequence of evolution

    Science.gov (United States)

    Senske, D. A.; Stofan, E. R.; Bindschadler, D. L.; Smrekar, S. E.

    1993-01-01

    Large centers of volcanism on Venus are concentrated primarily in the equatorial region of the planet and are associated with regional topographic rises. Analysis of both radar images and geophysical data suggest that these uplands are sites of mantle upwelling. Magellan radar imaging provides a globally contiguous data set from which the geology of these regions is evaluated and compared. In addition, high resolution gravity data currently being collected provide a basis to assess the relationship between these uplands and processes in the planet's interior. Studies of the geology of the three largest volcanic highlands (Beta Regio, Atla Regio, Western Eistla Regio) show them to be distinct, having a range of volcanic and tectonic characteristics. In addition to these large areas, a number of smaller uplands are identified and are being analyzed (Bell Regio, Imdr Regio, Dione Regio (Ushas, Innini, and Hathor Montes), and Themis Regio). To understand better the mechanisms by which these volcanic rises form and evolve, we assess their geologic and geophysical characteristics.

  17. Evolution of endogenous sequences of banana streak virus: what can we learn from banana (Musa sp.) evolution?

    Science.gov (United States)

    Gayral, Philippe; Blondin, Laurence; Guidolin, Olivier; Carreel, Françoise; Hippolyte, Isabelle; Perrier, Xavier; Iskra-Caruana, Marie-Line

    2010-07-01

    Endogenous plant pararetroviruses (EPRVs) are viral sequences of the family Caulimoviridae integrated into the nuclear genome of numerous plant species. The ability of some endogenous sequences of Banana streak viruses (eBSVs) in the genome of banana (Musa sp.) to induce infections just like the virus itself was recently demonstrated (P. Gayral et al., J. Virol. 83:6697-6710, 2008). Although eBSVs probably arose from accidental events, infectious eBSVs constitute an extreme case of parasitism, as well as a newly described strategy for vertical virus transmission in plants. We investigated the early evolutionary stages of infectious eBSV for two distinct BSV species-GF (BSGFV) and Imové (BSImV)-through the study of their distribution, insertion polymorphism, and structure evolution among selected banana genotypes representative of the diversity of 60 wild Musa species and genotypes. To do so, the historical frame of host evolution was analyzed by inferring banana phylogeny from two chloroplast regions-matK and trnL-trnF-as well as from the nuclear genome, using 19 microsatellite loci. We demonstrated that both BSV species integrated recently in banana evolution, circa 640,000 years ago. The two infectious eBSVs were subjected to different selective pressures and showed distinct levels of rearrangement within their final structure. In addition, the molecular phylogenies of integrated and nonintegrated BSVs enabled us to establish the phylogenetic origins of eBSGFV and eBSImV.

  18. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

    Science.gov (United States)

    Currin, Andrew; Swainston, Neil; Day, Philip J.

    2015-01-01

    The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (K d) and catalytic (k cat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving k cat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole

  19. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  20. FAST: FAST Analysis of Sequences Toolbox

    Directory of Open Access Journals (Sweden)

    Travis J. Lawrence

    2015-05-01

    Full Text Available FAST (FAST Analysis of Sequences Toolbox provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format. Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.

  1. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low-especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  2. New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships*

    Science.gov (United States)

    Brown, Shoshana D.; Babbitt, Patricia C.

    2014-01-01

    Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes. PMID:25210038

  3. New insights about enzyme evolution from large scale studies of sequence and structure relationships.

    Science.gov (United States)

    Brown, Shoshana D; Babbitt, Patricia C

    2014-10-31

    Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfamilies, each representing a large set of sequences that evolved from a common ancestor and that retain conserved features of their structures and active sites. Using several examples, we describe the different structural strategies nature has used to evolve new reaction and substrate specificities in each unique superfamily. The results provide insight about enzyme evolution that is not easily obtained from studies of one or only a few enzymes. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  4. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.

  5. Effects of main-sequence mass loss on stellar and galactic chemical evolution

    International Nuclear Information System (INIS)

    Guzik, J.A.

    1988-01-01

    L.A. Willson, G.H. Bowen and C. Struck-Marcell have proposed that 1 to 3 solar mass stars may experience evolutionarily significant mass loss during the early part of their main-sequence phase. The suggested mass-loss mechanism is pulsation, facilitated by rapid rotation. Initial mass-loss rates may be as large as several times 10 -9 M mass of sun/yr, diminishing over several times 10 8 years. The author attempts to test this hypothesis by comparing some theoretical implications with observations. Three areas are addressed: Solar models, cluster HR diagrams, and galactic chemical evolution. Mass-losing solar models were evolved that match the Sun's luminosity and radius at its present age. The most extreme viable models have initial mass 2.0 M 0 , and mass-loss rates decreasing exponentially over 2-3 x 10 8 years. Evolution calculations incorporating main-sequence mass loss were completed for a grid of models with initial masses 1.25 to 2.0 M mass of sun and mass loss timescales 0.2 to 2.0 Gry. Cluster HR diagrams synthesized with these models confirm the potential for the hypothesis to explain observed spreads or bifurcations in the upper main sequence, blue stragglers, anomalous giants, and poor fits of main-sequence turnoffs by standard isochrones. Simple closed galactic chemical evolution models were used to test the effects of main-sequence mass loss on the F and G dwarf distribution. Stars between 3.0 M mass of sun and a metallicity-dependent lower mass are assumed to lose mass. The models produce a 30 to 60% increase in the stars to stars-plus-remnants ratio, with fewer early-F dwarfs and many more late-F dwarfs remaining on the main sequence to the present

  6. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution

    OpenAIRE

    Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.; Gibbs, Richard A.; Abatepaulo, Antonio R. R.; Abbey, Colette A.; Adelson, David L.; Aerts, Jan; Ahola, Virpi; Alexander, Lee; Alioto, Tyler; Almeida, Iassudara G.; Amadio, Ariel F.; Anatriello, Elen; Antonarakis, Stylianos E.

    2009-01-01

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specifi...

  7. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    Science.gov (United States)

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value2) following the TULIP theorem. Simulations of Z-value distribution is known to fit with a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the systems longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) which components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the

  8. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

    Directory of Open Access Journals (Sweden)

    Maréchal Eric

    2008-08-01

    Full Text Available Abstract Background Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value2 following the TULIP theorem. Simulations of Z-value distribution is known to fit with a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. Results We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the systems longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure. Homologous sequences were considered as systems 1 having a high redundancy of information reflected by the magnitude of their alignment scores, 2 which components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a

  9. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  10. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  11. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima\\'s D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  12. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  13. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC, a type of calculus that represents qualitative data on moving point objects (MPOs, and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI is used, which enables to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  14. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina and Ion Torrent (Life Technology sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare. Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  15. Social network analysis community detection and evolution

    CERN Document Server

    Missaoui, Rokia

    2015-01-01

    This book is devoted to recent progress in social network analysis with a high focus on community detection and evolution. The eleven chapters cover the identification of cohesive groups, core components and key players either in static or dynamic networks of different kinds and levels of heterogeneity. Other important topics in social network analysis such as influential detection and maximization, information propagation, user behavior analysis, as well as network modeling and visualization are also presented. Many studies are validated through real social networks such as Twitter. This edit

  16. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes are higher than that between human and teleost fish genomes. Elephant shark contains putative four Hox clusters indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  17. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  18. Image sequence analysis workstation for multipoint motion analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modem graphic-oriented workstation with the digital image acquisition, processing and display techniques. In addition to automation and Increase In throughput of data reduction tasks, the objective of the system Is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level Image processing and spatial/temporal adaptation of the processing parameters is used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) Acquisition and storage of Image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital Image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored Image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body: 5) field-of-view and spatial calibration: 6) Image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  19. Targeted Sequencing of Venom Genes from Cone Snail Genomes Improves Understanding of Conotoxin Molecular Evolution.

    Science.gov (United States)

    Phuong, Mark A; Mahardika, Gusti N

    2018-05-01

    To expand our capacity to discover venom sequences from the genomes of venomous organisms, we applied targeted sequencing techniques to selectively recover venom gene superfamilies and nontoxin loci from the genomes of 32 cone snail species (family, Conidae), a diverse group of marine gastropods that capture their prey using a cocktail of neurotoxic peptides (conotoxins). We were able to successfully recover conotoxin gene superfamilies across all species with high confidence (> 100× coverage) and used these data to provide new insights into conotoxin evolution. First, we found that conotoxin gene superfamilies are composed of one to six exons and are typically short in length (mean = ∼85 bp). Second, we expanded our understanding of the following genetic features of conotoxin evolution: 1) positive selection, where exons coding the mature toxin region were often three times more divergent than their adjacent noncoding regions, 2) expression regulation, with comparisons to transcriptome data showing that cone snails only express a fraction of the genes available in their genome (24-63%), and 3) extensive gene turnover, where Conidae species varied from 120 to 859 conotoxin gene copies. Finally, using comparative phylogenetic methods, we found that while diet specificity did not predict patterns of conotoxin evolution, dietary breadth was positively correlated with total conotoxin gene diversity. Overall, the targeted sequencing technique demonstrated here has the potential to radically increase the pace at which venom gene families are sequenced and studied, reshaping our ability to understand the impact of genetic changes on ecologically relevant phenotypes and subsequent diversification.

  20. Genetic diversity and molecular evolution of Naga King Chili inferred from internal transcribed spacer sequence of nuclear ribosomal DNA.

    Science.gov (United States)

    Kehie, Mechuselie; Kumaria, Suman; Devi, Khumuckcham Sangeeta; Tandon, Pramod

    2016-02-01

    Sequences of the Internal Transcribed Spacer (ITS1-5.8S-ITS2) of nuclear ribosomal DNAs were explored to study the genetic diversity and molecular evolution of Naga King Chili. Our study indicated the occurrence of nucleotide polymorphism and haplotypic diversity in the ITS regions. The present study demonstrated that the variability of ITS1 with respect to nucleotide diversity and sequence polymorphism exceeded that of ITS2. Sequence analysis of 5.8S gene revealed a much conserved region in all the accessions of Naga King Chili. However, strong phylogenetic information of this species is the distinct 13 bp deletion in the 5.8S gene which discriminated Naga King Chili from the rest of the Capsicum sp. Neutrality test results implied a neutral variation, and population seems to be evolving at drift-mutation equilibrium and free from directed selection pressure. Furthermore, mismatch analysis showed multimodal curve indicating a demographic equilibrium. Phylogenetic relationships revealed by Median Joining Network (MJN) analysis denoted a clear discrimination of Naga King Chili from its closest sister species (Capsicum chinense and Capsicum frutescens). The absence of star-like network of haplotypes suggested an ancient population expansion of this chili.

  1. Increasing genomic diversity and evidence of constrained lifestyle evolution due to insertion sequences in Aeromonas salmonicida.

    Science.gov (United States)

    Vincent, Antony T; Trudel, Mélanie V; Freschi, Luca; Nagar, Vandan; Gagné-Thivierge, Cynthia; Levesque, Roger C; Charette, Steve J

    2016-01-12

    Aeromonads make up a group of Gram-negative bacteria that includes human and fish pathogens. The Aeromonas salmonicida species has the peculiarity of including five known subspecies. However, few studies of the genomes of A. salmonicida subspecies have been reported to date. We sequenced the genomes of additional A. salmonicida isolates, including three from India, using next-generation sequencing in order to gain a better understanding of the genomic and phylogenetic links between A. salmonicida subspecies. Their relative phylogenetic positions were confirmed by a core genome phylogeny based on 1645 gene sequences. The Indian isolates, which formed a sub-group together with A. salmonicida subsp. pectinolytica, were able to grow at either at 18 °C and 37 °C, unlike the A. salmonicida psychrophilic isolates that did not grow at 37 °C. Amino acid frequencies, GC content, tRNA composition, loss and gain of genes during evolution, pseudogenes as well as genes under positive selection and the mobilome were studied to explain this intraspecies dichotomy. Insertion sequences appeared to be an important driving force that locked the psychrophilic strains into their particular lifestyle in order to conserve their genomic integrity. This observation, based on comparative genomics, is in agreement with previous results showing that insertion sequence mobility induced by heat in A. salmonicida subspecies causes genomic plasticity, resulting in a deleterious effect on the virulence of the bacterium. We provide a proof-of-concept that selfish DNAs play a major role in the evolution of bacterial species by modeling genomes.

  2. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 and 101, Copia and Gypsy reverse transcriptase (rt were amplified respectively in the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity, were rich in AT and possessed higher sequence divergence of Copia rt in comparison to Gypsy rt. Two reasons are likely to account for this phenomenon: a these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; b strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  3. THE QUADRUPLE PRE-MAIN-SEQUENCE SYSTEM LkCa 3: IMPLICATIONS FOR STELLAR EVOLUTION MODELS

    International Nuclear Information System (INIS)

    Torres, Guillermo; Latham, David W.; Ruíz-Rodríguez, Dary; Prato, L.; Wasserman, Lawrence H.; Badenas, Mariona; Schaefer, G. H.; Mathieu, Robert D.

    2013-01-01

    We report the discovery that the pre-main-sequence (PMS) object LkCa 3 in the Taurus-Auriga star-forming region is a hierarchical quadruple system of M stars. It was previously known to be a close (∼0.''5) visual pair, with one component being a moderately eccentric 12.94 day single-lined spectroscopic binary. A re-analysis of archival optical spectra complemented by new near-infrared (NIR) spectroscopy shows both visual components to be double lined; the second one has a period of 4.06 days and a circular orbit. In addition to the orbital elements, we determine optical and NIR flux ratios, effective temperatures, and projected rotational velocities for all four stars. Using existing photometric monitoring observations of the system that had previously revealed the rotational period of the primary in the longer-period binary, we also detect the rotational signal of the primary in the 4.06 day binary, which is synchronized with the orbital motion. With only the assumption of coevality, a comparison of all of these constraints with current stellar evolution models from the Dartmouth series points to an age of 1.4 Myr and a distance of 133 pc, consistent with previous estimates for the region and suggesting that the system is on the near side of the Taurus complex. Similar comparisons of the properties of LkCa 3 and the well-known quadruple PMS system GG Tau with the widely used models from the Lyon series for a mixing length parameter of α ML = 1.0 strongly favor the Dartmouth models

  4. Focal plane AIT sequence: evolution from HRG-Spot 5 to Pleiades HR

    Science.gov (United States)

    Le Goff, Roland; Pranyies, Pascal; Toubhans, Isabelle

    2017-11-01

    Optical and geometrical image qualities of Focal Planes, for "push-broom" high resolution remote sensing satellites, require the implementation of specific means and methods for the AIT sequence. Indeed the geometric performances of the focal plane mainly axial focusing and transverse registration, are duly obtained on the basis of adjustment, setting and measurement of optical and CCD components with an accuracy of a few microns. Since the end of the 1970s, EADS-SODERN has developed a series of detection units for earth observation instruments like SPOT and Helios. And EADS-SODERN is now responsible for the development of the Pleiades High Resolution Focal Plane assembly. This paper presents the AIT sequences. We introduce all the efforts, innovative solutions and improvements made on the assembly facilities to match the technical evolutions and breakthrough of the Pleiades HR FP concept in comparison with the previous High Resolution Geometric SPOT 5 Focal Plane. The main evolution drivers are the implementation of strip filters and the realization of 400 mm continuous retinas. For Pleiades HR AIT sequence, three specific integration and measuring benches, corresponding with the different assembly stages, are used: a 3-D non-contact measurement machine for the assembly of detection module, a 3-D measurement machine for mirror integration on the main Focal Plane SiC structure, and a 3-D geometric coordinates control bench to focus detection module lines and to ensure they are well registered together.

  5. Effects of Main-Sequence Mass Loss on Stellar and Galactic Chemical Evolution.

    Science.gov (United States)

    Guzik, Joyce Ann

    1988-06-01

    L. A. Willson, G. H. Bowen and C. Struck -Marcell have proposed that 1 to 3 solar mass stars may experience evolutionarily significant mass loss during the early part of their main-sequence phase. The suggested mass-loss mechanism is pulsation, facilitated by rapid rotation. Initial mass-loss rates may be as large as several times 10^{-9}M o/yr, diminishing over several times 10^8 years. We attempted to test this hypothesis by comparing some theoretical implications with observations. Three areas are addressed: Solar models, cluster HR diagrams, and galactic chemical evolution. Mass-losing solar models were evolved that match the Sun's luminosity and radius at its present age. The most extreme viable models have initial mass 2.0 M o, and mass-loss rates decreasing exponentially over 2-3 times 10^8 years. Compared to a constant -mass model, these models require a reduced initial ^4He abundance, have deeper envelope convection zones and higher ^8B neutrino fluxes. Early processing of present surface layers at higher interior temperatures increases the surface ^3He abundance, destroys Li, Be and B, and decreases the surface C/N ratio following first dredge-up. Evolution calculations incorporating main-sequence mass loss were completed for a grid of models with initial masses 1.25 to 2.0 Mo and mass loss timescales 0.2 to 2.0 Gyr. Cluster HR diagrams synthesized with these models confirm the potential for the hypothesis to explain observed spreads or bifurcations in the upper main sequence, blue stragglers, anomalous giants, and poor fits of main-sequence turnoffs by standard isochrones. Simple closed galactic chemical evolution models were used to test the effects of main-sequence mass loss on the F and G dwarf distribution. Stars between 3.0 M o and a metallicity -dependent lower mass are assumed to lose mass. The models produce a 30 to 60% increase in the stars to stars-plus -remnants ratio, with fewer early-F dwarfs and many more late-F dwarfs remaining on the main

  6. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology”s paradigm is that this order of amino acids determines the protein”s architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  7. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  8. Numerical Simulation of Stress evolution and earthquake sequence of the Tibetan Plateau

    Science.gov (United States)

    Dong, Peiyu; Hu, Caibo; Shi, Yaolin

    2015-04-01

    The India-Eurasia's collision produces N-S compression and results in large thrust fault in the southern edge of the Tibetan Plateau. Differential eastern flow of the lower crust of the plateau leads to large strike-slip faults and normal faults within the plateau. From 1904 to 2014, more than 30 earthquakes of Mw > 6.5 occurred sequentially in this distinctive tectonic environment. How did the stresses evolve during the last 110 years, how did the earthquakes interact with each other? Can this knowledge help us to forecast the future seismic hazards? In this essay, we tried to simulate the evolution of the stress field and the earthquake sequence in the Tibetan plateau within the last 110 years with a 2-D finite element model. Given an initial state of stress, the boundary condition was constrained by the present-day GPS observation, which was assumed as a constant rate during the 110 years. We calculated stress evolution year by year, and earthquake would occur if stress exceed the crustal strength. Stress changes due to each large earthquake in the sequence was calculated and contributed to the stress evolution. A key issue is the choice of initial stress state of the modeling, which is actually unknown. Usually, in the study of earthquake triggering, people assume the initial stress is zero, and only calculate the stress changes by large earthquakes - the Coulomb failure stress changes (Δ CFS). To some extent, this simplified method is a powerful tool because it can reveal which fault or which part of a fault becomes more risky or safer relatively. Nonetheless, it has not utilized all information available to us. The earthquake sequence reveals, though far from complete, some information about the stress state in the region. If the entire region is close to a self-organized critical or subcritical state, earthquake stress drop provides an estimate of lower limit of initial state. For locations no earthquakes occurred during the period, initial stress has to be

  9. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  10. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  11. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly

  12. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  13. Rapid evolution of the env gene leader sequence in cats naturally infected with feline immunodeficiency virus

    Science.gov (United States)

    Hughes, Joseph; Biek, Roman; Litster, Annette; Willett, Brian J.; Hosie, Margaret J.

    2015-01-01

    Analysing the evolution of feline immunodeficiency virus (FIV) at the intra-host level is important in order to address whether the diversity and composition of viral quasispecies affect disease progression. We examined the intra-host diversity and the evolutionary rates of the entire env and structural fragments of the env sequences obtained from sequential blood samples in 43 naturally infected domestic cats that displayed different clinical outcomes. We observed in the majority of cats that FIV env showed very low levels of intra-host diversity. We estimated that env evolved at a rate of 1.16×10−3 substitutions per site per year and demonstrated that recombinant sequences evolved faster than non-recombinant sequences. It was evident that the V3–V5 fragment of FIV env displayed higher evolutionary rates in healthy cats than in those with terminal illness. Our study provided the first evidence that the leader sequence of env, rather than the V3–V5 sequence, had the highest intra-host diversity and the highest evolutionary rate of all env fragments, consistent with this region being under a strong selective pressure for genetic variation. Overall, FIV env displayed relatively low intra-host diversity and evolved slowly in naturally infected cats. The maximum evolutionary rate was observed in the leader sequence of env. Although genetic stability is not necessarily a prerequisite for clinical stability, the higher genetic stability of FIV compared with human immunodeficiency virus might explain why many naturally infected cats do not progress rapidly to AIDS. PMID:25535323

  14. Relation of chromospheric activity to convection, rotation, and pre-main-sequence evolution

    International Nuclear Information System (INIS)

    Gilliland, R.L.

    1986-01-01

    Pre-main-sequence, or T Tauri, stars are characterized by much larger fluxes of nonradiative origin than their main-sequence counterparts. As a class, the T Tauri stars have only moderate rotation rates, making an explanation of their chromospheric properties based on rapid rotation problematic. The recent success of correlating nonradiative fluxes to the Rossby number, Ro = P/sub rot//tau/sub conv/, a central parameter of simple dynamo theories of magnetic field generation, has led to the suggestion that the same relation might be of use in explaining the pre-main-sequence (PMS) stars if tau/sub conv/ is very large. We show that tau/sub conv/ does depend strongly on evolutionary effects above the main sequence (MS), but that this dependence alone cannot account for the high observed nonradiative fluxes. The acoustic flux is also strongly dependent on PMS evolutionary state, and when coupled to the parameterization of magnetic activity based on Ro, these two mechanisms seem capable of explaining the high observed level of chromospheric activity in T Tauri stars. The moment of inertia decreases by two to three order of magnitude during PMS evolution. Since young MS stars do not rotate two to three orders of magnitude faster than PMS stars, rapid loss or redistribution of angular momentum must occur

  15. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    Science.gov (United States)

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related from the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  16. Nearly Complete 28S rRNA Gene Sequences Confirm New Hypotheses of Sponge Evolution

    Science.gov (United States)

    Thacker, Robert W.; Hill, April L.; Hill, Malcolm S.; Redmond, Niamh E.; Collins, Allen G.; Morrow, Christine C.; Spicer, Lori; Carmack, Cheryl A.; Zappe, Megan E.; Pohlmann, Deborah; Hall, Chelsea; Diaz, Maria C.; Bangalore, Purushotham V.

    2013-01-01

    The highly collaborative research sponsored by the NSF-funded Assembling the Porifera Tree of Life (PorToL) project is providing insights into some of the most difficult questions in metazoan systematics. Our understanding of phylogenetic relationships within the phylum Porifera has changed considerably with increased taxon sampling and data from additional molecular markers. PorToL researchers have falsified earlier phylogenetic hypotheses, discovered novel phylogenetic alliances, found phylogenetic homes for enigmatic taxa, and provided a more precise understanding of the evolution of skeletal features, secondary metabolites, body organization, and symbioses. Some of these exciting new discoveries are shared in the papers that form this issue of Integrative and Comparative Biology. Our analyses of over 300 nearly complete 28S ribosomal subunit gene sequences provide specific case studies that illustrate how our dataset confirms new hypotheses of sponge evolution. We recovered monophyletic clades for all 4 classes of sponges, as well as the 4 major clades of Demospongiae (Keratosa, Myxospongiae, Haploscleromorpha, and Heteroscleromorpha), but our phylogeny differs in several aspects from traditional classifications. In most major clades of sponges, families within orders appear to be paraphyletic. Although additional sampling of genes and taxa are needed to establish whether this pattern results from a lack of phylogenetic resolution or from a paraphyletic classification system, many of our results are congruent with those obtained from 18S ribosomal subunit gene sequences and complete mitochondrial genomes. These data provide further support for a revision of the traditional classification of sponges. PMID:23748742

  17. Phylogeny, character evolution, and biogeography of Cuscuta (dodders; Convolvulaceae) inferred from coding plastid and nuclear sequences.

    Science.gov (United States)

    García, Miguel A; Costea, Mihai; Kuzmina, Maria; Stefanović, Saša

    2014-04-01

    The parasitic genus Cuscuta, containing some 200 species circumscribed traditionally in three subgenera, is nearly cosmopolitan, occurring in a wide range of habitats and hosts. Previous molecular studies, on subgenera Grammica and Cuscuta, delimited major clades within these groups. However, the sequences used were unalignable among subgenera, preventing the phylogenetic comparison across the genus. We conducted a broad phylogenetic study using rbcL and nrLSU sequences covering the morphological, physiological, and geographical diversity of Cuscuta. We used parsimony methods to reconstruct ancestral states for taxonomically important characters. Biogeographical inferences were obtained using statistical and Bayesian approaches. Four well-supported major clades are resolved. Two of them correspond to subgenera Monogynella and Grammica. Subgenus Cuscuta is paraphyletic, with section Pachystigma sister to subgenus Grammica. Previously described cases of strongly supported discordance between plastid and nuclear phylogenies, interpreted as reticulation events, are confirmed here and three new cases are detected. Dehiscent fruits and globose stigmas are inferred as ancestral character states, whereas the ancestral style number is ambiguous. Biogeographical reconstructions suggest an Old World origin for the genus and subsequent spread to the Americas as a consequence of one long-distance dispersal. Hybridization may play an important yet underestimated role in the evolution of Cuscuta. Our results disagree with scenarios of evolution (polarity) previously proposed for several taxonomically important morphological characters, and with their usage and significance. While several cases of long-distance dispersal are inferred, vicariance or dispersal to adjacent areas emerges as the dominant biogeographical pattern.

  18. Impacts of WIMP dark matter upon stellar evolution: main-sequence stars

    CERN Document Server

    Scott, Pat; Edsjo, Joakim

    2008-01-01

    The presence of large amounts of WIMP dark matter in stellar cores has been shown to have significant effects upon models of stellar evolution. We present a series of detailed grids of WIMP-influenced stellar models for main sequence stars, computed using the DarkStars code. We describe the changes in stellar structure and main sequence evolution which occur for masses ranging from 0.3 to 2.0 solar masses and metallicities from Z = 0.0003-0.02, as a function of the rate of energy injection by WIMPs. We then go on to show what rates of energy injection can be obtained using realistic orbital parameters for stars near supermassive black holes, including detailed considerations of dark matter halo velocity and density profiles. Capture and annihilation rates are strongly boosted when stars follow elliptical rather than circular orbits, causing WIMP annihilation to provide up to 100 times the energy of hydrogen fusion in stars at the Galactic centre.

  19. High-resolution sequence stratigraphy and continental environmental evolution: An example from east-central Argentina

    Science.gov (United States)

    Beilinson, Elisa; Veiga, Gonzalo D.; Spalletti, Luis A.

    2013-10-01

    The aims of this contribution is to establish a high-resolution sequence stratigraphic scheme for the continental deposits that constitute the Punta San Andrés Alloformation (Plio-Pleistocene) in east-central Argentina, to analyze the basin fill evolution and to identify and assess the role that extrinsic factors such as climate and sea-level oscillations played during evolution of the unit. For the high-resolution sequence stratigraphical study of the Punta San Andrés Alloformation, high- and low-accommodation system tracts were defined mainly on the basis of the architectural elements present in the succession, also taking into account the relative degree of channel and floodplain deposits. Discontinuities and the nature of depositional systems generated during variations in accommodation helped identify two fourth-order high-accommodation system tracts and two fourth-order low-accommodation system tracts. At a third-order scale, the Punta San Andrés Alloformation may be interpreted as the progradation of continental depositional systems, characterized by a braided system in the proximal areas, and a low-sinuosity, single-channel system in the distal areas, defined by a high rate of sediment supply and discharge peaks which periodically flooded the plains and generated high aggradation rates during the late Pliocene and lower Pleistocene.

  20. Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea.

    Science.gov (United States)

    Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2017-07-05

    Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economical caridean shrimp, is a potential ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large complex genome (5.73 Gb), with size twice higher than those of many decapod shrimps. As such, comparative genomic analyses were implied to investigate factors affecting genome size evolution of decapods. However, clues associated with genome duplication were not identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined as the major factor influencing genome expansion. A total of 2 Gb repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that significantly expanded. Both recent (Jockey and Gypsy) and ancestral (DIRS) originated retrotransposons responsible for the genome evolution. The E. carinicauda genome also exhibited potential for the genomic and experimental research of shrimps.

  1. Modeling coding-sequence evolution within the context of residue solvent accessibility.

    Science.gov (United States)

    Scherrer, Michael P; Meyer, Austin G; Wilke, Claus O

    2012-09-12

    Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transverion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.

  2. Modeling coding-sequence evolution within the context of residue solvent accessibility

    Directory of Open Access Journals (Sweden)

    Scherrer Michael P

    2012-09-01

    Full Text Available Abstract Background Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues. Results Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94 model in which all model parameters can be functions of the relative solvent accessibility (RSA of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transverion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94 model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Conclusions Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.

  3. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Science.gov (United States)

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  4. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    Recurrence plot technique of DNA sequences is established on metric representation and employed to analyze correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, but a yeast chromosome's correlation distance has a constant increasing.

  5. The consequences of sequence erosion in the evolution of recombination hotspots.

    Science.gov (United States)

    Tiemann-Boege, Irene; Schwarz, Theresa; Striedner, Yasmin; Heissl, Angelika

    2017-12-19

    Meiosis is initiated by a double-strand break (DSB) introduced in the DNA by a highly controlled process that is repaired by recombination. In many organisms, recombination occurs at specific and narrow regions of the genome, known as recombination hotspots, which overlap with regions enriched for DSBs. In recent years, it has been demonstrated that conversions and mutations resulting from the repair of DSBs lead to a rapid sequence evolution at recombination hotspots eroding target sites for DSBs. We still do not fully understand the effect of this erosion in the recombination activity, but evidence has shown that the binding of trans -acting factors like PRDM9 is affected. PRDM9 is a meiosis-specific, multi-domain protein that recognizes DNA target motifs by its zinc finger domain and directs DSBs to these target sites. Here we discuss the changes in affinity of PRDM9 to eroded recognition sequences, and explain how these changes in affinity of PRDM9 can affect recombination, leading sometimes to sterility in the context of hybrid crosses. We also present experimental data showing that DNA methylation reduces PRDM9 binding in vitro Finally, we discuss PRDM9-independent hotspots, posing the question how these hotspots evolve and change with sequence erosion.This article is part of the themed issue 'Evolutionary causes and consequences of recombination rate variation in sexual organisms'. © 2017 The Authors.

  6. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    Directory of Open Access Journals (Sweden)

    Yan Koon-Kiu

    2007-11-01

    Full Text Available Abstract Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p-γ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw" duplication and deletion rates rdup∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOCai3aa0baaSqaaiabbsgaKjabbwha1jabbchaWbqaaiabgEHiQaaaaaa@3283@, rdel∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOCai3aa0baaSqaaiabbsga

  7. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  8. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  9. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  10. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    Energy Technology Data Exchange (ETDEWEB)

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi with 23 orders 110 families, 1300 genera and over 19,000 known species. We present comparative analysis of 70 Dothideomycete genomes including over 50 that we sequenced and are as yet unpublished. This extensive sampling has almost quadrupled the previous study of 18 species and uncovered a 10 fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families including proteases, transporters and small secreted proteins and show that major differences in gene content is influenced by speciation.

  11. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  12. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N×1 frequency vector n=ni, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N×N matrix and a stochastic sampling operator (Sa. The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq. Sequencing without any bias and errors is Seq=Sa IN, where IN is a N×N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN, which describes elimination or statistically significant downsampling, of specific reads during the sequencing process.

  13. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed Sequence Tag (EST has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons and that have been deposited in NCBI (accession IDs: GW868575 - GW873047, among which 1,294 unique ESTs were with known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65% and low in the peach (46%, and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.

  14. The Evolution of the Observed Hubble Sequence over the past 6Gyr

    Science.gov (United States)

    Delgado-Serrano, R.; Hammer, F.; Yang, Y. B.; Puech, M.; Flores, H.; Rodrigues, M.

    2011-10-01

    During the past years we have confronted serious problems of methodology concerning the morphological and kinematic classification of distant galaxies. This has forced us to create a new simple and effective morphological classification methodology, in order to guarantee a morpho-kinematic correlation, make the reproducibility easier and restrict the classification subjectivity. Giving the characteristic of our morphological classification, we have thus been able to apply the same methodology, using equivalent observations, to representative samples of local and distant galaxies. It has allowed us to derive, for the first time, the distant Hubble sequence (~6 Gyr ago), and determine a morphological evolution of galaxies over the past 6 Gyr. Our results strongly suggest that more than half of the present-day spirals had peculiar morphologies, 6 Gyr ago.

  15. Evolution of green plants as deduced from 5S rRNA sequences.

    Science.gov (United States)

    Hori, H; Lim, B L; Osawa, S

    1985-02-01

    We have constructed a phylogenic tree for green plants by comparing 5S rRNA sequences. The tree suggests that the emergence of most of the uni- and multicellular green algae such as Chlamydomonas, Spirogyra, Ulva, and Chlorella occurred in the early stage of green plant evolution. The branching point of Nitella is a little earlier than that of land plants and much later than that of the above green algae, supporting the view that Nitella-like green algae may be the direct precursor to land plants. The Bryophyta and the Pteridophyta separated from each other after emergence of the Spermatophyta. The result is consistent with the view that the Bryophyta evolved from ferns by degeneration. In the Pteridophyta, Psilotum (whisk fern) separated first, and a little later Lycopodium (club moss) separated from the ancestor common to Equisetum (horsetail) and Dryopteris (fern). This order is in accordance with the classical view. During the Spermatophyta evolution, the gymnosperms (Cycas, Ginkgo, and Metasequoia have been studied here) and the angiosperms (flowering plants) separated, and this was followed by the separation of Metasequoia and Cycas (cycad)/Ginkgo (maidenhair tree) on one branch and various flowering plants on the other.

  16. Cloning and sequence analysis of benzo-a-pyreneinducible ...

    African Journals Online (AJOL)

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  17. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  18. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  19. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    Science.gov (United States)

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.

  20. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    Science.gov (United States)

    2010-01-01

    Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  1. Galactic chemical evolution with main-sequence mass loss and the distribution of F and G dwarfs

    International Nuclear Information System (INIS)

    Guzik, J.A.; Struck-Marcell, C.

    1988-01-01

    Simple closed galactic chemical-evolution models incorporating early main-sequence stellar mass loss have been developed for disk ages of 5, 10, and 15 Gyr. Relative to models without stellar mass loss, the models are shown to produce a 30-60 percent increase in the present mass ratio of dwarfs to dwarfs plus remnants, and a 200-250 percent increase in the total mass of late F dwarfs remaining on the main sequence at the current disk age. For present disk ages 5 and 10 Gyr, the total mass of mid-F dwarfs remaining on the main sequence is also shown to increase by 90-120 percent. It is concluded that models with main-sequence mass loss have a slightly reduced gas metallicity and slightly increased gas fraction midway through the evolution. 30 references

  2. Repetitive sequences and epigenetic modification: inseparable partners play important roles in the evolution of plant sex chromosomes.

    Science.gov (United States)

    Li, Shu-Fen; Zhang, Guo-Jun; Yuan, Jin-Hong; Deng, Chuan-Liang; Gao, Wu-Jun

    2016-05-01

    The present review discusses the roles of repetitive sequences played in plant sex chromosome evolution, and highlights epigenetic modification as potential mechanism of repetitive sequences involved in sex chromosome evolution. Sex determination in plants is mostly based on sex chromosomes. Classic theory proposes that sex chromosomes evolve from a specific pair of autosomes with emergence of a sex-determining gene(s). Subsequently, the newly formed sex chromosomes stop recombination in a small region around the sex-determining locus, and over time, the non-recombining region expands to almost all parts of the sex chromosomes. Accumulation of repetitive sequences, mostly transposable elements and tandem repeats, is a conspicuous feature of the non-recombining region of the Y chromosome, even in primitive one. Repetitive sequences may play multiple roles in sex chromosome evolution, such as triggering heterochromatization and causing recombination suppression, leading to structural and morphological differentiation of sex chromosomes, and promoting Y chromosome degeneration and X chromosome dosage compensation. In this article, we review the current status of this field, and based on preliminary evidence, we posit that repetitive sequences are involved in sex chromosome evolution probably via epigenetic modification, such as DNA and histone methylation, with small interfering RNAs as the mediator.

  3. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  4. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  5. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 byanalyzing differences in allele distribution.

  6. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  7. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit; (c can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  8. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomalsequences using a metagenomic approach reveals that modern humans andNeanderthals split ~;400,000 years ago, without significant evidence ofsubsequent admixture.

  9. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  10. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  11. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

    Directory of Open Access Journals (Sweden)

    Yuan Huang

    2017-06-01

    Full Text Available Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide much more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely results from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes are lost or became pseudogenes, infer that the chloroplast genes horizontally transferred to the nucleus genome. Based on the complete nucleus genome sequences from two Salicaceae species, we remarkably identify that the entire chloroplast genome is indeed transferred and integrated to the nucleus genome in the individual of the reference genome of P. trichocarpa at least once. This observation, along with presence of the large nuclear plastid DNA (NUPTs and NUPTs-containing multiple chloroplast genes in their original order in the chloroplast genome, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using chloroplast complete genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in

  12. Evolution of amino acid metabolism inferred through cladistic analysis.

    Science.gov (United States)

    Cunchillos, Chomin; Lecointre, Guillaume

    2003-11-28

    Because free amino acids were most probably available in primitive abiotic environments, their metabolism is likely to have provided some of the very first metabolic pathways of life. What were the first enzymatic reactions to emerge? A cladistic analysis of metabolic pathways of the 16 aliphatic amino acids and 2 portions of the Krebs cycle was performed using four criteria of homology. The analysis is not based on sequence comparisons but, rather, on coding similarities in enzyme properties. The properties used are shared specific enzymatic activity, shared enzymatic function without substrate specificity, shared coenzymes, and shared functional family. The tree shows that the earliest pathways to emerge are not portions of the Krebs cycle but metabolisms of aspartate, asparagine, glutamate, and glutamine. The views of Horowitz (Horowitz, N. H. (1945) Proc. Natl. Acad. Sci. U. S. A. 31, 153-157) and Cordón (Cordón, F. (1990) Tratado Evolucionista de Biologia, Aguilar, Madrid, Spain), according to which the upstream reactions in the catabolic pathways and the downstream reactions in the anabolic pathways are the earliest in evolution, are globally corroborated; however, with some exceptions. These are due to later opportunistic connections of pathways (actually already suggested by these authors). Earliest enzymatic functions are mostly catabolic; they were deaminations, transaminations, and decarboxylations. From the consensus tree we extracted four time spans for amino acid metabolism development. For some amino acids catabolism and biosynthesis occurred at the same time (Asp, Glu, Lys, Leu, Ala, Val, Ile, Pro, Arg). For others ultimate reactions that use amino acids as a substrate or as a product are distinct in time, with catabolism preceding anabolism for Asn, Gln, and Cys and anabolism preceding catabolism for Ser, Met, and Thr. Cladistic analysis of the structure of biochemical pathways makes hypotheses in biochemical evolution explicit and parsimonious.

  13. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.

  14. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties defines a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where large number of sequences generated by the model can be compared to actually observed sequences.

  15. Whole-genome sequencing reveals the mechanisms for evolution of streptomycin resistance in Lactobacillus plantarum.

    Science.gov (United States)

    Zhang, Fuxin; Gao, Jiayuan; Wang, Bini; Huo, Dongxue; Wang, Zhaoxia; Zhang, Jiachao; Shao, Yuyu

    2018-04-01

    In this research, we investigated the evolution of streptomycin resistance in Lactobacillus plantarum ATCC14917, which was passaged in medium containing a gradually increasing concentration of streptomycin. After 25 d, the minimum inhibitory concentration (MIC) of L. plantarum ATCC14917 had reached 131,072 µg/mL, which was 8,192-fold higher than the MIC of the original parent isolate. The highly resistant L. plantarum ATCC14917 isolate was then passaged in antibiotic-free medium to determine the stability of resistance. The MIC value of the L. plantarum ATCC14917 isolate decreased to 2,048 µg/mL after 35 d but remained constant thereafter, indicating that resistance was irreversible even in the absence of selection pressure. Whole-genome sequencing of parent isolates, control isolates, and isolates following passage was used to study the resistance mechanism of L. plantarum ATCC14917 to streptomycin and adaptation in the presence and absence of selection pressure. Five mutated genes (single nucleotide polymorphisms and structural variants) were verified in highly resistant L. plantarum ATCC14917 isolates, which were related to ribosomal protein S12, LPXTG-motif cell wall anchor domain protein, LrgA family protein, Ser/Thr phosphatase family protein, and a hypothetical protein that may correlate with resistance to streptomycin. After passage in streptomycin-free medium, only the mutant gene encoding ribosomal protein S12 remained; the other 4 mutant genes had reverted to the wild type as found in the parent isolate. Although the MIC value of L. plantarum ATCC14917 was reduced in the absence of selection pressure, it remained 128-fold higher than the MIC value of the parent isolate, indicating that ribosomal protein S12 may play an important role in streptomycin resistance. Using the mobile elements database, we demonstrated that streptomycin resistance-related genes in L. plantarum ATCC14917 were not located on mobile elements. This research offers a way of

  16. A systematic identification of Kolobok superfamily transposons in Trichomonas vaginalis and sequence analysis on related transposases

    Institute of Scientific and Technical Information of China (English)

    Qingshu Meng; Kaifu Chen; Lina Ma; Songnian Hu; Jun Yu

    2011-01-01

    Transposons are sequence elements widely distributed among genomes of all three kingdoms of life, providing genomic changes and playing significant roles in genome evolution. Trichomonas vaginalis is an excellent model system for transposon study since its genome ( ~ 160 Mb) has been sequenced and is composed of ~65% transposons and other repetitive elements. In this study, we primarily report the identification of Kolobok-type transposons (termed tvBac) in T. vaginalis and the results of transposase sequence analysis. We categorized 24 novel subfamilies of the Kolobok element, including one autonomous subfamily and 23 non-autonomous subfamilies. We also identified a novel H2CH motif in tvBac transposases based on multiple sequence alignment. In addition, we supposed that tvBac and Mutator transposons may have evolved independently from a common ancestor according to our phylogenetic analysis. Our results provide basic information for the understanding of the function and evolution of tvBac transposons in particular and other related transposon families in general.

  17. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  18. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.

    Science.gov (United States)

    Orlando, Ludovic; Ginolhac, Aurélien; Zhang, Guojie; Froese, Duane; Albrechtsen, Anders; Stiller, Mathias; Schubert, Mikkel; Cappellini, Enrico; Petersen, Bent; Moltke, Ida; Johnson, Philip L F; Fumagalli, Matteo; Vilstrup, Julia T; Raghavan, Maanasa; Korneliussen, Thorfinn; Malaspinas, Anna-Sapfo; Vogt, Josef; Szklarczyk, Damian; Kelstrup, Christian D; Vinther, Jakob; Dolocan, Andrei; Stenderup, Jesper; Velazquez, Amhed M V; Cahill, James; Rasmussen, Morten; Wang, Xiaoli; Min, Jiumeng; Zazula, Grant D; Seguin-Orlando, Andaine; Mortensen, Cecilie; Magnussen, Kim; Thompson, John F; Weinstock, Jacobo; Gregersen, Kristian; Røed, Knut H; Eisenmann, Véra; Rubin, Carl J; Miller, Donald C; Antczak, Douglas F; Bertelsen, Mads F; Brunak, Søren; Al-Rasheid, Khaled A S; Ryder, Oliver; Andersson, Leif; Mundy, John; Krogh, Anders; Gilbert, M Thomas P; Kjær, Kurt; Sicheritz-Ponten, Thomas; Jensen, Lars Juhl; Olsen, Jesper V; Hofreiter, Michael; Nielsen, Rasmus; Shapiro, Beth; Wang, Jun; Willerslev, Eske

    2013-07-04

    The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.

  19. The Complete Chloroplast and Mitochondrial Genome Sequences of Boea hygrometrica: Insights into the Evolution of Plant Organellar Genomes

    Science.gov (United States)

    Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun

    2012-01-01

    The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined with the lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome have less genes (65) with a coding faction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979

  20. Pleiotropy constrains the evolution of protein but not regulatory sequences in a transcription regulatory network influencing complex social behaviours

    Directory of Open Access Journals (Sweden)

    Daria eMolodtsova

    2014-12-01

    Full Text Available It is increasingly apparent that genes and networks that influence complex behaviour are evolutionary conserved, which is paradoxical considering that behaviour is labile over evolutionary timescales. How does adaptive change in behaviour arise if behaviour is controlled by conserved, pleiotropic, and likely evolutionary constrained genes? Pleiotropy and connectedness are known to constrain the general rate of protein evolution, prompting some to suggest that the evolution of complex traits, including behaviour, is fuelled by regulatory sequence evolution. However, we seldom have data on the strength of selection on mutations in coding and regulatory sequences, and this hinders our ability to study how pleiotropy influences coding and regulatory sequence evolution. Here we use population genomics to estimate the strength of selection on coding and regulatory mutations for a transcriptional regulatory network that influences complex behaviour of honey bees. We found that replacement mutations in highly connected transcription factors and target genes experience significantly stronger negative selection relative to weakly connected transcription factors and targets. Adaptively evolving proteins were significantly more likely to reside at the periphery of the regulatory network, while proteins with signs of negative selection were near the core of the network. Interestingly, connectedness and network structure had minimal influence on the strength of selection on putative regulatory sequences for both transcription factors and their targets. Our study indicates that adaptive evolution of complex behaviour can arise because of positive selection on protein-coding mutations in peripheral genes, and on regulatory sequence mutations in both transcription factors and their targets throughout the network.

  1. Analysis of methods. [information systems evolution environment

    Science.gov (United States)

    Mayer, Richard J. (Editor); Ackley, Keith A.; Wells, M. Sue; Mayer, Paula S. D.; Blinn, Thomas M.; Decker, Louis P.; Toland, Joel A.; Crump, J. Wesley; Menzel, Christopher P.; Bodenmiller, Charles A.

    1991-01-01

    Information is one of an organization's most important assets. For this reason the development and maintenance of an integrated information system environment is one of the most important functions within a large organization. The Integrated Information Systems Evolution Environment (IISEE) project has as one of its primary goals a computerized solution to the difficulties involved in the development of integrated information systems. To develop such an environment a thorough understanding of the enterprise's information needs and requirements is of paramount importance. This document is the current release of the research performed by the Integrated Development Support Environment (IDSE) Research Team in support of the IISEE project. Research indicates that an integral part of any information system environment would be multiple modeling methods to support the management of the organization's information. Automated tool support for these methods is necessary to facilitate their use in an integrated environment. An integrated environment makes it necessary to maintain an integrated database which contains the different kinds of models developed under the various methodologies. In addition, to speed the process of development of models, a procedure or technique is needed to allow automatic translation from one methodology's representation to another while maintaining the integrity of both. The purpose for the analysis of the modeling methods included in this document is to examine these methods with the goal being to include them in an integrated development support environment. To accomplish this and to develop a method for allowing intra-methodology and inter-methodology model element reuse, a thorough understanding of multiple modeling methodologies is necessary. Currently the IDSE Research Team is investigating the family of Integrated Computer Aided Manufacturing (ICAM) DEFinition (IDEF) languages IDEF(0), IDEF(1), and IDEF(1x), as well as ENALIM, Entity

  2. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.

  3. Evolution Models of Helium White Dwarf–Main-sequence Star Merger Remnants

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Xianfei; Bi, Shaolan [Department of Astronomy, Beijing Normal University, Beijing, 100875 (China); Hall, Philip D.; Jeffery, C. Simon, E-mail: zxf@bnu.edu.cn [Armagh Observatory, College Hill, Armagh BT61 9DG (United Kingdom)

    2017-02-01

    It is predicted that orbital decay by gravitational-wave radiation and tidal interaction will cause some close binary stars to merge within a Hubble time. The merger of a helium-core white dwarf with a main-sequence (MS) star can produce a red giant branch star that has a low-mass hydrogen envelope when helium is ignited and thus become a hot subdwarf. Because detailed calculations have not been made, we compute post-merger models with a stellar evolution code. We find the evolutionary paths available to merger remnants and find the pre-merger conditions that lead to the formation of hot subdwarfs. We find that some such mergers result in the formation of stars with intermediate helium-rich surfaces. These stars later develop helium-poor surfaces owing to diffusion. Combining our results with a model population and comparing to observed stars, we find that some observed intermediate helium-rich hot subdwarfs can be explained as the remnants of the mergers of helium-core white dwarfs with low-mass MS stars.

  4. Evolution Models of Helium White Dwarf–Main-sequence Star Merger Remnants

    International Nuclear Information System (INIS)

    Zhang, Xianfei; Bi, Shaolan; Hall, Philip D.; Jeffery, C. Simon

    2017-01-01

    It is predicted that orbital decay by gravitational-wave radiation and tidal interaction will cause some close binary stars to merge within a Hubble time. The merger of a helium-core white dwarf with a main-sequence (MS) star can produce a red giant branch star that has a low-mass hydrogen envelope when helium is ignited and thus become a hot subdwarf. Because detailed calculations have not been made, we compute post-merger models with a stellar evolution code. We find the evolutionary paths available to merger remnants and find the pre-merger conditions that lead to the formation of hot subdwarfs. We find that some such mergers result in the formation of stars with intermediate helium-rich surfaces. These stars later develop helium-poor surfaces owing to diffusion. Combining our results with a model population and comparing to observed stars, we find that some observed intermediate helium-rich hot subdwarfs can be explained as the remnants of the mergers of helium-core white dwarfs with low-mass MS stars.

  5. Accident sequence analysis of human-computer interface design

    International Nuclear Information System (INIS)

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points

  6. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd and 4th y undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  7. Characterization of shale gas enrichment in the Wufeng Formation–Longmaxi Formation in the Sichuan Basin of China and evaluation of its geological construction–transformation evolution sequence

    Directory of Open Access Journals (Sweden)

    Zhiliang He

    2017-02-01

    Full Text Available Shale gas in Upper Ordovician Wufeng Formation–Lower Silurian Longmaxi Formation in the Sichuan Basin is one of the key strata being explored and developed in China, where shale gas reservoirs have been found in Fuling, Weiyuan, Changning and Zhaotong. Characteristics of shale gas enrichment in the formation shown by detailed profiling and analysis are summarized as “high, handsome and rich”. “High” mainly refers to the high quality of original materials for the formation of shale with excellent key parameters, including the good type and high abundance of organic matters, high content of brittle minerals and moderate thermal evolution. “Handsome” means late and weak deformation, favorable deformation mode and structure, and appropriate uplift and current burial depth. “Rich” includes high gas content, high formation pressure coefficient, good reservoir property, favorable reservoir scale transformation and high initial and final output, with relative ease of development and obvious economic benefit. For shale gas enrichment and high yield, it is important that the combination of shale was deposited and formed in excellent conditions (geological construction, and then underwent appropriate tectonic deformation, uplift, and erosion (geological transformation. Evaluation based on geological construction (evolution sequence from formation to the reservoir includes sequence stratigraphy and sediment, hydrocarbon generation and formation of reservoir pores. Based on geological transformation (evolution sequence from the reservoir to preservation, the strata should be evaluated for structural deformation, the formation of reservoir fracture and preservation of shale gas. The evaluation of the “construction - transformation” sequence is to cover the whole process of shale gas formation and preservation. This way, both positive and negative effects of the formation-transformation sequence on shale gas are assessed. The evaluation

  8. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM. Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  9. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains▿

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering

  10. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  11. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudis Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced {chi}{sup 2} criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  12. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ 2 criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  13. Whole genome sequencing reveals complex evolution patterns of multidrug-resistant Mycobacterium tuberculosis Beijing strains in patients.

    Directory of Open Access Journals (Sweden)

    Matthias Merker

    Full Text Available Multidrug-resistant (MDR Mycobacterium tuberculosis complex (MTBC strains represent a major threat for tuberculosis (TB control. Treatment of MDR-TB patients is long and less effective, resulting in a significant number of treatment failures. The development of further resistances leads to extensively drug-resistant (XDR variants. However, data on the individual reasons for treatment failure, e.g. an induced mutational burst, and on the evolution of bacteria in the patient are only sparsely available. To address this question, we investigated the intra-patient evolution of serial MTBC isolates obtained from three MDR-TB patients undergoing longitudinal treatment, finally leading to XDR-TB. Sequential isolates displayed identical IS6110 fingerprint patterns, suggesting the absence of exogenous re-infection. We utilized whole genome sequencing (WGS to screen for variations in three isolates from Patient A and four isolates from Patient B and C, respectively. Acquired polymorphisms were subsequently validated in up to 15 serial isolates by Sanger sequencing. We determined eight (Patient A and nine (Patient B polymorphisms, which occurred in a stepwise manner during the course of the therapy and were linked to resistance or a potential compensatory mechanism. For both patients, our analysis revealed the long-term co-existence of clonal subpopulations that displayed different drug resistance allele combinations. Out of these, the most resistant clone was fixed in the population. In contrast, baseline and follow-up isolates of Patient C were distinguished each by eleven unique polymorphisms, indicating an exogenous re-infection with an XDR strain not detected by IS6110 RFLP typing. Our study demonstrates that intra-patient microevolution of MDR-MTBC strains under longitudinal treatment is more complex than previously anticipated. However, a mutator phenotype was not detected. The presence of different subpopulations might confound phenotypic and

  14. The Iquique 2014 sequence: understanding its nucleation and propagation from the seismicity evolution

    Science.gov (United States)

    Fuenzalida, A.; Rietbrock, A.; Woollam, J.; Tavera, H.; Ruiz, S.

    2017-12-01

    during the sequence can be associated to the observed seamounts belonging to the Iquique ridge by previous marine experiment. To conclude our study, we perform a space and time analysis of the seismicity and we propose several scenarios to explain the nucleation of the earthquake and the way on which the seismicity behave during the sequence.

  15. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5596 unigenes related to defense response and 1344 ungienes related to signal transduction mechanisms (10.95%. These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.

  16. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  17. Evolutive and regressive soil sequences for characterization of soils in laurel forest (Tenerife, Canary Islands

    Directory of Open Access Journals (Sweden)

    José Asterio Guerra-García

    2014-03-01

    Full Text Available Soil degradation processes have achieved the recognition of a global environmental problem in recent years. It has been suggested by various international forums and organizations that in order to adequately establish methods to combat land degradation, it is necessary to evaluate this degradation locally and at a detailed scale. The evaluation of soil degradation of natural ecosystems at a detailed scale requires the definition of standards to which to compare this degradation. To define these standards and properly handle the processes that give rise to variations in soil quality and degradation, it is necessary to establish in some detail the pedogenic processes that have or have not taken place in a particular area and which lead to the formation of a mature soil. A mature soil should be considered as standard in these situations and, therefore, a non-degraded soil. This paper presents the possible evolutive and regressive sequences of soil, and provides some examples of using this methodology to evaluate the degradation of the same in the Monteverde of the island of Tenerife. It also presents some physical, chemical and mineralogical properties of climacic mature soils, degraded soils and low quality soils, and examines their similarities and differences in this bioclimatic environment and on different parent materials. Thus it is observed that the main processes of degradation in these areas are related to plant cover modifications that lead to the decreasing protection of the soil surface, which results in the long term, in the onset of degradation processes such as water erosion, biological degradation, loss of andic properties, compaction and sealing and crusting surface, loss of water retention capacity, illuviation, etc. Climacic soils that can be found in areas of steep lava flows are Leptosols, while gently sloping areas are Cambisols and Andosols. On pyroclastic materials there are vitric Andosols and andic Andosols according to

  18. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes. CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1 the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2 the high

  19. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp. Based on the sequence similarities, searching with known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 of transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR. The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  20. Evolution and strengthening of the Calabrian Regional Seismic Network during the Pollino sequence

    Science.gov (United States)

    D'Alessandro, Antonino; Gervasi, Anna; Guerra, Ignazio

    2013-04-01

    In the last three years the Calabria-Lucania border area is affected by an intense seismic activity generated by the activation of geological structures which be seat of clusters of microearthquakes, with energy release sufficient to be felt and to generate alarm and bother. Besides to the historical memory of the inhabitants of Mormanno (the town most affected of macroseismic effects) there are some historical documents that indicate the occurrence of a similar seismic crisis in 1888. A more recent seismic sequence, the first monitored by seismic instruments, occurred in 1973-1974. In the last case, the activity started in early 2010 and is still ongoing. The two shocks of ML = 4.3 and 5.0 and the the very long time duration differs this crisis from the previous ones. Given this background, in 1981 was installed at Mormanno a seismic station (MMN) belonging to Regional Seismic Network of the University of Calabria (RSRC), now also a station of the Italian National Seismic Network of the Istituto Nazionale di Geofisica Vulcanolgia (INSN-INGV). This seismic station made it possible to follow the evolution of seismicity in this area and in particular the progressive increase in seismic activity started in 2010. Since 2010, some 3D stand-alone, was installed by the University of Calabria. Further stations of INGV were installed in November 2011 after a sharp increase of the energy release and subsequently by the INGV and the GeoForschungsZentrum (Potsdam) after the main shock of the whole sequence. Seismic networks are powerful tools for understanding active tectonic processes in a monitored seismically active region. However, the optimal monitoring of a seismic region requires the assessment of the seismic network capabilities to identify seismogenic areas that are not adequately covered and to quantify measures that will allow the network improvement. In this paper we examine in detail the evolution and the strengthening of the RSRC in the last years analyzing the

  1. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    Science.gov (United States)

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, of which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast

  2. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize ( L. reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq for expression determination and differentiation of paralagous genes (∼85% of maize genes. Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs to 48% (pollen vs. ovule of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  3. Phylogenetic relationships and evolution of Crassulaceae inferred from matK sequence data.

    Science.gov (United States)

    Mort, Mark E.; Soltis, Douglas E.; Soltis, Pamela S.; Francisco-Ortega, Javier; Santos-Guerra, Arnoldo

    2001-01-01

    Chloroplast gene matK sequence data were used to estimate the phylogeny of 112 species of Crassulaceae sampled from 33 genera and all six recognized subfamilies. Our analyses suggest that five of six subfamilies recognized in the most recent comprehensive classification of the family are not monophyletic. Instead, we recovered a basal split in Crassulaceae between the southern African CRASSULA: clade (Crassuloideae) and the rest of the family (Sedoideae). These results are compatible with recent studies of cpDNA restriction site analyses. Within Sedoideae, four subclades were also recovered: KALANCHOE:, Leucosedum, Acre, and AEONIUM:; evidence also exists for a TELEPHIUM: clade and SEMPERVIVUM: clade. The genus SEDUM: is highly polyphyletic with representatives spread throughout the large Sedoideae clade. Sympetaly and polymerous flowers have arisen multiple times in Crassulaceae and thus are not appropriate characters upon which to base subfamilial limits, as has been done in the past. One floral character, haplostemy, appears to be confined to the well-supported CRASSULA: clade. Our analyses suggest a southern African origin of the family, with subsequent dispersal northward into the Mediterranean region. From there, the family spread to Asia/eastern Europe and northern Europe; two separate lineages of European Crassulaceae subsequently dispersed to North America and underwent substantial diversification. Our analyses also suggest that the original base chromosome number in Crassulaceae is x = 8 and that polyploidy has played an important role in seven clades. Three of these clades are exclusively polyploid (SEMPERVIVUM: clade and two subclades within the KALANCHOE: and AEONIUM: clades), whereas four (Crassula, Telephium, Leucosedum, and ACRE: clades) comprise both diploid and polyploid taxa. Polyploidy is particularly rampant and cytological evolution especially complex in the ACRE: clade.

  4. Lithium abundances for 185 main-sequence stars: Galactic evolution and stellar depletion of lithium

    Science.gov (United States)

    Chen, Y. Q.; Nissen, P. E.; Benoni, T.; Zhao, G.

    2001-06-01

    We present a survey of lithium abundances in 185 main-sequence field stars with 5600 interesting result from this study is the presence of a large gap in the log varepsilon (Li) - Teff plane, which distinguishes ``Li-dip'' stars like those first identified in the Hyades cluster by Boesgaard & Tripicco (\\cite{Boesgaard86}) from other stars with a much higher Li abundance. The Li-dip stars concentrate on a certain mass, which decreases with metallicity from about 1.4 Msun at solar metallicity to 1.1 Msun at [Fe/H] =~ -1.0. Excluding the Li-dip stars and a small group of lower mass stars with Teff rate of angular momentum loss. It cannot be excluded, however, that a cosmic scatter of the Li abundance in the Galaxy at a given metallicity contributes to the dispersion in Li abundance. These problems make it difficult to determine the Galactic evolution of Li from the data, but a comparison of the upper envelope of the distribution of stars in the log varepsilon (Li) - [Fe/H] plane with recent Galactic evolutionary models by Romano et al. (\\cite{Romano99}) suggests that novae are a major source for the Li production in the Galactic disk; their occurrence seems to be the explanation for the steep increase of Li abundance at [Fe/H] =~ -0.4. Based on observations carried out at Beijing Astronomical Observatory (Xinglong, PR China) and European Southern Observatory, La Silla, Chile. Table 1 is only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/371/943 and at http://www.edpsciences.org

  5. The subclonal structure and genomic evolution of oral squamous cell carcinoma revealed by ultra-deep sequencing

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin J

    2017-01-01

    Recent studies suggest that head and neck squamous cell carcinomas are very heterogeneous between patients; however the subclonal structure remains unexplored mainly due to studies using only a single biopsy per patient. To deconvolutethe clonal structure and describe the genomic cancer evolution......, we applied whole-exome sequencing combined with ultra-deep targeted sequencing on oral squamous cell carcinomas (OSCC). From each patient, a set of biopsies was sampled from distinct geographical sites in primary tumor and lymph node metastasis.We demonstrate that the included OSCCs show a high...

  6. Research on the Compression Algorithm of the Infrared Thermal Image Sequence Based on Differential Evolution and Double Exponential Decay Model

    Science.gov (United States)

    Zhang, Jin-Yu; Meng, Xiang-Bing; Xu, Wei; Zhang, Wei; Zhang, Yong

    2014-01-01

    This paper has proposed a new thermal wave image sequence compression algorithm by combining double exponential decay fitting model and differential evolution algorithm. This study benchmarked fitting compression results and precision of the proposed method was benchmarked to that of the traditional methods via experiment; it investigated the fitting compression performance under the long time series and improved model and validated the algorithm by practical thermal image sequence compression and reconstruction. The results show that the proposed algorithm is a fast and highly precise infrared image data processing method. PMID:24696649

  7. Research on the Compression Algorithm of the Infrared Thermal Image Sequence Based on Differential Evolution and Double Exponential Decay Model

    Directory of Open Access Journals (Sweden)

    Jin-Yu Zhang

    2014-01-01

    Full Text Available This paper has proposed a new thermal wave image sequence compression algorithm by combining double exponential decay fitting model and differential evolution algorithm. This study benchmarked fitting compression results and precision of the proposed method was benchmarked to that of the traditional methods via experiment; it investigated the fitting compression performance under the long time series and improved model and validated the algorithm by practical thermal image sequence compression and reconstruction. The results show that the proposed algorithm is a fast and highly precise infrared image data processing method.

  8. Evolution of Sequence Type 4821 Clonal Complex Meningococcal Strains in China from Prequinolone to Quinolone Era, 1972–2013

    Science.gov (United States)

    Guo, Qinglan; Mustapha, Mustapha M.; Chen, Mingliang; Qu, Di; Zhang, Xi; Harrison, Lee H.

    2018-01-01

    The expansion of hypervirulent sequence type 4821 clonal complex (CC4821) lineage Neisseria meningitidis bacteria has led to a shift in meningococcal disease epidemiology in China, from serogroup A (MenA) to MenC. Knowledge of the evolution and genetic origin of the emergent MenC strains is limited. In this study, we subjected 76 CC4821 isolates collected across China during 1972–1977 and 2005–2013 to phylogenetic analysis, traditional genotyping, or both. We show that successive recombination events within genes encoding surface antigens and acquisition of quinolone resistance mutations possibly played a role in the emergence of CC4821 as an epidemic clone in China. MenC and MenB CC4821 strains have spread across China and have been detected in several countries in different continents. Capsular switches involving serogroups B and C occurred among epidemic strains, raising concerns regarding possible increases in MenB disease, given that vaccines in use in China do not protect against MenB. PMID:29553310

  9. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    Directory of Open Access Journals (Sweden)

    Apurva Barve

    2013-01-01

    Full Text Available Xeroderma pigmentosum group A (XPA is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1 and replication protein A 70 kDa subunit (RPA70 proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  10. Development of a state machine sequencer for the Keck Interferometer: evolution, development, and lessons learned using a CASE tool approach

    Science.gov (United States)

    Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee R.

    2004-09-01

    This paper presents a discussion of the evolution of a sequencer from a simple Experimental Physics and Industrial Control System (EPICS) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a Computer Aided Software Engineering (CASE) tool approach. The main purpose of the Interferometer Sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.

  11. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. Interactions of Chromatin Context, Binding Site Sequence Content, and Sequence Evolution in Stress-Induced p53 Occupancy and Transactivation

    OpenAIRE

    Su, Dan; Wang, Xuting; Campbell, Michelle R.; Song, Lingyun; Safi, Alexias; Crawford, Gregory E.; Bell, Douglas A.

    2015-01-01

    Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, ...

  13. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Directory of Open Access Journals (Sweden)

    Lalucat Jorge

    2010-04-01

    -reviewed articles, and direct external links to more specialized strain information databases (StrainInfo and GeneBank (NCBI. The PseudoMLSA is intended to provide helpful strain-sequence information for a better and more comprehensive discriminative multigenic sequence based analysis of this special group of bacteria, contributing to enhance our understanding of the evolution of Pseudomonas species.

  14. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  15. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  16. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  17. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Science.gov (United States)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  18. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  19. DNAApp: a mobile application for sequencing data analysis.

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  20. DNAApp: a mobile application for sequencing data analysis

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing file on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  1. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  2. Genetic diversity analysis of Leuconostoc mesenteroides from Korean vegetables and food products by multilocus sequence typing.

    Science.gov (United States)

    Sharma, Anshul; Kaur, Jasmine; Lee, Sulhee; Park, Young-Seo

    2018-06-01

    In the present study, 35 Leuconostoc mesenteroides strains isolated from vegetables and food products from South Korea were studied by multilocus sequence typing (MLST) of seven housekeeping genes (atpA, groEL, gyrB, pheS, pyrG, rpoA, and uvrC). The fragment sizes of the seven amplified housekeeping genes ranged in length from 366 to 1414 bp. Sequence analysis indicated 27 different sequence types (STs) with 25 of them being represented by a single strain indicating high genetic diversity, whereas the remaining 2 were characterized by five strains each. In total, 220 polymorphic nucleotide sites were detected among seven housekeeping genes. The phylogenetic analysis based on the STs of the seven loci indicated that the 35 strains belonged to two major groups, A (28 strains) and B (7 strains). Split decomposition analysis showed that intraspecies recombination played a role in generating diversity among strains. The minimum spanning tree showed that the evolution of the STs was not correlated with food source. This study signifies that the multilocus sequence typing is a valuable tool to access the genetic diversity among L. mesenteroides strains from South Korea and can be used further to monitor the evolutionary changes.

  3. POST-MAIN SEQUENCE EVOLUTION OF ICY MINOR PLANETS: IMPLICATIONS FOR WATER RETENTION AND WHITE DWARF POLLUTION

    Energy Technology Data Exchange (ETDEWEB)

    Malamud, Uri; Perets, Hagai B., E-mail: uri.mal@tx.technion.ac.il, E-mail: hperets@physics.technion.ac.il [Department of Physics, Technion (Israel)

    2016-12-01

    Most observations of polluted white dwarf atmospheres are consistent with accretion of water-depleted planetary material. Among tens of known cases, merely two involve accretion of objects that contain a considerable mass fraction of water. The purpose of this study is to investigate the relative scarcity of these detections. Based on a new and highly detailed model, we evaluate the retention of water inside icy minor planets during the high-luminosity stellar evolution that follows the main sequence. Our model fully considers the thermal, physical, and chemical evolution of icy bodies, following their internal differentiation as well as water depletion, from the moment of their birth and through all stellar evolution phases preceding the formation of the white dwarf. We also account for different initial compositions and formation times. Our results differ from previous studies, which have either underestimated or overestimated water retention. We show that water can survive in a variety of circumstances and in great quantities, and therefore other possibilities are discussed in order to explain the infrequency of water detection. We predict that the sequence of accretion is such that water accretes earlier, and more rapidly, than the rest of the silicate disk, considerably reducing the chance of its detection in H-dominated atmospheres. In He-dominated atmospheres, the scarcity of water detections could be observationally biased. It implies that the accreted material is typically intrinsically dry, which may be the result of the inside-out depopulation sequence of minor planets.

  4. POST-MAIN SEQUENCE EVOLUTION OF ICY MINOR PLANETS: IMPLICATIONS FOR WATER RETENTION AND WHITE DWARF POLLUTION

    International Nuclear Information System (INIS)

    Malamud, Uri; Perets, Hagai B.

    2016-01-01

    Most observations of polluted white dwarf atmospheres are consistent with accretion of water-depleted planetary material. Among tens of known cases, merely two involve accretion of objects that contain a considerable mass fraction of water. The purpose of this study is to investigate the relative scarcity of these detections. Based on a new and highly detailed model, we evaluate the retention of water inside icy minor planets during the high-luminosity stellar evolution that follows the main sequence. Our model fully considers the thermal, physical, and chemical evolution of icy bodies, following their internal differentiation as well as water depletion, from the moment of their birth and through all stellar evolution phases preceding the formation of the white dwarf. We also account for different initial compositions and formation times. Our results differ from previous studies, which have either underestimated or overestimated water retention. We show that water can survive in a variety of circumstances and in great quantities, and therefore other possibilities are discussed in order to explain the infrequency of water detection. We predict that the sequence of accretion is such that water accretes earlier, and more rapidly, than the rest of the silicate disk, considerably reducing the chance of its detection in H-dominated atmospheres. In He-dominated atmospheres, the scarcity of water detections could be observationally biased. It implies that the accreted material is typically intrinsically dry, which may be the result of the inside-out depopulation sequence of minor planets.

  5. Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.

    Science.gov (United States)

    Zhu, Xinyu; Chen, Caoyi; Wang, Baohua

    2012-04-01

    Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we made a comprehensive evolutionary analysis on this gene family from 13 major representatives of green plants. A novel clustering (here named as cpTrx clade), which included the III-1, III-2, and III-4 orthologous groups, previously resolved was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were early integrated into primordial SET-PostSET domain organization of cpTrx clade. We suggested that the PostSET domain was lost in some members of III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from the Physcomitrella patens (moss) were identified, and supposedly, their parental genes have been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.

  6. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  7. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment. High-throughput sequencing (HTS revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  8. Distribution and evolution of repeated sequences in genomes of Triatominae (Hemiptera-Reduviidae inferred from genomic in situ hybridization.

    Directory of Open Access Journals (Sweden)

    Sebastian Pita

    Full Text Available The subfamily Triatominae, vectors of Chagas disease, comprises 140 species characterized by a highly homogeneous chromosome number. We analyzed the chromosomal distribution and evolution of repeated sequences in Triatominae genomes by Genomic in situ Hybridization using Triatoma delpontei and Triatoma infestans genomic DNAs as probes. Hybridizations were performed on their own chromosomes and on nine species included in six genera from the two main tribes: Triatomini and Rhodniini. Genomic probes clearly generate two different hybridization patterns, dispersed or accumulated in specific regions or chromosomes. The three used probes generate the same hybridization pattern in each species. However, these patterns are species-specific. In closely related species, the probes strongly hybridized in the autosomal heterochromatic regions, resembling C-banding and DAPI patterns. However, in more distant species these co-localizations are not observed. The heterochromatic Y chromosome is constituted by highly repeated sequences, which is conserved among 10 species of Triatomini tribe suggesting be an ancestral character for this group. However, the Y chromosome in Rhodniini tribe is markedly different, supporting the early evolutionary dichotomy between both tribes. In some species, sex chromosomes and autosomes shared repeated sequences, suggesting meiotic chromatin exchanges among these heterologous chromosomes. Our GISH analyses enabled us to acquire not only reliable information about autosomal repeated sequences distribution but also an insight into sex chromosome evolution in Triatominae. Furthermore, the differentiation obtained by GISH might be a valuable marker to establish phylogenetic relationships and to test the controversial origin of the Triatominae subfamily.

  9. Evolution of R5 and X4 human immunodeficiency virus type 1 gag sequences in vivo: evidence for recombination

    International Nuclear Information System (INIS)

    Rij, Ronald P. van; Worobey, Michael; Visser, Janny A.; Schuitemaker, Hanneke

    2003-01-01

    Human immunodeficiency virus type 1 (HIV-1) infection is in general established by CCR5-utilizing (R5) virus variants, which persist throughout the course of infection. R5 HIV-1 variants evolve into CXCR4-utilizing (X4) HIV-1 variants in approximately half of the infected individuals. We have previously observed an ongoing genetic evolution with a continuous divergence of envelope gp120 sequences of coexisting R5 and X4 virus variants over time. Here, we studied evolution of gag p17 sequences in two patients who developed X4 variants in the course of infection. In contrast to the envelope gp120 sequences, gag p17 sequences of R5 and X4 virus populations intermingled in phylogenetic trees and did not diverge from each other over time. Statistical evaluation using the Shimodaira-Hasegawa test indicated that the different genomic regions evolved along different topologies, supporting the hypothesis of recombination. Therefore, our data imply that recombination between R5 and X4 HIV-1 variants occurs in vivo

  10. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    Full Text Available System modelling using Unified Modelling Language (UML is the task that should be solved for software development. The more complex software becomes the higher requirements are stated to demonstrate the system to be developed, especially in its dynamic aspect, which in UML is offered by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram due to its specific structure is selected for a deeper analysis on the elements’ layout. The authors research represents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyse them according to criteria required for the diagram perception.

  11. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  12. alpha-Crystallin A sequences of Alligator mississippiensis and the lizard Tupinambis teguixin: molecular evolution and reptilian phylogeny.

    Science.gov (United States)

    de Jong, W W; Zweers, A; Versteeg, M; Dessauer, H C; Goodman, M

    1985-11-01

    The amino acid sequences of the eye lens protein alpha-crystallin A from many mammalian and avian species, two frog species, and a dogfish have provided detailed information about the molecular evolution of this protein and allowed some useful inferences about phylogenetic relationships among these species. We now have isolated and sequenced the alpha-crystallins of the American alligator and the common tegu lizard. The reptilian alpha A chains appear to have evolved as slowly as those of other vertebrates, i.e., at two to three amino acid replacements per 100 residues in 100 Myr. The lack of charged replacements and the general types and distribution of replacements also are similar to those in other vertebrate alpha A chains. Maximum-parsimony analyses of the total data set of 67 vertebrate alpha A sequences support the monophyletic origin of alligator, tegu, and birds and favor the grouping of crocodilians and birds as surviving sister groups in the subclass Archosauria.

  13. Empirical tests of pre-main-sequence stellar evolution models with eclipsing binaries

    Science.gov (United States)

    Stassun, Keivan G.; Feiden, Gregory A.; Torres, Guillermo

    2014-06-01

    We examine the performance of standard pre-main-sequence (PMS) stellar evolution models against the accurately measured properties of a benchmark sample of 26 PMS stars in 13 eclipsing binary (EB) systems having masses 0.04-4.0 M⊙ and nominal ages ≈1-20 Myr. We provide a definitive compilation of all fundamental properties for the EBs, with a careful and consistent reassessment of observational uncertainties. We also provide a definitive compilation of the various PMS model sets, including physical ingredients and limits of applicability. No set of model isochrones is able to successfully reproduce all of the measured properties of all of the EBs. In the H-R diagram, the masses inferred for the individual stars by the models are accurate to better than 10% at ≳1 M⊙, but below 1 M⊙ they are discrepant by 50-100%. Adjusting the observed radii and temperatures using empirical relations for the effects of magnetic activity helps to resolve the discrepancies in a few cases, but fails as a general solution. We find evidence that the failure of the models to match the data is linked to the triples in the EB sample; at least half of the EBs possess tertiary companions. Excluding the triples, the models reproduce the stellar masses to better than ∼10% in the H-R diagram, down to 0.5 M⊙, below which the current sample is fully contaminated by tertiaries. We consider several mechanisms by which a tertiary might cause changes in the EB properties and thus corrupt the agreement with stellar model predictions. We show that the energies of the tertiary orbits are comparable to that needed to potentially explain the scatter in the EB properties through injection of heat, perhaps involving tidal interaction. It seems from the evidence at hand that this mechanism, however it operates in detail, has more influence on the surface properties of the stars than on their internal structure, as the lithium abundances are broadly in good agreement with model predictions. The

  14. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  15. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive to rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced in an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil, and the French Guyana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  17. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  18. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12×coverage) and assembled them into 163,406 scaffolds with a total cumulated length of 400 Mbp, which represent about 30% of asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. Majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence construction of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.

  19. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  20. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    Science.gov (United States)

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  1. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Full Text Available Advancements in the field of sequencing techniques resulted in the huge sequenced data to be produced at a very faster rate. It is going cumbersome for the datacenter to maintain the databases. Data mining and sequence analysis approaches needs to analyze the databases several times to reach any efficient conclusion. To cope with such overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of the resources and computation on pay as you go concept was introduced and termed as cloud computing. The datacenter’s hardware and software is collectively known as cloud which when available publicly is termed as public cloud. The datacenter’s resources are provided in a virtual mode to the clients via a service provider like Amazon, Google and Joyent which charges on pay as you go manner. The workload is shifted to the provider which is maintained by the required hardware and software upgradation. The service provider manages it by upgrading the requirements in the virtual mode. Basically a virtual environment is created according to the need of the user by taking permission from datacenter via internet, the task is performed and the environment is deleted after the task is over. In this discussion, we are focusing on the basics of cloud computing, the prerequisites and overall working of clouds. Furthermore, briefly the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection with reference to traditional workflow are discussed.

  2. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  3. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI1, M. REISSMANN2 AND H. J. SCHWARTZ3

    2006-10-01

    Full Text Available Myostatin, also called growth differentiation factor-8 (GDF-8, is a member of the mammalian growth transforming family (TGF-beta superfamily, which is expressed specifically in developing an adult skeletal muscle. Muscular hypertrophy allele (mh allele in the double muscle breeds involved mutation within the myostatin gene. Genomic DNA was isolated from the camel hair using NucleoSpin Tissue kit. Two animals of each of the six breeds namely, Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homolog regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed separate cluster from the pig in spite of having high homology (98% and showed 94% homology with cattle and sheep as reported in literature. Sequence analysis of the PCR amplified part of exon 1 (256 bp of the camel myostatin was identical among six camel breeds.

  4. Genetic diversity and molecular evolution of Ornithogalum mosaic virus based on the coat protein gene sequence

    Directory of Open Access Journals (Sweden)

    Fangluan Gao

    2018-03-01

    Full Text Available Ornithogalum mosaic virus (OrMV has a wide host range and affects the production of a variety of ornamentals. In this study, the coat protein (CP gene of OrMVwas used to investigate the molecular mechanisms underlying the evolution of this virus. The 36 OrMV isolates fell into two groups which have significant subpopulation differentiation with an FST value of 0.470. One isolate was identified as a recombinant and the other 35 recombination-free isolates could be divided into two major clades under different evolutionary constraints with dN/dS values of 0.055 and 0.028, respectively, indicating a role of purifying selection in the differentiation of OrMV. In addition, the results from analysis of molecular variance (AMOVA indicated that the effect of host species on the genetic divergence of OrMV is greater than that of geography. Furthermore, OrMV isolates from the genera Ornithogalum, Lachenalia and Diuri tended to group together, indicating that OrMV diversification was maintained, in part, by host-driven adaptation.

  5. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    Full Text Available We conducted a SSRs density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava’s SSRs genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp and has a high proportion of pentanucleotides, (24,2 SSRs/Mbp. It was found that coding sequences have 15,5 SSRs/Mbp, introns have 82,3 SSRs/Mbp, 5′ UTRs have 196,1 SSRs/Mbp, and 3′ UTRs have 50,5 SSRs/Mbp. Through motif analysis of cassava’s genome SSRs, the most abundant motif was AT/AT while in intron sequences and UTRs regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was also found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation of Gene Ontology categories. The identified SSRs here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution.

  6. Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs

    Science.gov (United States)

    Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan

    2017-01-01

    Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401

  7. Complete plastome sequences of Equisetum arvense and Isoetes flaccida: implications for phylogeny and plastid genome evolution of early land plant lineages

    Directory of Open Access Journals (Sweden)

    Mandoli Dina F

    2010-10-01

    Full Text Available Abstract Background Despite considerable progress in our understanding of land plant phylogeny, several nodes in the green tree of life remain poorly resolved. Furthermore, the bulk of currently available data come from only a subset of major land plant clades. Here we examine early land plant evolution using complete plastome sequences including two previously unexamined and phylogenetically critical lineages. To better understand the evolution of land plants and their plastomes, we examined aligned nucleotide sequences, indels, gene and nucleotide composition, inversions, and gene order at the boundaries of the inverted repeats. Results We present the plastome sequences of Equisetum arvense, a horsetail, and of Isoetes flaccida, a heterosporous lycophyte. Phylogenetic analysis of aligned nucleotides from 49 plastome genes from 43 taxa supported monophyly for the following clades: embryophytes (land plants, lycophytes, monilophytes (leptosporangiate ferns + Angiopteris evecta + Psilotum nudum + Equisetum arvense, and seed plants. Resolution among the four monilophyte lineages remained moderate, although nucleotide analyses suggested that P. nudum and E. arvense form a clade sister to A. evecta + leptosporangiate ferns. Results from phylogenetic analyses of nucleotides were consistent with the distribution of plastome gene rearrangements and with analysis of sequence gaps resulting from insertions and deletions (indels. We found one new indel and an inversion of a block of genes that unites the monilophytes. Conclusions Monophyly of monilophytes has been disputed on the basis of morphological and fossil evidence. In the context of a broad sampling of land plant data we find several new pieces of evidence for monilophyte monophyly. Results from this study demonstrate resolution among the four monilophytes lineages, albeit with moderate support; we posit a clade consisting of Equisetaceae and Psilotaceae that is sister to the "true ferns

  8. Sequence stratigraphy as a scientific enterprise: the evolution and persistence of conflicting paradigms

    Science.gov (United States)

    Miall, Andrew D.; Miall, Charlene E.

    2001-08-01

    In the 1970s, seismic stratigraphy represented a new paradigm in geological thought. The development of new techniques for analyzing seismic-reflection data constituted a "crisis," as conceptualized by T.S. Kuhn, and stimulated a revolution in stratigraphy. We analyze here a specific subset of the new ideas, that pertaining to the concept of global-eustasy and the global cycle chart published by Vail et al. [Vail, P.R., Mitchum, R.M., Jr., Todd, R.G., Widmier, J.M., Thompson, S., III, Sangree, J.B., Bubb, J.N., Hatlelid, W.G., 1977. Seismic stratigraphy and global changes of sea-level. In: Payton, C.E. (Ed.), Seismic Stratigraphy—Applications to Hydrocarbon Exploration, Am. Assoc. Pet. Geol. Mem. 26, pp. 49-212.] The global-eustasy model posed two challenges to the "normal science" of stratigraphy then underway: (1) that sequence stratigraphy, as exemplified by the global cycle chart, constitutes a superior standard of geologic time to that assembled from conventional chronostratigraphic evidence, and (2) that stratigraphic processes are dominated by the effects of eustasy, to the exclusion of other allogenic mechanisms, including tectonism. While many stratigraphers now doubt the universal validity of the model of global-eustasy, what we term the global-eustasy paradigm, a group of sequence researchers led by Vail still adheres to it, and the two conceptual approaches have evolved into two conflicting paradigms. Those who assert that there are multiple processes generating stratigraphic sequences (possibly including eustatic processes) are adherents of what we term the complexity paradigm. Followers of this paradigm argue that tests of the global cycle chart amount to little more than circular reasoning. A new body of work documenting the European sequence record was published in 1998 by de Graciansky et al. These workers largely follow the global-eustasy paradigm. Citation and textual analysis of this work indicates that they have not responded to any of the

  9. An Imaging And Graphics Workstation For Image Sequence Analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  10. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Full Text Available Abstract Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA to 8.9% (dnaN. Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  11. Sirius PSB: a generic system for analysis of biological sequences.

    Science.gov (United States)

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  12. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~ 30× sequencing depths. EGP1 had 4.7 million variants, where 198,877 were novel variants while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroup of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related with Egyptians obesity. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.

  13. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    Full Text Available The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing, the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  14. Economic analysis of evolution/devolution of electronic devices functionality

    Directory of Open Access Journals (Sweden)

    Esipov A. S.

    2017-12-01

    Full Text Available the researcher of this article has presented the analysis of evolution/devolution of electronic devices functionality as well as the analysis of the current situation at the computers and mobile devices market, and some thoughts about new products. Is a newer device better? Are corporations producing really new devices or they are only the improvement of old ones.

  15. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Full Text Available Abstract Background Although the previous study demonstrated the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences of full-length viral genomes of DENV-3. In our study, complete genomic sequencing of DENV-3 strains collected from different geographical locations and isolation years were determined and the sequence diversity as well as selection pressure sites in the DENV genome other than within the E gene were also analyzed. Results Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that the Taiwan's indigenous DENV-3 isolated from 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case were of the three different genotypes – I, II, and III, each associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among DENV-3 different genotypes was further examined to understand the global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54 and envelope protein gene regions (mean ± SD: 5.04 ± 0.32. Further analysis found that positive selection pressure of DENV-3 may occur in the non-structural protein 1 gene region and the positive selection site was detected at position 178 of the NS1 gene. Conclusion Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the non-structural protein along genotype II indicated that DENV-3 originated from Southeast Asia needs to monitor the emergence of DENV strains with epidemic potential for better epidemic prevention and vaccine development.

  16. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Science.gov (United States)

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulates) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Full Text Available Complex informational spectrum analysis for protein sequences (CISAPS and its web-based server are developed and presented. As recent studies show, only the use of the absolute spectrum in the analysis of protein sequences using the informational spectrum analysis is proven to be insufficient. Therefore, CISAPS is developed to consider and provide results in three forms including absolute, real, and imaginary spectrum. Biologically related features to the analysis of influenza A subtypes as presented as a case study in this study can also appear individually either in the real or imaginary spectrum. As the results presented, protein classes can present similarities or differences according to the features extracted from CISAPS web server. These associations are probable to be related with the protein feature that the specific amino acid index represents. In addition, various technical issues such as zero-padding and windowing that may affect the analysis are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices where each one represents a different property to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply and include complex informational spectrum analysis to their work.

  18. EVOLUTION OF NUCLEAR RDNA ITS SEQUENCES IN THE CLADOPHORA ALBIDA/SERICEA CLADE (CHLOROPHYTA)

    NARCIS (Netherlands)

    BAKKER, FT; OLSEN, JL; STAM, WT

    Ribosomal DNA ITS sequences were compared among 13 different species and biogeographic isolates from the monophyletic ''abbida/sericea clade'' in the green algal genus Cladophora. Six distinct ITS sequence types were found, characterized by multiple insertions and deletions and high levels of

  19. Genome sequence of the Brown Norway rat yields insights into mammalian evolution

    DEFF Research Database (Denmark)

    Gibbs, Richard A; Weinstock, George M; Metzker, Michael L

    2004-01-01

    The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering ove...

  20. Molecular phylogeny, population genetics, and evolution of heterocystous cyanobacteria using nifH gene sequences

    Czech Academy of Sciences Publication Activity Database

    Singh, P.; Singh, S. S.; Elster, Josef; Mishra, A. K.

    2013-01-01

    Roč. 250, č. 3 (2013), s. 751-764 ISSN 0033-183X Institutional support: RVO:67985939 Keywords : evolution * heterocystous cyanobacteria * nifH gene Subject RIV: EH - Ecology, Behaviour Impact factor: 3.171, year: 2013

  1. New insights about enzyme evolution from large scale studies of sequence and structure relationships

    OpenAIRE

    Babbitt, Patricia; Brown, SD; Babbitt, PC

    2014-01-01

    © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.Understanding how enzymes have evolved offers clues about their structure-function relationships and mechanisms. Here, we describe evolution of functionally diverse enzyme superfami

  2. Phylogeny and evolution of Digitulati ground beetles (Coleoptera, Carabidae) inferred from mitochondrial ND5 gene sequences.

    Science.gov (United States)

    Su, Zhi-Hui; Imura, Yûki; Okamoto, Munehiro; Kim, Choong-Gon; Zhou, Hong-Zhang; Paik, Jong-Cheol; Osawa, Syozo

    2004-01-01

    Genealogical trees have been constructed using mitochondrial ND5 gene sequences of 87 specimens consisting of 32 species which have been believed to belong to the division Digitulati (one of the lineages of the subtribe Carabina) of the world. There have been recognized six lineages, which are well separated from each other. Each lineage contains the following genus: (1) the lineage A: Ohomopterus from Japan; (2) the lineage B: Isiocarabus from eastern Eurasian Continent; (3) the lineage C: Carabus from China which are further subdivided into three sublineages; (4) the lineage D: Carabus from USA; (5) the lineage E: Carabus from the Eurasian Continent, Japan and North America; and (6) the lineage F: Eucarabus from the Eurasian Continent. Additionally, the genus Acrocarabus which had been treated as a constituent of the division Archicarabomorphi has been recognized to be the 7th lineage of the division Digitulati from the ND5 genealogical analysis as well as morphology. These lineages are assumed to have radiated within a short period and are largely linked to their geographic distribution.

  3. THE RISE AND FALL OF PASSIVE DISK GALAXIES: MORPHOLOGICAL EVOLUTION ALONG THE RED SEQUENCE REVEALED BY COSMOS

    International Nuclear Information System (INIS)

    Bundy, Kevin; Hopkins, Philip; Ma, Chung-Pei; Scarlata, Claudia; Capak, Peter; Carollo, C. M.; Oesch, Pascal; Ellis, Richard S.; Salvato, Mara; Scoville, Nick; Drory, Niv; Leauthaud, Alexie; Koekemoer, Anton M.; Murray, Norman; Ilbert, Olivier; Pozzetti, Lucia

    2010-01-01

    The increasing abundance of passive 'red-sequence' galaxies since z ∼ 1-2 is mirrored by a coincident rise in the number of galaxies with spheroidal morphologies. In this paper, however, we show in detail, that, the correspondence between galaxy morphology and color is not perfect, providing insight into the physical origin of this evolution. Using the COSMOS survey, we study a significant population of red-sequence galaxies with disk-like morphologies. These passive disks typically have Sa-Sb morphological types with large bulges, but they are not confined to dense environments. They represent nearly one-half of all red-sequence galaxies and dominate at lower masses (∼ 10 M sun ) where they are increasingly disk-dominated. As a function of time, the abundance of passive disks with M * ∼ 11 M sun increases, but not as fast as red-sequence spheroidals in the same mass range. At higher mass, the passive disk population has declined since z ∼ 1, likely because they transform into spheroidals. Based on these trends, we estimate that as much as 60% of galaxies transitioning onto the red sequence evolve through a passive disk phase. The origin of passive disks therefore has broad implications for our understanding of how star formation shuts down. Because passive disks tend to be more bulge-dominated than their star-forming counterparts, a simple fading of blue disks does not fully explain their origin. We explore the strengths and weaknesses of several more sophisticated explanations, including environmental effects, internal stabilization, and disk regrowth during gas-rich mergers. While previous work has sought to explain color and morphological transformations with a single process, these observations open the way to new insight by highlighting the fact that galaxy evolution may actually proceed through several separate stages.

  4. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    Directory of Open Access Journals (Sweden)

    Chandrasekhar Natarajan

    2015-12-01

    Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages, and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization. In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  6. Cyclic Sequences, Events and Evolution of the Sino-Korean Plate,with a Discussion on the Evolution of Molar-tooth Carbonates,Phosphorites and Source Rocks

    Institute of Scientific and Technical Information of China (English)

    MENG Xianghua; GE Ming

    2003-01-01

    This paper gives an account of the research that the authors conducted on the cyclic sequences, events and evolutionary history from Proterozoic to Meso-Cenozoic in the Sino-Korean plate based on the principle of the Cosmos-Earth System. The authors divided this plate into 20 super-cyclic or super-mega-cyclic periods and more than 100 Oort periods. The research focused on important sea flooding events, uplift interruption events, tilting movement events, molar-tooth carbonate events, thermal events, polarity reversal events, karst events, volcanic explosion events and storm events, as well as types of resource areas and paleotectonic evolution. By means of the isochronous theory of the Cosmos-Earth System periodicity and based on long-excentricity and periodicity, the authors elaborately studied the paleogeographic evolution of the aulacogen of the Sino-Korean plate, the oolitic beach platform formation, the development of foreland basin and continental rift valley basin, and reconstructed the evolution of tectonic paleogeography and stratigraphic framework in the Sino-Korean plate in terms of evolutionary maps. Finally, the authors gave a profound discussion on the formation and development of molar-tooth carbonates, phosphorites and source rocks.

  7. Stratigraphic analysis and geodynamic evolution of the 'Transitional Sequence' in the southern Sergipe-Alagoas Basin; Analise estratigrafica e evolucao geodinamica da 'Sequencia Transicional' na porcao sul da Bacia de Sergipe-Alagoas

    Energy Technology Data Exchange (ETDEWEB)

    Cruz, Liliane Rabelo [Petrobras, Natal/Fortaleza, RN/CE (Brazil). Unidade de Operacoes de Exploracao e Producao. Avaliacao de Blocos e Interpretacao Geologica e Geofisica], E-mail: lilianerabelo@petrobras.com.br; Cordoba, Valeria Centurion; Sa, Emanuel Ferraz Jardim de [Universidade Federal do Rio Grande do Norte (UFRN), Natal, RN (Brazil). Centro de Ciencias Exatas e da Terra. Programa de Pos-Graduacao em Geodinamica e Geofisica], Emails: valeria.geo@ufrnet.br, emanuel@ccet.ufrn.br

    2009-11-15

    This paper presents a stratigraphic analysis and considerations about the tectonic activity and subsidence processes that controlled the deposition of the 'Transitional Sequence' in the Sergipe Sub-basin (SBSE), southern Sergipe-Alagoas Basin, during the Neoaptian (upper Alagoas), the transition between the rift and drift stages in this basin. The stratigraphic analysis of selected wells led to the recognition of five third order depositional sequences, that were deposited in decreasing conditions of water circulation, from continental siliciclastic systems to lagoonar-evaporitic systems and, finally, to restricted marine systems, indicating an increase of the base level. In terms of a second order cycle, the 'Transitional Sequence' represents the deposition in a stage of base-level rise and the beginning deposition of a transgressive system tract, whose transition to the overlying marine transgressive sequence is marked by flooding. In terms of a third order cycle, this transition is marked by a local unconformity, which laterally changes to a correlative conformity. Such unconformity is interpreted as a breakup unconformity, of pre-Albian age in the SBSE, contrasting with the more expressive (especially for its angular character) of the Pre-Upper Alagoas Unconformity (DPAL), at the base of the 'Transitional Sequence', frequently regarded in the literature as the breakup unconformity. Even though, the Pre-Albian Unconformity is here favoured as the major change in depositional context and tectonic environment (rift to drift) in the SBSE. The deposition of the 'Transitional Sequence' occurred close to the end of the Rifting Stage, with an epicontinental sea advancing along a segment of continental crust still in extension, controlled by a combination of thermal and mechanical subsidence, the latter already in pronounced decline. (author)

  8. Multilocus sequence analysis of nectar pseudomonads reveals high genetic diversity and contrasting recombination patterns.

    Science.gov (United States)

    Alvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas 'sensu stricto' isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria.

  9. Multilocus Sequence Analysis of Nectar Pseudomonads Reveals High Genetic Diversity and Contrasting Recombination Patterns

    Science.gov (United States)

    Álvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M.

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas ‘sensu stricto’ isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria. PMID:24116076

  10. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low enviromental impact of the machine. 13 refs., 1 fig., 3 tabs

  11. Concerted evolution of sea anemone neurotoxin genes is revealed through analysis of the Nematostella vectensis genome.

    Science.gov (United States)

    Moran, Yehu; Weinberger, Hagar; Sullivan, James C; Reitzel, Adam M; Finnerty, John R; Gurevitz, Michael

    2008-04-01

    Gene families, which encode toxins, are found in many poisonous animals, yet there is limited understanding of their evolution at the nucleotide level. The release of the genome draft sequence for the sea anemone Nematostella vectensis enabled a comprehensive study of a gene family whose neurotoxin products affect voltage-gated sodium channels. All gene family members are clustered in a highly repetitive approximately 30-kb genomic region and encode a single toxin, Nv1. These genes exhibit extreme conservation at the nucleotide level which cannot be explained by purifying selection. This conservation greatly differs from the toxin gene families of other animals (e.g., snakes, scorpions, and cone snails), whose evolution was driven by diversifying selection, thereby generating a high degree of genetic diversity. The low nucleotide diversity at the Nv1 genes is reminiscent of that reported for DNA encoding ribosomal RNA (rDNA) and 2 hsp70 genes from Drosophila, which have evolved via concerted evolution. This evolutionary pattern was experimentally demonstrated in yeast rDNA and was shown to involve unequal crossing-over. Through sequence analysis of toxin genes from multiple N. vectensis populations and 2 other anemone species, Anemonia viridis and Actinia equina, we observed that the toxin genes for each sea anemone species are more similar to one another than to those of other species, suggesting they evolved by manner of concerted evolution. Furthermore, in 2 of the species (A. viridis and A. equina) we found genes that evolved under diversifying selection, suggesting that concerted evolution and accelerated evolution may occur simultaneously.

  12. Illuminating the evolution of equids and rodents with next-generation sequencing of ancient specimens

    DEFF Research Database (Denmark)

    Mouatt, Julia Thidamarth Vilstrup

    enrichment methods and the massive throughput and latest advances within DNA sequencing, the field of ancient DNA has flourished in later years. Those advances have even enabled the sequencing of complete genomes from the past, moving the field into genomic sciences. In this thesis we have used these latest......The sequencing of ancient DNA provides perspectives on the genetic history of past populations and extinct species. However, ancient DNA research presents specific limitations mostly due to DNA survival, damage and contamination. Yet with stringent laboratory procedures, the sensitivity of target...... developments within ancient DNA research, including target enrichment capture and Next-Generation Sequencing, to address a range of evolutionary questions related to two major mammalian groups, equids and rodents. In particular we have resolved phylogenetic relationships within equids using complete mitochond...

  13. Multilocus sequence typing and rtxA toxin gene sequencing analysis of Kingella kingae isolates demonstrates genetic diversity and international clones.

    Directory of Open Access Journals (Sweden)

    Romain Basmaci

    Full Text Available BACKGROUND: Kingella kingae, a normal component of the upper respiratory flora, is being increasingly recognized as an important invasive pathogen in young children. Genetic diversity of this species has not been studied. METHODS: We analyzed 103 strains from different countries and clinical origins by a new multilocus sequence-typing (MLST schema. Putative virulence gene rtxA, encoding an RTX toxin, was also sequenced, and experimental virulence of representative strains was assessed in a juvenile-rat model. RESULTS: Thirty-six sequence-types (ST and nine ST-complexes (STc were detected. The main STc 6, 14 and 23 comprised 23, 17 and 20 strains respectively, and were internationally distributed. rtxA sequencing results were mostly congruent with MLST, and showed horizontal transfer events. Of interest, all members of the distantly related ST-6 (n = 22 and ST-5 (n = 4 harboured a 33 bp duplication or triplication in their rtxA sequence, suggesting that this genetic trait arose through selective advantage. The animal model revealed significant differences in virulence among strains of the species. CONCLUSION: MLST analysis reveals international spread of ST-complexes and will help to decipher acquisition and evolution of virulence traits and diversity of pathogenicity among K. kingae strains, for which an experimental animal model is now available.

  14. Sequence comparisons of odorant receptors among tortricid moths reveal different rates of molecular evolution among family members.

    Directory of Open Access Journals (Sweden)

    Colm Carraher

    Full Text Available In insects, odorant receptors detect volatile cues involved in behaviours such as mate recognition, food location and oviposition. We have investigated the evolution of three odorant receptors from five species within the moth genera Ctenopseustis and Planotrotrix, family Tortricidae, which fall into distinct clades within the odorant receptor multigene family. One receptor is the orthologue of the co-receptor Or83b, now known as Orco (OR2, and encodes the obligate ion channel subunit of the receptor complex. In comparison, the other two receptors, OR1 and OR3, are ligand-binding receptor subunits, activated by volatile compounds produced by plants--methyl salicylate and citral, respectively. Rates of sequence evolution at non-synonymous sites were significantly higher in OR1 compared with OR2 and OR3. Within the dataset OR1 contains 109 variable amino acid positions that are distributed evenly across the entire protein including transmembrane helices, loop regions and termini, while OR2 and OR3 contain 18 and 16 variable sites, respectively. OR2 shows a high level of amino acid conservation as expected due to its essential role in odour detection; however we found unexpected differences in the rate of evolution between two ligand-binding odorant receptors, OR1 and OR3. OR3 shows high sequence conservation suggestive of a conserved role in odour reception, whereas the higher rate of evolution observed in OR1, particularly at non-synonymous sites, may be suggestive of relaxed constraint, perhaps associated with the loss of an ancestral role in sex pheromone reception.

  15. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  16. Phylogenomic analysis of secondary metabolism genes sheds light on their evolution in Aspergilli

    DEFF Research Database (Denmark)

    Theobald, Sebastian; Vesth, Tammi Camilla; Rasmussen, Jane Lind Nybo

    .Natural products are encoded by genes located in close proximity, called secondary metabolic gene clusters, which makes them interesting targets for genomic analysis. We use a modified version of the Secondary Metabolite Unique Regions Finder (SMURF) algorithm, combined with InterPro annotations to create...... approximate maximum likelihood trees of conserved domains from secondary metabolic genes across 56 species, giving insights into the secondary metabolism gene diversity and evolution.In this study we can describe the evolution of non ribosomal peptide synthetases (NRPS), polyketide synthases (PKS) and hybrids.......In the aspMine project, we are sequencing and analyzing over 300 species of Aspergilli, agroup of filamentous fungi rich in natural compounds. The vast amount of data obtained from these species challenges the way we were mining for products and requires new pipelines for secondary metabolite analysis...

  17. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Science.gov (United States)

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.

  18. Greater than the sum of its parts: single-nucleus sequencing identifies convergent evolution of independent EGFR mutants in GBM.

    Science.gov (United States)

    Gini, Beatrice; Mischel, Paul S

    2014-08-01

    Single-cell sequencing approaches are needed to characterize the genomic diversity of complex tumors, shedding light on their evolutionary paths and potentially suggesting more effective therapies. In this issue of Cancer Discovery, Francis and colleagues develop a novel integrative approach to identify distinct tumor subpopulations based on joint detection of clonal and subclonal events from bulk tumor and single-nucleus whole-genome sequencing, allowing them to infer a subclonal architecture. Surprisingly, the authors identify convergent evolution of multiple, mutually exclusive, independent EGFR gain-of-function variants in a single tumor. This study demonstrates the value of integrative single-cell genomics and highlights the biologic primacy of EGFR as an actionable target in glioblastoma. ©2014 American Association for Cancer Research.

  19. Optimization of dynamic economic dispatch with valve-point effect using chaotic sequence based differential evolution algorithms

    International Nuclear Information System (INIS)

    He Dakuo; Dong Gang; Wang Fuli; Mao Zhizhong

    2011-01-01

    A chaotic sequence based differential evolution (DE) approach for solving the dynamic economic dispatch problem (DEDP) with valve-point effect is presented in this paper. The proposed method combines the DE algorithm with the local search technique to improve the performance of the algorithm. DE is the main optimizer, while an approximated model for local search is applied to fine tune in the solution of the DE run. To accelerate convergence of DE, a series of constraints handling rules are adopted. An initial population obtained by using chaotic sequence exerts optimal performance of the proposed algorithm. The combined algorithm is validated for two test systems consisting of 10 and 13 thermal units whose incremental fuel cost function takes into account the valve-point loading effects. The proposed combined method outperforms other algorithms reported in literatures for DEDP considering valve-point effects.

  20. Genome sequence and comparative analysis of a putative entomopathogenic Serratia isolated from Caenorhabditis briggsae.

    Science.gov (United States)

    Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley

    2015-07-18

    Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabdus with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations were results of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 MB chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 KB plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in the virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are

  1. Novel arenavirus sequences in Hylomyscus sp. and Mus (Nannomys setulosus from Côte d'Ivoire: implications for evolution of arenaviruses in Africa.

    Directory of Open Access Journals (Sweden)

    David Coulibaly-N'Golo

    Full Text Available This study aimed to identify new arenaviruses and gather insights in the evolution of arenaviruses in Africa. During 2003 through 2005, 1,228 small mammals representing 14 different genera were trapped in 9 villages in south, east, and middle west of Côte d'Ivoire. Specimens were screened by pan-Old World arenavirus RT-PCRs targeting S and L RNA segments as well as immunofluorescence assay. Sequences of two novel tentative species of the family Arenaviridae, Menekre and Gbagroube virus, were detected in Hylomyscus sp. and Mus (Nannomys setulosus, respectively. Arenavirus infection of Mus (Nannomys setulosus was also demonstrated by serological testing. Lassa virus was not found, although 60% of the captured animals were Mastomys natalensis. Complete S RNA and partial L RNA sequences of the novel viruses were recovered from the rodent specimens and subjected to phylogenetic analysis. Gbagroube virus is a closely related sister taxon of Lassa virus, while Menekre virus clusters with the Ippy/Mobala/Mopeia virus complex. Reconstruction of possible virus-host co-phylogeny scenarios suggests that, within the African continent, signatures of co-evolution might have been obliterated by multiple host-switching events.

  2. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  3. Phylogenetic analysis of Gossypium L. using restriction fragment length polymorphism of repeated sequences.

    Science.gov (United States)

    Zhang, Meiping; Rong, Ying; Lee, Mi-Kyung; Zhang, Yang; Stelly, David M; Zhang, Hong-Bin

    2015-10-01

    Cotton is the world's leading textile fiber crop and is also grown as a bioenergy and food crop. Knowledge of the phylogeny of closely related species and the genome origin and evolution of polyploid species is significant for advanced genomics research and breeding. We have reconstructed the phylogeny of the cotton genus, Gossypium L., and deciphered the genome origin and evolution of its five polyploid species by restriction fragment analysis of repeated sequences. Nuclear DNA of 84 accessions representing 35 species and all eight genomes of the genus were analyzed. The phylogenetic tree of the genus was reconstructed using the parsimony method on 1033 polymorphic repeated sequence restriction fragments. The genome origin of its polyploids was determined by calculating the diploid-polyploid restriction fragment correspondence (RFC). The tree is consistent with the morphological classification, genome designation and geographic distribution of the species at subgenus, section and subsection levels. Gossypium lobatum (D7) was unambiguously shown to have the highest RFC with the D-subgenomes of all five polyploids of the genus, while the common ancestor of Gossypium herbaceum (A1) and Gossypium arboreum (A2) likely contributed to the A-subgenomes of the polyploids. These results provide a comprehensive phylogenetic tree of the cotton genus and new insights into the genome origin and evolution of its polyploid species. The results also further demonstrate a simple, rapid and inexpensive method suitable for phylogenetic analysis of closely related species, especially congeneric species, and the inference of genome origin of polyploids that constitute over 70 % of flowering plants.

  4. Whole-exome sequencing reveals the spectrum of gene mutations and the clonal evolution patterns in paediatric acute myeloid leukaemia.

    Science.gov (United States)

    Shiba, Norio; Yoshida, Kenichi; Shiraishi, Yuichi; Okuno, Yusuke; Yamato, Genki; Hara, Yusuke; Nagata, Yasunobu; Chiba, Kenichi; Tanaka, Hiroko; Terui, Kiminori; Kato, Motohiro; Park, Myoung-Ja; Ohki, Kentaro; Shimada, Akira; Takita, Junko; Tomizawa, Daisuke; Kudo, Kazuko; Arakawa, Hirokazu; Adachi, Souichi; Taga, Takashi; Tawa, Akio; Ito, Etsuro; Horibe, Keizo; Sanada, Masashi; Miyano, Satoru; Ogawa, Seishi; Hayashi, Yasuhide

    2016-11-01

    Acute myeloid leukaemia (AML) is a molecularly and clinically heterogeneous disease. Targeted sequencing efforts have identified several mutations with diagnostic and prognostic values in KIT, NPM1, CEBPA and FLT3 in both adult and paediatric AML. In addition, massively parallel sequencing enabled the discovery of recurrent mutations (i.e. IDH1/2 and DNMT3A) in adult AML. In this study, whole-exome sequencing (WES) of 22 paediatric AML patients revealed mutations in components of the cohesin complex (RAD21 and SMC3), BCORL1 and ASXL2 in addition to previously known gene mutations. We also revealed intratumoural heterogeneities in many patients, implicating multiple clonal evolution events in the development of AML. Furthermore, targeted deep sequencing in 182 paediatric AML patients identified three major categories of recurrently mutated genes: cohesion complex genes [STAG2, RAD21 and SMC3 in 17 patients (8·3%)], epigenetic regulators [ASXL1/ASXL2 in 17 patients (8·3%), BCOR/BCORL1 in 7 patients (3·4%)] and signalling molecules. We also performed WES in four patients with relapsed AML. Relapsed AML evolved from one of the subclones at the initial phase and was accompanied by many additional mutations, including common driver mutations that were absent or existed only with lower allele frequency in the diagnostic samples, indicating a multistep process causing leukaemia recurrence. © 2016 John Wiley & Sons Ltd.

  5. The Swiss-Army-Knife Approach to the Nearly Automatic Analysis for Microearthquake Sequences.

    Science.gov (United States)

    Kraft, T.; Simon, V.; Tormann, T.; Diehl, T.; Herrmann, M.

    2017-12-01

    Many Swiss earthquake sequence have been studied using relative location techniques, which often allowed to constrain the active fault planes and shed light on the tectonic processes that drove the seismicity. Yet, in the majority of cases the number of located earthquakes was too small to infer the details of the space-time evolution of the sequences, or their statistical properties. Therefore, it has mostly been impossible to resolve clear patterns in the seismicity of individual sequences, which are needed to improve our understanding of the mechanisms behind them. Here we present a nearly automatic workflow that combines well-established seismological analysis techniques and allows to significantly improve the completeness of detected and located earthquakes of a sequence. We start from the manually timed routine catalog of the Swiss Seismological Service (SED), which contains the larger events of a sequence. From these well-analyzed earthquakes we dynamically assemble a template set and perform a matched filter analysis on the station with: the best SNR for the sequence; and a recording history of at least 10-15 years, our typical analysis period. This usually allows us to detect events several orders of magnitude below the SED catalog detection threshold. The waveform similarity of the events is then further exploited to derive accurate and consistent magnitudes. The enhanced catalog is then analyzed statistically to derive high-resolution time-lines of the a- and b-value and consequently the occurrence probability of larger events. Many of the detected events are strong enough to be located using double-differences. No further manual interaction is needed; we simply time-shift the arrival-time pattern of the detecting template to the associated detection. Waveform similarity assures a good approximation of the expected arrival-times, which we use to calculate event-pair arrival-time differences by cross correlation. After a SNR and cycle-skipping quality

  6. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    Energy Technology Data Exchange (ETDEWEB)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C/sub eta1/) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of ..cap alpha../sub 1/-antitrypsin and ..beta..- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 x 10/sup -9/ substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.

  7. Evolution of the sedimentation technique for particle size distribution analysis

    International Nuclear Information System (INIS)

    Maley, R.

    1998-01-01

    After an introduction on the significance of particle size measurements, sedimentation methods are described, with emphasis on the evolution of the gravitational approach. The gravitational technique based on mass determination by X-ray adsorption allows fast analysis by automation and easy data handling, in addition to providing the accuracy required by quality control and research applications [it

  8. Evolution and expression analysis of the soybean glutamate ...

    Indian Academy of Sciences (India)

    Evolution and expression analysis of the soybean glutamate decarboxylase gene family. TAE KYUNG HYUN, SEUNG HEE EOM, XIAO HAN and JU-SUNG KIM http://www.ias.ac.in/jbiosci. J. Biosci. 39(5), December 2014, 899–907, © Indian Academy of Sciences. Supplementary material. Supplementary figure 1.

  9. Xe-135 and Sm-149 Isotopic Evolution Analysis Xesamo code

    International Nuclear Information System (INIS)

    Caro, R.; Gallego, J.; Martinez Fanegas, R.

    1977-01-01

    In this report the time evolution analysis of the nuclides concentration Xe-135 and Sm-149 as a function of the neutron flux is carried out. The neutron flux may be any function of time. It is analyzed as well the reactivity changes associated with the xenon and samarium concentration variations. (Author) 5 refs

  10. Assembly and comparative analysis of complete mitochondrial genome sequence of an economic plant Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Ning Ye

    2017-03-01

    Full Text Available Willow is a widely used dioecious woody plant of Salicaceae family in China. Due to their high biomass yields, willows are promising sources for bioenergy crops. In this study, we assembled the complete mitochondrial (mt genome sequence of S. suchowensis with the length of 644,437 bp using Roche-454 GS FLX Titanium sequencing technologies. Base composition of the S. suchowensis mt genome is A (27.43%, T (27.59%, C (22.34%, and G (22.64%, which shows a prevalent GC content with that of other angiosperms. This long circular mt genome encodes 58 unique genes (32 protein-coding genes, 23 tRNA genes and 3 rRNA genes, and 9 of the 32 protein-coding genes contain 17 introns. Through the phylogenetic analysis of 35 species based on 23 protein-coding genes, it is supported that Salix as a sister to Populus. With the detailed phylogenetic information and the identification of phylogenetic position, some ribosomal protein genes and succinate dehydrogenase genes are found usually lost during evolution. As a native shrub willow species, this worthwhile research of S. suchowensis mt genome will provide more desirable information for better understanding the genomic breeding and missing pieces of sex determination evolution in the future.

  11. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    STORAGESEVER

    2010-07-19

    Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, postion-spiecific gap penalties and weight.

  12. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  13. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to data. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appear slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) or regeneration. Images PMID:1714599

  14. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society

  15. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  16. The Cenozoic geological evolution of the Central and Northern North Sea based on seismic sequence stratigraphy

    Energy Technology Data Exchange (ETDEWEB)

    Jordt, Henrik

    1996-03-01

    This thesis represents scientific results from seismic sequence stratigraphic investigations. These investigations and results are integrated into an ongoing mineralogical study of the Cenozoic deposits. the main results from this mineralogical study are presented and discussed. The seismic investigations have provided boundary conditions for a forward modelling study of the Cenozoic depositional history. Results from the forward modelling are presented as they emphasise the influence of tectonics on sequence development. The tectonic motions described were important for the formation of the large oil and gas fields in the North Sea.

  17. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  18. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Science.gov (United States)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates the protein half-life with the identity of its N-terminal residues. A destabilizing N-terminal residues is created by enzymatic reaction or chemical modifications. This destabilized substrate will be recognized by PROTEOLYSIS 6 (PRT6) protein, which encodes an E3 ligase enzyme and resulted in substrate degradation by proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but not yet been studied in fleshy fruit plants. Hence, this study was carried out in tomato that is known as the model for fleshy fruit plants. BLASTX analysis identified that Solyc09g010830 which encodes for a PRT6 gene in tomato based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that PRT6 gene was highly expressed in tomato fruits breaker +5. Co-expression analysis shows that PRT6 may not only involved in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize PRT6 gene in tomato.

  19. Determining physical constraints in transcriptional initiationcomplexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen,Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control ofcooperatively acting transcription factors whose binding is limited bystructural constraints. By determining these structural constraints, wecan understand the "rules" that define functional cooperativity.Conversely, by understanding the rules of binding, we can inferstructural characteristics. We have developed an information theory basedmethod for approximating the physical limitations of cooperativeinteractions by comparing sequence analysis to microarray expressiondata. When applied to the coordinated binding of the sulfur amino acidregulatory protein Met4 by Cbf1 and Met31, we were able to create acombinatorial model that can correctly identify Met4 regulatedgenes.

  20. Comparative analysis of protein coding sequences from human, mouse and the domesticated pig

    DEFF Research Database (Denmark)

    Jørgensen, Frank Grønlund; Hobolth, Asger; Hornshøj, Henrik

    2005-01-01

    Background: The availability of abundant sequence data from key model organisms has made large scale studies of mulecular evolution an exciting possibility. Here we use full length cDNA alignments comprising more than 700,000 nucleotides from human, mouse, pig and the Japanese pufferfish Fugu rub...... rubrices in order to investigate 1) the relationships between three major lineages of mammals: rodents, artiodactys and primates, and 2) the rate of evolution and the occurrence of positive Darwinian selection using codon based models of sequence evolution. Results: We provide evidence...

  1. Ultra-deep sequencing reveals the subclonal structure and genomic evolution of oral squamous cell carcinoma

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin Jakob

    Background: Oral squamous cell carcinoma (OSCC), a subgroup of head and neck squamous cell carcinoma (HNSCC), is primarily caused by alcohol consumption and tobacco use. Recent DNA sequencing studies suggests that HNSCC are very heterogeneous between patients; however the intra-patient subclonal...

  2. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    Science.gov (United States)

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  3. Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution

    OpenAIRE

    Kuangyu Wang; Shuhui Yu; Xiang Ji; Clemens Lakner; Alexander Griffing; Jeffrey L. Thorne

    2015-01-01

    Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates joint effects of relative solvent accessibility (RSA), a structural constraint; and gene expression, a functional constraint. First, we explore the relationship between RSA and codon usage at the genomic scale as well as at the individual gene scale. Motivated by these results, we construc...

  4. Cambro-ordovician sea-level fluctuations and sequence boundaries: The missing record and the evolution of new taxa

    Science.gov (United States)

    Lehnert, O.; Miller, J.F.; Leslie, Stephen A.; Repetski, J.E.; Ethington, Raymond L.

    2005-01-01

    The evolution of early Palaeozoic conodont faunas shows a clear connection to sea-level changes. One way that this connection manifests itself is that thick successions of carbonates are missing beneath major sequence boundaries due to karstification and erosion. From this observation arises the question of how many taxa have been lost from different conodont lineages in these incomplete successions. Although many taxa suffered extinction due to the environmental stresses associated with falling sea-levels, some must have survived in these extreme conditions. The number of taxa missing in the early Palaeozoic tropics always will be unclear, but it will be even more difficult to evaluate the missing record in detrital successions of higher latitudes. A common pattern in the evolution of Cambrian-Ordovician conodont lineages is appearances of new species at sea-level rises and disappearances at sea-level drops. This simple picture can be complicated by intervals that consistently have no representatives of a particular lineage, even after extensive sampling of the most complete sections. Presumably the lineages survived in undocumented refugia. In this paper, we give examples of evolution in Cambrian-Ordovician shallowmarine conodont faunas and highlight problems of undiscovered or truly missing segments of lineages. ?? The Palaeontological Association.

  5. Comparative Analysis of Gene Expression for Convergent Evolution of Camera Eye Between Octopus and Human

    Science.gov (United States)

    Ogura, Atsushi; Ikeo, Kazuho; Gojobori, Takashi

    2004-01-01

    Although the camera eye of the octopus is very similar to that of humans, phylogenetic and embryological analyses have suggested that their camera eyes have been acquired independently. It has been known as a typical example of convergent evolution. To study the molecular basis of convergent evolution of camera eyes, we conducted a comparative analysis of gene expression in octopus and human camera eyes. We sequenced 16,432 ESTs of the octopus eye, leading to 1052 nonredundant genes that have matches in the protein database. Comparing these 1052 genes with 13,303 already-known ESTs of the human eye, 729 (69.3%) genes were commonly expressed between the human and octopus eyes. On the contrary, when we compared octopus eye ESTs with human connective tissue ESTs, the expression similarity was quite low. To trace the evolutionary changes that are potentially responsible for camera eye formation, we also compared octopus-eye ESTs with the completed genome sequences of other organisms. We found that 1019 out of the 1052 genes had already existed at the common ancestor of bilateria, and 875 genes were conserved between humans and octopuses. It suggests that a larger number of conserved genes and their similar gene expression may be responsible for the convergent evolution of the camera eye. PMID:15289475

  6. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  7. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Full Text Available Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  8. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Full Text Available Next-generation sequencing (NGS studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC, a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assign reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

  9. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  10. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Directory of Open Access Journals (Sweden)

    Jin Hee-Jeong

    2006-01-01

    Full Text Available Microarray study enables us to obtain hundreds of thousands of expressions of genes or genotypes at once, and it is an indispensable technology for genome research. The first step is the analysis of scanned microarray images. This is the most important procedure for obtaining biologically reliable data. Currently most microarray image processing systems require burdensome manual block/spot indexing work. Since the amount of experimental data is increasing very quickly, automated microarray image analysis software becomes important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, to allow reliable and efficient batch processing for a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than the commercial tools.

  11. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species.

    Science.gov (United States)

    Kim, Seungill; Park, Minkyu; Yeom, Seon-In; Kim, Yong-Min; Lee, Je Min; Lee, Hyun-Ah; Seo, Eunyoung; Choi, Jaeyoung; Cheong, Kyeongchae; Kim, Ki-Tae; Jung, Kyongyong; Lee, Gir-Won; Oh, Sang-Keun; Bae, Chungyun; Kim, Saet-Byul; Lee, Hye-Young; Kim, Shin-Young; Kim, Myung-Shin; Kang, Byoung-Cheorl; Jo, Yeong Deuk; Yang, Hee-Bum; Jeong, Hee-Jin; Kang, Won-Hee; Kwon, Jin-Kyung; Shin, Chanseok; Lim, Jae Yun; Park, June Hyun; Huh, Jin Hoe; Kim, June-Sik; Kim, Byung-Dong; Cohen, Oded; Paran, Ilan; Suh, Mi Chung; Lee, Saet Buyl; Kim, Yeon-Ki; Shin, Younhee; Noh, Seung-Jae; Park, Junhyung; Seo, Young Sam; Kwon, Suk-Yoon; Kim, Hyun A; Park, Jeong Mee; Kim, Hyun-Jin; Choi, Sang-Bong; Bosland, Paul W; Reeves, Gregory; Jo, Sung-Hwan; Lee, Bong-Woo; Cho, Hyung-Taeg; Choi, Hee-Seung; Lee, Min-Soo; Yu, Yeisoo; Do Choi, Yang; Park, Beom-Seok; van Deynze, Allen; Ashrafi, Hamid; Hill, Theresa; Kim, Woo Taek; Pai, Hyun-Sook; Ahn, Hee Kyung; Yeam, Inhwa; Giovannoni, James J; Rose, Jocelyn K C; Sørensen, Iben; Lee, Sang-Jik; Kim, Ryan W; Choi, Ik-Young; Choi, Beom-Soon; Lim, Jong-Sung; Lee, Yong-Hwan; Choi, Doil

    2014-03-01

    Hot pepper (Capsicum annuum), one of the oldest domesticated crops in the Americas, is the most widely grown spice crop in the world. We report whole-genome sequencing and assembly of the hot pepper (Mexican landrace of Capsicum annuum cv. CM334) at 186.6× coverage. We also report resequencing of two cultivated peppers and de novo sequencing of the wild species Capsicum chinense. The genome size of the hot pepper was approximately fourfold larger than that of its close relative tomato, and the genome showed an accumulation of Gypsy and Caulimoviridae family elements. Integrative genomic and transcriptomic analyses suggested that change in gene expression and neofunctionalization of capsaicin synthase have shaped capsaicinoid biosynthesis. We found differential molecular patterns of ripening regulators and ethylene synthesis in hot pepper and tomato. The reference genome will serve as a platform for improving the nutritional and medicinal values of Capsicum species.

  12. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution

    Science.gov (United States)

    Smith, Jeramiah J; Kuraku, Shigehiro; Holt, Carson; Sauka-Spengler, Tatjana; Jiang, Ning; Campbell, Michael S; Yandell, Mark D; Manousaki, Tereza; Meyer, Axel; Bloom, Ona E; Morgan, Jennifer R; Buxbaum, Joseph D; Sachidanandam, Ravi; Sims, Carrie; Garruss, Alexander S; Cook, Malcolm; Krumlauf, Robb; Wiedemann, Leanne M; Sower, Stacia A; Decatur, Wayne A; Hall, Jeffrey A; Amemiya, Chris T; Saha, Nil R; Buckley, Katherine M; Rast, Jonathan P; Das, Sabyasachi; Hirano, Masayuki; McCurley, Nathanael; Guo, Peng; Rohner, Nicolas; Tabin, Clifford J; Piccinelli, Paul; Elgar, Greg; Ruffier, Magali; Aken, Bronwen L; Searle, Stephen MJ; Muffato, Matthieu; Pignatelli, Miguel; Herrero, Javier; Jones, Matthew; Brown, C Titus; Chung-Davidson, Yu-Wen; Nanlohy, Kaben G; Libants, Scot V; Yeh, Chu-Yin; McCauley, David W; Langeland, James A; Pancer, Zeev; Fritzsch, Bernd; de Jong, Pieter J; Zhu, Baoli; Fulton, Lucinda L; Theising, Brenda; Flicek, Paul; Bronner, Marianne E; Warren, Wesley C; Clifton, Sandra W; Wilson, Richard K; Li, Weiming

    2013-01-01

    Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms. PMID:23435085

  13. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae).

    Science.gov (United States)

    Alverson, Andrew J; Wei, XiaoXin; Rice, Danny W; Stern, David B; Barry, Kerrie; Palmer, Jeffrey D

    2010-06-01

    The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)--the two smallest characterized cucurbit mitochondrial genomes--and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors.

  14. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    Science.gov (United States)

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  15. Biodiversity Meets Neuroscience: From the Sequencing Ship (Ship-Seq) to Deciphering Parallel Evolution of Neural Systems in Omic's Era.

    Science.gov (United States)

    Moroz, Leonid L

    2015-12-01

    The origins of neural systems and centralized brains are one of the major transitions in evolution. These events might occur more than once over 570-600 million years. The convergent evolution of neural circuits is evident from a diversity of unique adaptive strategies implemented by ctenophores, cnidarians, acoels, molluscs, and basal deuterostomes. But, further integration of biodiversity research and neuroscience is required to decipher critical events leading to development of complex integrative and cognitive functions. Here, we outline reference species and interdisciplinary approaches in reconstructing the evolution of nervous systems. In the "omic" era, it is now possible to establish fully functional genomics laboratories aboard of oceanic ships and perform sequencing and real-time analyses of data at any oceanic location (named here as Ship-Seq). In doing so, fragile, rare, cryptic, and planktonic organisms, or even entire marine ecosystems, are becoming accessible directly to experimental and physiological analyses by modern analytical tools. Thus, we are now in a position to take full advantages from countless "experiments" Nature performed for us in the course of 3.5 billion years of biological evolution. Together with progress in computational and comparative genomics, evolutionary neuroscience, proteomic and developmental biology, a new surprising picture is emerging that reveals many ways of how nervous systems evolved. As a result, this symposium provides a unique opportunity to revisit old questions about the origins of biological complexity. © The Author 2015. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  16. Dynamic Evolution of Pathogenicity Revealed by Sequencing and Comparative Genomics of 19 Pseudomonas syringae Isolates

    Science.gov (United States)

    Romanchuk, Artur; Chang, Jeff H.; Mukhtar, M. Shahid; Cherkis, Karen; Roach, Jeff; Grant, Sarah R.; Jones, Corbin D.; Dangl, Jeffery L.

    2011-01-01

    Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram- negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we demonstrate that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determination of host range. We used these 14 draft genome sequences, plus five additional genome sequences previously reported, to identify the core genome for P. syringae and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of unknown proteins, which dramatically increases both the genomic content of these strains and the pan-genome of the species. PMID:21799664

  17. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    Full Text Available The adaptation tipping point and adaptation pathway approach developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. Both approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criterions. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.

  18. Evolution analysis of the states of the EZ model

    International Nuclear Information System (INIS)

    Qing-Hua, Chen; Yi-Ming, Ding; Hong-Guang, Dong

    2009-01-01

    Based on suitable choice of states, this paper studies the stability of the equilibrium state of the EZ model by regarding the evolution of the EZ model as a Markov chain and by showing that the Markov chain is ergodic. The Markov analysis is applied to the EZ model with small number of agents, the exact equilibrium state for N = 5 and numerical results for N = 18 are obtained. (cross-disciplinary physics and related areas of science and technology)

  19. Insights from the Genome Sequence of Mycobacterium lepraemurium: Massive Gene Decay and Reductive Evolution

    Directory of Open Access Journals (Sweden)

    Andrej Benjak

    2017-10-01

    Full Text Available Mycobacterium lepraemurium is the causative agent of murine leprosy, a chronic, granulomatous disease similar to human leprosy. Due to the similar clinical manifestations of human and murine leprosy and the difficulty of growing both bacilli axenically, Mycobacterium leprae and M. lepraemurium were once thought to be closely related, although it was later suggested that M. lepraemurium might be related to Mycobacterium avium. In this study, the complete genome of M. lepraemurium was sequenced using a combination of PacBio and Illumina sequencing. Phylogenomic analyses confirmed that M. lepraemurium is a distinct species within the M. avium complex (MAC. The M. lepraemurium genome is 4.05 Mb in length, which is considerably smaller than other MAC genomes, and it comprises 2,682 functional genes and 1,139 pseudogenes, which indicates that M. lepraemurium has undergone genome reduction. An error-prone repair homologue of the DNA polymerase III α-subunit was found to be nonfunctional in M. lepraemurium, which might contribute to pseudogene formation due to the accumulation of mutations in nonessential genes. M. lepraemurium has retained the functionality of several genes thought to influence virulence among members of the MAC.

  20. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  1. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences have hidden repetitions in fact. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.

  2. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitates a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  3. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    SERVER

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sor- ghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  4. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  5. Analysis of the Science and Technology Preservice Teachers' Opinions on Teaching Evolution and Theory of Evolution

    Science.gov (United States)

    Töman, Ufuk; Karatas, Faik Özgür; Çimer, Sabiha Odabasi

    2014-01-01

    In this study, we investigate of science and technology teachers' opinions about the theory of evolution and the evolution teaching. The aim of this study, we investigate of science and technology teachers' opinions about the theory of evolution and the evolution teaching. This study is a descriptive study. Open-ended questions were used to…

  6. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  7. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs

  8. A Quantitative Accident Sequence Analysis for a VHTR

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)

    2016-05-15

    In Korea, the basic design features of VHTR are currently discussed in the various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method to assess risks of a large and complex engineered system, such as a nuclear power plant. It will be introduced at an early stage in the design, and will be upgraded at various design and licensing stages as the design matures and the design details are defined. Risk insights to be developed from the PRA are viewed as essential to developing a design that is optimized in meeting safety objectives and in interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events which may occur in VHTRs were selected through MLD method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating events frequency and safety systems failure rate were calculated by using reliability data obtained from the available sources and fault tree analysis. After quantification, uncertainty analysis was conducted. The SR and LR frequency are calculated respectively 7.52E- 10/RY and 7.91E-16/RY, which are relatively less than the core damage frequency of LWRs.

  9. Evolution of early life inferred from protein and ribonucleic acid sequences

    Science.gov (United States)

    Dayhoff, M. O.; Schwartz, R. M.

    1978-01-01

    The chemical structures of ferredoxin, 5S ribosomal RNA, and c-type cytochrome sequences have been employed to construct a phylogenetic tree which connects all major photosynthesizing organisms: the three types of bacteria, blue-green algae, and chloroplasts. Anaerobic and aerobic bacteria, eukaryotic cytoplasmic components and mitochondria are also included in the phylogenetic tree. Anaerobic nonphotosynthesizing bacteria similar to Clostridium were the earliest organisms, arising more than 3.2 billion years ago. Bacterial photosynthesis evolved nearly 3.0 billion years ago, while oxygen-evolving photosynthesis, originating in the blue-green algal line, came into being about 2.0 billion years ago. The phylogenetic tree supports the symbiotic theory of the origin of eukaryotes.

  10. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    Science.gov (United States)

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  11. Phylogeny and evolution of the auks (subfamily Alcinae) based on mitochondrial DNA sequences

    Science.gov (United States)

    Moum, Truls; Johansen, Steinar; Erikstad, Kjell Einar; Piatt, John F.

    1994-01-01

    The genetic divergence and phylogeny of the auks was assessed by mitochondrial DNA sequence comparisons in a study using 19 of the 22 auk species and two outgroup representatives. We compared more than 500 nucleotides from each of two mitochondrial genes encoding 12S rRNA and the NADH dehydrogenase subunit 6. Divergence times were estimated from transversional substitutions. The dovekie (Alle alle) is related to the razorbill (Alca torda) and the murres (Uria spp). Furthermore, the Xantus's murrelet (Synthliboramphus hypoleucus) and the ancient (Synthliboramphus antiquus) and Japanese murrelets (Synthliboramphus wumizusume) are genetically distinct members of the same main lineage, whereas brachyramphine and synthliboramphine murrelets are not closely related. An early adaptive radiation of six main species groups of auks seems to trace back to Middle Miocene. Later speciation probably involved ecological differentiations and geographical isolations.

  12. Insights into hominin phenotypic and dietary evolution from ancient DNA sequence data.

    Science.gov (United States)

    Perry, George H; Kistler, Logan; Kelaita, Mary A; Sams, Aaron J

    2015-02-01

    Nuclear genome sequence data from Neandertals, Denisovans, and archaic anatomically modern humans can be used to complement our understanding of hominin evolutionary biology and ecology through i) direct inference of archaic hominin phenotypes, ii) indirect inference of those phenotypes by identifying the effects of previously-introgressed alleles still present among modern humans, or iii) determining the evolutionary timing of relevant hominin-specific genetic changes. Here we review and reanalyze published Neandertal and Denisovan genome sequence data to illustrate an example of the third approach. Specifically, we infer the timing of five human gene presence/absence changes that may be related to particular hominin-specific dietary changes and discuss these results in the context of our broader reconstructions of hominin evolutionary ecology. We show that pseudogenizing (gene loss) mutations in the TAS2R62 and TAS2R64 bitter taste receptor genes and the MYH16 masticatory myosin gene occurred after the hominin-chimpanzee divergence but before the divergence of the human and Neandertal/Denisovan lineages. The absence of a functional MYH16 protein may explain our relatively reduced jaw muscles; this gene loss may have followed the adoption of cooking behavior. In contrast, salivary amylase gene (AMY1) duplications were not observed in the Neandertal and Denisovan genomes, suggesting a relatively recent origin for the AMY1 copy number gains that are observed in modern humans. Thus, if earlier hominins were consuming large quantities of starch-rich underground storage organs, as previously hypothesized, then they were likely doing so without the digestive benefits of increased salivary amylase production. Our most surprising result was the observation of a heterozygous mutation in the first codon of the TAS2R38 bitter taste receptor gene in the Neandertal individual, which likely would have resulted in a non-functional protein and inter-individual PTC

  13. Evolution of MHC class I genes in the endangered loggerhead sea turtle (Caretta caretta) revealed by 454 amplicon sequencing.

    Science.gov (United States)

    Stiebens, Victor A; Merino, Sonia E; Chain, Frédéric J J; Eizaguirre, Christophe

    2013-04-30

    In evolutionary and conservation biology, parasitism is often highlighted as a major selective pressure. To fight against parasites and pathogens, genetic diversity of the immune genes of the major histocompatibility complex (MHC) are particularly important. However, the extensive degree of polymorphism observed in these genes makes it difficult to conduct thorough population screenings. We utilized a genotyping protocol that uses 454 amplicon sequencing to characterize the MHC class I in the endangered loggerhead sea turtle (Caretta caretta) and to investigate their evolution at multiple relevant levels of organization. MHC class I genes revealed signatures of trans-species polymorphism across several reptile species. In the studied loggerhead turtle individuals, it results in the maintenance of two ancient allelic lineages. We also found that individuals carrying an intermediate number of MHC class I alleles are larger than those with either a low or high number of alleles. Multiple modes of evolution seem to maintain MHC diversity in the loggerhead turtles, with relatively high polymorphism for an endangered species.

  14. Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies.

    Science.gov (United States)

    Furnham, Nicholas; Dawson, Natalie L; Rahman, Syed A; Thornton, Janet M; Orengo, Christine A

    2016-01-29

    Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, means that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and to exemplify these by detailed examples. Additionally, we use structural information of the enzymes active site to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and the design of new synthetic enzymes. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  15. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  16. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  17. Complete plastome sequences of Equisetum arvense and Isoetes flaccida: implications for phylogeny and plastid genome evolution of early land plant lineages

    OpenAIRE

    Karol, Kenneth G; Arumuganathan, Kathiravetpillai; Boore, Jeffrey L; Duffy, Aaron M; Everett, Karin DE; Hall, John D; Hansen, S Kellon; Kuehl, Jennifer V; Mandoli, Dina F; Mishler, Brent D; Olmstead, Richard G; Renzaglia, Karen S; Wolf, Paul G

    2010-01-01

    Abstract Background Despite considerable progress in our understanding of land plant phylogeny, several nodes in the green tree of life remain poorly resolved. Furthermore, the bulk of currently available data come from only a subset of major land plant clades. Here we examine early land plant evolution using complete plastome sequences including two previously unexamined and phylogenetically critical lineages. To better understand the evolution of land plants and their plastomes, we examined...

  18. Analysis of horse genomes provides insight into the diversification and adaptive evolution of karyotype.

    Science.gov (United States)

    Huang, Jinlong; Zhao, Yiping; Shiraigol, Wunierfu; Li, Bei; Bai, Dongyi; Ye, Weixing; Daidiikhuu, Dorjsuren; Yang, Lihua; Jin, Burenqiqige; Zhao, Qinan; Gao, Yahan; Wu, Jing; Bao, Wuyundalai; Li, Anaer; Zhang, Yuhong; Han, Haige; Bai, Haitang; Bao, Yanqing; Zhao, Lele; Zhai, Zhengxiao; Zhao, Wenjing; Sun, Zikui; Zhang, Yan; Meng, He; Dugarjaviin, Manglai

    2014-05-14

    Karyotypic diversification is more prominent in Equus species than in other mammals. Here, using next generation sequencing technology, we generated and de novo assembled quality genomes sequences for a male wild horse (Przewalski's horse) and a male domestic horse (Mongolian horse), with about 93-fold and 91-fold coverage, respectively. Portion of Y chromosome from wild horse assemblies (3 M bp) and Mongolian horse (2 M bp) were also sequenced and de novo assembled. We confirmed a Robertsonian translocation event through the wild horse's chromosomes 23 and 24, which contained sequences that were highly homologous with those on the domestic horse's chromosome 5. The four main types of rearrangement, insertion of unknown origin, inserted duplication, inversion, and relocation, are not evenly distributed on all the chromosomes, and some chromosomes, such as the X chromosome, contain more rearrangements than others, and the number of inversions is far less than the number of insertions and relocations in the horse genome. Furthermore, we discovered the percentages of LINE_L1 and LTR_ERV1 are significantly increased in rearrangement regions. The analysis results of the two representative Equus species genomes improved our knowledge of Equus chromosome rearrangement and karyotype evolution.

  19. Comparative analysis of mycobacterium and related actinomycetes yields insight into the evolution of mycobacterium tuberculosis pathogenesis

    Directory of Open Access Journals (Sweden)

    McGuire Abigail

    2012-03-01

    Full Text Available Abstract Background The sequence of the pathogen Mycobacterium tuberculosis (Mtb strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org, including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinia, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum. Results Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. It also suggests that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four. Conclusions Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.

  20. Comparative analysis of Mycobacterium and related Actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis.

    Science.gov (United States)

    McGuire, Abigail Manson; Weiner, Brian; Park, Sang Tae; Wapinski, Ilan; Raman, Sahadevan; Dolganov, Gregory; Peterson, Matthew; Riley, Robert; Zucker, Jeremy; Abeel, Thomas; White, Jared; Sisk, Peter; Stolte, Christian; Koehrsen, Mike; Yamamoto, Robert T; Iacobelli-Martinez, Milena; Kidd, Matthew J; Maer, Andreia M; Schoolnik, Gary K; Regev, Aviv; Galagan, James

    2012-03-28

    The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinia, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum. Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. It also suggests that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four. Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.

  1. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  2. Frame sequences analysis technique of linear objects movement

    Science.gov (United States)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Obtaining data by noninvasive methods are often needed in many fields of science and engineering. This is achieved through video recording in various frame rate and light spectra. In doing so quantitative analysis of movement of the objects being studied becomes an important component of the research. This work discusses analysis of motion of linear objects on the two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence containing 30 frames at the resolution of 62 × 62 pixels and frame rate of 2 Hz. It was required to determine the average velocity of objects motion. This velocity was found as an average velocity for 8-12 objects with the error of 15%. After processing dependencies of the average velocity vs. control parameters were found. The processing was performed in the software environment GMimPro with the subsequent approximation of the data obtained using the Hill equation.

  3. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  4. Phylogenetic reconstruction of Bantu kinship challenges Main Sequence Theory of human social evolution.

    Science.gov (United States)

    Opie, Christopher; Shultz, Susanne; Atkinson, Quentin D; Currie, Thomas; Mace, Ruth

    2014-12-09

    Kinship provides the fundamental structure of human society: descent determines the inheritance pattern between generations, whereas residence rules govern the location a couple moves to after they marry. In turn, descent and residence patterns determine other key relationships such as alliance, trade, and marriage partners. Hunter-gatherer kinship patterns are viewed as flexible, whereas agricultural societies are thought to have developed much more stable kinship patterns as they expanded during the Holocene. Among the Bantu farmers of sub-Saharan Africa, the ancestral kinship patterns present at the beginning of the expansion are hotly contested, with some arguing for matrilineal and matrilocal patterns, whereas others maintain that any kind of lineality or sex-biased dispersal only emerged much later. Here, we use Bayesian phylogenetic methods to uncover the history of Bantu kinship patterns and trace the interplay between descent and residence systems. The results suggest a number of switches in both descent and residence patterns as Bantu farming spread, but that the first Bantu populations were patrilocal with patrilineal descent. Across the phylogeny, a change in descent triggered a switch away from patrifocal kinship, whereas a change in residence triggered a switch back from matrifocal kinship. These results challenge "Main Sequence Theory," which maintains that changes in residence rules precede change in other social structures. We also indicate the trajectory of kinship change, shedding new light on how this fundamental structure of society developed as farming spread across the globe during the Neolithic.

  5. The Dendrobium catenatum Lindl. genome sequence provides insights into polysaccharide synthase, floral development and adaptive evolution.

    Science.gov (United States)

    Zhang, Guo-Qiang; Xu, Qing; Bian, Chao; Tsai, Wen-Chieh; Yeh, Chuan-Ming; Liu, Ke-Wei; Yoshida, Kouki; Zhang, Liang-Sheng; Chang, Song-Bin; Chen, Fei; Shi, Yu; Su, Yong-Yu; Zhang, Yong-Qiang; Chen, Li-Jun; Yin, Yayi; Lin, Min; Huang, Huixia; Deng, Hua; Wang, Zhi-Wen; Zhu, Shi-Lin; Zhao, Xiang; Deng, Cao; Niu, Shan-Ce; Huang, Jie; Wang, Meina; Liu, Guo-Hui; Yang, Hai-Jun; Xiao, Xin-Ju; Hsiao, Yu-Yun; Wu, Wan-Lin; Chen, You-Yi; Mitsuda, Nobutaka; Ohme-Takagi, Masaru; Luo, Yi-Bo; Van de Peer, Yves; Liu, Zhong-Jian

    2016-01-12

    Orchids make up about 10% of all seed plant species, have great economical value, and are of specific scientific interest because of their renowned flowers and ecological adaptations. Here, we report the first draft genome sequence of a lithophytic orchid, Dendrobium catenatum. We predict 28,910 protein-coding genes, and find evidence of a whole genome duplication shared with Phalaenopsis. We observed the expansion of many resistance-related genes, suggesting a powerful immune system responsible for adaptation to a wide range of ecological niches. We also discovered extensive duplication of genes involved in glucomannan synthase activities, likely related to the synthesis of medicinal polysaccharides. Expansion of MADS-box gene clades ANR1, StMADS11, and MIKC(*), involved in the regulation of development and growth, suggests that these expansions are associated with the astonishing diversity of plant architecture in the genus Dendrobium. On the contrary, members of the type I MADS box gene family are missing, which might explain the loss of the endospermous seed. The findings reported here will be important for future studies into polysaccharide synthesis, adaptations to diverse environments and flower architecture of Orchidaceae.

  6. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70degreesC, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  7. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    Science.gov (United States)

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Complete Genome Sequence of Bradyrhizobium sp. S23321: Insights into Symbiosis Evolution in Soil Oligotrophs

    Science.gov (United States)

    Okubo, Takashi; Tsukui, Takahiro; Maita, Hiroko; Okamoto, Shinobu; Oshima, Kenshiro; Fujisawa, Takatomo; Saito, Akihiro; Futamata, Hiroyuki; Hattori, Reiko; Shimomura, Yumi; Haruta, Shin; Morimoto, Sho; Wang, Yong; Sakai, Yoriko; Hattori, Masahira; Aizawa, Shin-ichi; Nagashima, Kenji V. P.; Masuda, Sachiko; Hattori, Tsutomu; Yamashita, Akifumi; Bao, Zhihua; Hayatsu, Masahito; Kajiya-Kanegae, Hiromi; Yoshinaga, Ikuo; Sakamoto, Kazunori; Toyota, Koki; Nakao, Mitsuteru; Kohara, Mitsuyo; Anda, Mizue; Niwa, Rieko; Jung-Hwan, Park; Sameshima-Saito, Reiko; Tokuda, Shin-ichi; Yamamoto, Sumiko; Yamamoto, Syuji; Yokoyama, Tadashi; Akutsu, Tomoko; Nakamura, Yasukazu; Nakahira-Yanaka, Yuka; Hoshino, Yuko Takada; Hirakawa, Hideki; Mitsui, Hisayuki; Terasawa, Kimihiro; Itakura, Manabu; Sato, Shusei; Ikeda-Ohtsubo, Wakako; Sakakura, Natsuko; Kaminuma, Eli; Minamisawa, Kiwamu

    2012-01-01

    Bradyrhizobium sp. S23321 is an oligotrophic bacterium isolated from paddy field soil. Although S23321 is phylogenetically close to Bradyrhizobium japonicum USDA110, a legume symbiont, it is unable to induce root nodules in siratro, a legume often used for testing Nod factor-dependent nodulation. The genome of S23321 is a single circular chromosome, 7,231,841 bp in length, with an average GC content of 64.3%. The genome contains 6,898 potential protein-encoding genes, one set of rRNA genes, and 45 tRNA genes. Comparison of the genome structure between S23321 and USDA110 showed strong colinearity; however, the symbiosis islands present in USDA110 were absent in S23321, whose genome lacked a chaperonin gene cluster (groELS3) for symbiosis regulation found in USDA110. A comparison of sequences around the tRNA-Val gene strongly suggested that S23321 contains an ancestral-type genome that precedes the acquisition of a symbiosis island by horizontal gene transfer. Although S23321 contains a nif (nitrogen fixation) gene cluster, the organization, homology, and phylogeny of the genes in this cluster were more similar to those of photosynthetic bradyrhizobia ORS278 and BTAi1 than to those on the symbiosis island of USDA110. In addition, we found genes encoding a complete photosynthetic system, many ABC transporters for amino acids and oligopeptides, two types (polar and lateral) of flagella, multiple respiratory chains, and a system for lignin monomer catabolism in the S23321 genome. These features suggest that S23321 is able to adapt to a wide range of environments, probably including low-nutrient conditions, with multiple survival strategies in soil and rhizosphere. PMID:22452844

  9. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Science.gov (United States)

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, that partially ruptured the Main Himalayan Thrust North of Kathmandu on the 25th April 2015, was the largest and most catastrophic earthquake striking Nepal since the great M8.4 1934 earthquake. This mainshock was followed by multiple aftershocks, among them, two notable events that occurred on the 12th May with magnitudes of 7.3 Mw and 6.3 Mw. Due to these recent events it became essential for the authorities and for the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, amongst others, the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study by doing a microseismic study using seismic data coming from the eastern part of the Nepalese Seismological Center network associated to one broadband station in Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Due to the exceptional number of events detected we performed an automatic picking/locating procedure which can be splitted in 4 steps: 1) Coarse picking of the onsets using a classical STA/LTA picker, 2) phase association of picked onsets to detect and declare seismic events, 3) Kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy and, 4) local magnitude calculation based amplitude of waveforms. This procedure is time efficient ( 1 sec/event), reduces considerably the location uncertainties ( 2 to 5 km errors) and increases the number of events detected compared to manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparing to the USGS catalog we were able to give a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the east of the region and a sudden decrease of seismicity 100 km east of Kathmandu which may reveal the presence of a tectonic

  10. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  11. Searching for convergent evolution in manganese superoxidase dismutase using hydrophobic cluster analysis

    Directory of Open Access Journals (Sweden)

    Heng Xiang

    2014-06-01

    Full Text Available There are numerous examples of convergent evolution in nature. Major ecological adaptations such as flight, loss of limbs in vertebrates, pesticide resistance, adaptation to a parasitic way of life, etc., have all evolved more than once, as seen by their analogous functions in separate taxa. But what about protein evolution? Does the environment have a strong enough influence on intracellular processes that enzymes and other functional proteins play, to evolve similar functional roles separately in different organisms? Manganese Superoxide Dismutase (MnSOD is a manganesedependant metallo-enzyme which plays a crucial role in protecting cells from anti-oxidative stress by eliminating reactive (superoxide oxygen species. It is a ubiquitous housekeeping enzyme found in nearly all organisms. In this study we compare phylogenies based on MnSOD protein sequences to those based on scores from Hydrophobic Cluster Analysis (HCA. We calculated HCA similarity values for each pair of taxa to obtain a pair-wise distance matrix. A UPGMA tree based on the HCA distance matrix and a common tree based on the primary protein sequence for MnSOD was constructed. Differences between these two trees within animals, enterobacteriaceae, planctomycetes and cyanobacteria are presented and cited as possible examples of convergence. We note that several residue changes result in changes in hydrophobicity at positions which apparently are under the effect of positive selection.

  12. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of interested genes which has not been used in viral integration site analysis of gene therapy applications. Here, we combined targeted sequence capture array and next generation sequencing to address the whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with a HIV-1 derived vector. A custom made DNA probe sets targeted pLVTHM vector used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  14. The effect of the multi-pass non-circular drawing sequence on mechanical properties and microstructure evolution of low-carbon steel

    International Nuclear Information System (INIS)

    Lee, Jung Wan; Baek, Hyun Moo; Hwang, Sun Kwang; Son, Il-Heon; Bae, Chul Min; Im, Yong-Taek

    2014-01-01

    Highlights: • A multi-pass non-circular drawing sequence is proposed to make high-strength wires. • The sequence was designed and applied for a low-carbon steel wire up to the 10th pass. • Many LAGBs and small grain size of the wire produced by the sequence were obtained. • High plastic deformation was imposed on the wire, resulting in grain refinement. • The sequence made fine-grained wires with improved UTS, ductility and fatigue life. - Abstract: In this study, the multi-pass non-circular drawing sequence was investigated for manufacturing high-strength wires with better ductility in a simple continuous way without adding additional alloys and heat treatment considering the effect of microstructure evolution and die geometry of the sequence on the mechanical properties of low-carbon steel during the process. For this purpose, the non-circular drawing sequence was designed and applied up to the 10th pass at room temperature. Mechanical properties and microstructure evolution of the specimen processed by the sequence were investigated by tension, Vickers micro-hardness, electron backscattering diffraction (EBSD), and fatigue tests compared with those for the conventional wire-drawing process. From the EBSD results, the higher low angle grain boundaries length per unit area and smaller average grain size of the specimen processed by the non-circular drawing sequence were obtained than those of the specimen processed by the wire-drawing process for the 8th pass. These results indicated that more plastic deformation was imposed in the material by the non-circular drawing sequence, resulting in grain refinement of the deformed specimen compared to the wire-drawing process. It is demonstrated that the multi-pass non-circular drawing sequence could be beneficial in producing fine-grained wires with improved ultimate tensile strength, ductility, and fatigue property by simply changing drawing dies geometry of the conventional wire-drawing process

  15. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  16. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  17. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  18. Sequence evolution of the hypervariable region in the putative envelope region E2/NS1 of hepatitis C virus is correlated with specific humoral immune responses.

    OpenAIRE

    van Doorn, L J; Capriles, I; Maertens, G; DeLeys, R; Murray, K; Kos, T; Schellekens, H; Quint, W

    1995-01-01

    Sequence evolution of the hypervariable region 1 (HVR1) in the N terminus of E2/NS1 of hepatitis C virus (HCV) was studied retrospectively in six chimpanzees inoculated with the same genotype 1b strain, containing a unique predominant HVR1 sequence. Immediately after inoculation, all animals contained the same HVR predominant sequence. Two animals developed an acute self-limiting infection. Anti-HVR1 immunoglobulin G (IgG) was produced 40 to 60 days after inoculation and rapidly disappeared a...

  19. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, CPU frequency scaling are some of the hardware features in the modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller which is part of common NGS workflows, that consume more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer (iv) the default 'on-demand' mode of CPU frequency is over-clocked by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  20. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera ( Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related with chloroplast and ribosomal protein, GO functional classification showed 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidences displayed that U. prolifera had substance and energy foundation for the intense photosynthesis and the rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  1. Evolution of Analysis of Polyhenols from Grapes, Wines, and Extracts

    Directory of Open Access Journals (Sweden)

    Pierre-Louis Teissedre

    2013-01-01

    Full Text Available Grape and wine phenolics are structurally diverse, from simple molecules to oligomers and polymers usually designated as tannins. They have an important impact on the organoleptic properties of wines, that’s why their analysis and quantification are of primordial importance. The extraction of phenolics from grapes and from wines is the first step involved in the analysis. Then, several analytical methods have been developed for the determination of total content of phenolic, while chromatographic and spectrophotometric analyses are continuously improved in order to achieve adequate separation of phenolic molecules, their subsequent identification and quantification. This review provides a summary of evolution of analysis of polyphenols from grapes, wines and extracts.

  2. Ancient, recurrent phage attacks and recombination shaped dynamic sequence-variable mosaics at the root of phytoplasma genome evolution.

    Science.gov (United States)

    Wei, Wei; Davis, Robert E; Jomantiene, Rasa; Zhao, Yan

    2008-08-19

    Mobile genetic elements have impacted biological evolution across all studied organisms, but evidence for a role in evolutionary emergence of an entire phylogenetic clade has not been forthcoming. We suggest that mobile element predation played a formative role in emergence of the phytoplasma clade. Phytoplasmas are cell wall-less bacteria that cause numerous diseases in plants. Phylogenetic analyses indicate that these transkingdom parasites descended from Gram-positive walled bacteria, but events giving rise to the first phytoplasma have remained unknown. Previously we discovered a unique feature of phytoplasmal genome architecture, genes clustered in sequence-variable mosaics (SVMs), and suggested that such structures formed through recurrent, targeted attacks by mobile elements. In the present study, we discovered that cryptic prophage remnants, originating from phages in the order Caudovirales, formed SVMs and comprised exceptionally large percentages of the chromosomes of 'Candidatus Phytoplasma asteris'-related strains OYM and AYWB, occupying nearly all major nonsyntenic sections, and accounting for most of the size difference between the two genomes. The clustered phage remnants formed genomic islands exhibiting distinct DNA physical signatures, such as dinucleotide relative abundance and codon position GC values. Phytoplasma strain-specific genes identified as phage morons were located in hypervariable regions within individual SVMs, indicating that prophage remnants played important roles in generating phytoplasma genetic diversity. Because no SVM-like structures could be identified in genomes of ancestral relatives including Acholeplasma spp., we hypothesize that ancient phage attacks leading to SVM formation occurred after divergence of phytoplasmas from acholeplasmas, triggering evolution of the phytoplasma clade.

  3. Comparison of Boolean analysis and standard phylogenetic methods using artificially evolved and natural mt-tRNA sequences from great apes.

    Science.gov (United States)

    Ari, Eszter; Ittzés, Péter; Podani, János; Thi, Quynh Chi Le; Jakó, Eena

    2012-04-01

    Boolean analysis (or BOOL-AN; Jakó et al., 2009. BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction. Mol. Phylogenet. Evol. 52, 887-97.), a recently developed method for sequence comparison uses the Iterative Canonical Form of Boolean functions. It considers sequence information in a way entirely different from standard phylogenetic methods (i.e. Maximum Parsimony, Maximum-Likelihood, Neighbor-Joining, and Bayesian analysis). The performance and reliability of Boolean analysis were tested and compared with the standard phylogenetic methods, using artificially evolved - simulated - nucleotide sequences and the 22 mitochondrial tRNA genes of the great apes. At the outset, we assumed that the phylogeny of Hominidae is generally well established, and the guide tree of artificial sequence evolution can also be used as a benchmark. These offer a possibility to compare and test the performance of different phylogenetic methods. Trees were reconstructed by each method from 2500 simulated sequences and 22 mitochondrial tRNA sequences. We also introduced a special re-sampling method for Boolean analysis on permuted sequence sites, the P-BOOL-AN procedure. Considering the reliability values (branch support values of consensus trees and Robinson-Foulds distances) we used for simulated sequence trees produced by different phylogenetic methods, BOOL-AN appeared as the most reliable method. Although the mitochondrial tRNA sequences of great apes are relatively short (59-75 bases long) and the ratio of their constant characters is about 75%, BOOL-AN, P-BOOL-AN and the Bayesian approach produced the same tree-topology as the established phylogeny, while the outcomes of Maximum Parsimony, Maximum-Likelihood and Neighbor-Joining methods were equivocal. We conclude that Boolean analysis is a promising alternative to existing methods of sequence comparison for phylogenetic reconstruction and congruence analysis. Copyright © 2012 Elsevier Inc. All

  4. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and therefore is an important group for understanding the evolution of vertebrate genomes including the human genome. Our laboratory has proposed elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  5. Species-level analysis of DNA sequence data from the NIH Human Microbiome Project.

    Science.gov (United States)

    Conlan, Sean; Kong, Heidi H; Segre, Julia A

    2012-01-01

    Outbreaks of antibiotic-resistant bacterial infections emphasize the importance of surveillance of potentially pathogenic bacteria. Genomic sequencing of clinical microbiological specimens expands our capacity to study cultivable, fastidious and uncultivable members of the bacterial community. Herein, we compared the primary data collected by the NIH's Human Microbiome Project (HMP) with published epidemiological surveillance data of Staphylococcus aureus. The HMP's initial dataset contained microbial survey data from five body regions (skin, nares, oral cavity, gut and vagina) of 242 healthy volunteers. A significant component of the HMP dataset was deep sequencing of the 16S ribosomal RNA gene, which contains variable regions enabling taxonomic classification. Since species-level identification is essential in clinical microbiology, we built a reference database and used phylogenetic placement followed by most recent common ancestor classification to look at the species distribution for Staphylococcus, Klebsiella and Enterococcus. We show that selecting the accurate region of the 16S rRNA gene to sequence is analogous to carefully selecting culture conditions to distinguish closely related bacterial species. Analysis of the HMP data showed that Staphylococcus aureus was present in the nares of 36% of healthy volunteers, consistent with culture-based epidemiological data. Klebsiella pneumoniae and Enterococcus faecalis were found less frequently, but across many habitats. This work demonstrates that large 16S rRNA survey studies can be used to support epidemiological goals in the context of an increasing awareness that microbes flourish and compete within a larger bacterial community. This study demonstrates how genomic techniques and information could be critically important to trace microbial evolution and implement hospital infection control.

  6. Species-level analysis of DNA sequence data from the NIH Human Microbiome Project.

    Directory of Open Access Journals (Sweden)

    Sean Conlan

    Full Text Available BACKGROUND: Outbreaks of antibiotic-resistant bacterial infections emphasize the importance of surveillance of potentially pathogenic bacteria. Genomic sequencing of clinical microbiological specimens expands our capacity to study cultivable, fastidious and uncultivable members of the bacterial community. Herein, we compared the primary data collected by the NIH's Human Microbiome Project (HMP with published epidemiological surveillance data of Staphylococcus aureus. METHODS: The HMP's initial dataset contained microbial survey data from five body regions (skin, nares, oral cavity, gut and vagina of 242 healthy volunteers. A significant component of the HMP dataset was deep sequencing of the 16S ribosomal RNA gene, which contains variable regions enabling taxonomic classification. Since species-level identification is essential in clinical microbiology, we built a reference database and used phylogenetic placement followed by most recent common ancestor classification to look at the species distribution for Staphylococcus, Klebsiella and Enterococcus. MAIN RESULTS: We show that selecting the accurate region of the 16S rRNA gene to sequence is analogous to carefully selecting culture conditions to distinguish closely related bacterial species. Analysis of the HMP data showed that Staphylococcus aureus was present in the nares of 36% of healthy volunteers, consistent with culture-based epidemiological data. Klebsiella pneumoniae and Enterococcus faecalis were found less frequently, but across many habitats. CONCLUSIONS: This work demonstrates that large 16S rRNA survey studies can be used to support epidemiological goals in the context of an increasing awareness that microbes flourish and compete within a larger bacterial community. This study demonstrates how genomic techniques and information could be critically important to trace microbial evolution and implement hospital infection control.

  7. Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle.

    Science.gov (United States)

    Park, Stephen D E; Magee, David A; McGettigan, Paul A; Teasdale, Matthew D; Edwards, Ceiridwen J; Lohan, Amanda J; Murphy, Alison; Braud, Martin; Donoghue, Mark T; Liu, Yuan; Chamberlain, Andrew T; Rue-Albrecht, Kévin; Schroeder, Steven; Spillane, Charles; Tai, Shuaishuai; Bradley, Daniel G; Sonstegard, Tad S; Loftus, Brendan J; MacHugh, David E

    2015-10-26

    Domestication of the now-extinct wild aurochs, Bos primigenius, gave rise to the two major domestic extant cattle taxa, B. taurus and B. indicus. While previous genetic studies have shed some light on the evolutionary relationships between European aurochs and modern cattle, important questions remain unanswered, including the phylogenetic status of aurochs, whether gene flow from aurochs into early domestic populations occurred, and which genomic regions were subject to selection processes during and after domestication. Here, we address these questions using whole-genome sequencing data generated from an approximately 6,750-year-old British aurochs bone and genome sequence data from 81 additional cattle plus genome-wide single nucleotide polymorphism data from a diverse panel of 1,225 modern animals. Phylogenomic analyses place the aurochs as a distinct outgroup to the domestic B. taurus lineage, supporting the predominant Near Eastern origin of European cattle. Conversely, traditional British and Irish breeds share more genetic variants with this aurochs specimen than other European populations, supporting localized gene flow from aurochs into the ancestors of modern British and Irish cattle, perhaps through purposeful restocking by early herders in Britain. Finally, the functions of genes showing evidence for positive selection in B. taurus are enriched for neurobiology, growth, metabolism and immunobiology, suggesting that these biological processes have been important in the domestication of cattle. This work provides important new information regarding the origins and functional evolution of modern cattle, revealing that the interface between early European domestic populations and wild aurochs was significantly more complex than previously thought.

  8. Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.

    Science.gov (United States)

    Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu

    2014-08-01

    Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions of aa80, aa155, aa156, and aa168 positions in HCV NS3 region) over time. In Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b, received triple therapy of simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). Sustained virological response rate was 67%, and that was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients, who did not achieve sustained virological response, were tested for resistant variants by ultra-deep sequencing, at the baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants, detected at re-elevation of viral load, were de novo resistant variants. Ten of 12 de novo resistant variants become undetectable over time, and that five of seven resistant variants, detected at baseline, persisted over time. In one patient, variants of Q80R at baseline (0.3%) increased at 96-week after the cessation of treatment (10.2%), and de novo resistant variants of D168E (0.3%) also increased at 96-week after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility in detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.

  9. Evolution of blue-flowered species of genus Linum based on high-throughput sequencing of ribosomal RNA genes.

    Science.gov (United States)

    Bolsheva, Nadezhda L; Melnikova, Nataliya V; Kirov, Ilya V; Speranskaya, Anna S; Krinitsina, Anastasia A; Dmitriev, Alexey A; Belenikin, Maxim S; Krasnov, George S; Lakunina, Valentina A; Snezhkina, Anastasiya V; Rozhmina, Tatiana A; Samatadze, Tatiana E; Yurkevich, Olga Yu; Zoshchuk, Svyatoslav A; Amosova, Аlexandra V; Kudryavtseva, Anna V; Muravenko, Olga V

    2017-12-28

    The species relationships within the genus Linum have already been studied several times by means of different molecular and phylogenetic approaches. Nevertheless, a number of ambiguities in phylogeny of Linum still remain unresolved. In particular, the species relationships within the sections Stellerolinum and Dasylinum need further clarification. Also, the question of independence of the species of the section Adenolinum still remains unanswered. Moreover, the relationships of L. narbonense and other species of the section Linum require further clarification. Additionally, the origin of tetraploid species of the section Linum (2n = 30) including the cultivated species L. usitatissimum has not been explored. The present study examines the phylogeny of blue-flowered species of Linum by comparisons of 5S rRNA gene sequences as well as ITS1 and ITS2 sequences of 35S rRNA genes. High-throughput sequencing has been used for analysis of multicopy rRNA gene families. In addition to the molecular phylogenetic analysis, the number and chromosomal localization of 5S and 35S rDNA sites has been determined by FISH. Our findings confirm that L. stelleroides forms a basal branch from the clade of blue-flowered flaxes which is independent of the branch formed by species of the sect. Dasylinum. The current molecular phylogenetic approaches, the cytogenetic analysis as well as different genomic DNA fingerprinting methods applied previously did not discriminate certain species within the sect. Adenolinum. The allotetraploid cultivated species L. usitatissimum and its wild ancestor L. angustifolium (2n = 30) could originate either as the result of hybridization of two diploid species (2n = 16) related to the modern L. gandiflorum and L. decumbens, or hybridization of a diploid species (2n = 16) and a diploid ancestor of modern L. narbonense (2n = 14). High-throughput sequencing of multicopy rRNA gene families allowed us to make several adjustments to the

  10. Sequencing and analysis of the gene-rich space of cowpea

    Directory of Open Access Journals (Sweden)

    Cheung Foo

    2008-02-01

    total of 5,888 GSRs had homology to genes encoding transcription factors (TFs and transcription associated factors (TAFs representing about 5% of the total annotated sequences in the dataset. Sixty-two (62 of the 64 well-characterized plant transcription factor (TF gene families are represented in the cowpea GSRs, and these families are of similar size and phylogenetic organization to those characterized in other plants. The cowpea GSRs also provides a rich source of genes involved in photoperiodic control, symbiosis, and defense-related responses. Comparisons to available databases revealed that about 74% of cowpea ESTs and 70% of all legume ESTs were represented in the GSR dataset. As approximately 12% of all GSRs contain an identifiable simple-sequence repeat, the dataset is a powerful resource for the design of microsatellite markers. Conclusion The availability of extensive publicly available genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the characterization of syntenic relationships with other cultivated and model legumes, and will contribute to determining patterns of chromosomal evolution in the Leguminosae. The micro and macrosyntenic relationships detected between cowpea and other cultivated and model legumes should simplify the identification of informative markers for marker-assisted trait selection and map-based gene isolation necessary for cowpea improvement.

  11. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  12. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  13. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. ... been widely used for phylogenetic studies and sequence differences in ... In order to fill up the internal gap, a new set.

  14. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross- species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  15. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    /or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing......The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and...... and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  16. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. sharmila@atc.tcs.com Supplementary data are available at Bioinformatics online.

  17. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1, after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  18. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequence of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...

  19. Occurrence of Can-SINEs and intron sequence evolution supports robust phylogeny of pinniped carnivores and their terrestrial relatives.

    Science.gov (United States)

    Schröder, Christiane; Bleidorn, Christoph; Hartmann, Stefanie; Tiedemann, Ralph

    2009-12-15

    Investigating the dog genome we found 178965 introns with a moderate length of 200-1000 bp. A screening of these sequences against 23 different repeat libraries to find insertions of short interspersed elements (SINEs) detected 45276 SINEs. Virtually all of these SINEs (98%) belong to the tRNA-derived Can-SINE family. Can-SINEs arose about 55 million years ago before Carnivora split into two basal groups, the Caniformia (dog-like carnivores) and the Feliformia (cat-like carnivores). Genome comparisons of dog and cat recovered 506 putatively informative SINE loci for caniformian phylogeny. In this study we show how to use such genome information of model organisms to research the phylogeny of related non-model species of interest. Investigating a dataset including representatives of all major caniformian lineages, we analysed 24 randomly chosen loci for 22 taxa. All loci were amplifiable and revealed 17 parsimony-informative SINE insertions. The screening for informative SINE insertions yields a large amount of sequence information, in particular of introns, which contain reliable phylogenetic information as well. A phylogenetic analysis of intron- and SINE sequence data provided a statistically robust phylogeny which is congruent with the absence/presence pattern of our SINE markers. This phylogeny strongly supports a sistergroup relationship of Musteloidea and Pinnipedia. Within Pinnipedia, we see strong support from bootstrapping and the presence of a SINE insertion for a sistergroup relationship of the walrus with the Otariidae.

  20. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    Science.gov (United States)

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  1. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo: genome assembly and analysis.

    Directory of Open Access Journals (Sweden)

    Rami A Dalloul

    2010-09-01

    Full Text Available A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo. Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

  2. Phylogeny and character evolution of the fern genus Tectaria (Tectariaceae) in the Old World inferred from chloroplast DNA sequences.

    Science.gov (United States)

    Ding, Hui-Hui; Chao, Yi-Shan; Callado, John Rey; Dong, Shi-Yong

    2014-11-01

    In this study we provide a phylogeny for the pantropical fern genus Tectaria, with emphasis on the Old World species, based on sequences of five plastid regions (atpB, ndhF plus ndhF-trnL, rbcL, rps16-matK plus matK, and trnL-F). Maximum parsimony, maximum likelihood, and Bayesian inference are used to analyze 115 individuals, representing ca. 56 species of Tectaria s.l. and 36 species of ten related genera. The results strongly support the monophyly of Tectaria in a broad sense, in which Ctenitopsis, Hemigramma, Heterogonium, Psomiocarpa, Quercifilix, Stenosemia, and Tectaridium should be submerged. Such broadly circumscribed Tectaria is supported by the arising pattern of veinlets and the base chromosome number (x=40). Four primary clades are well resolved within Tectaria, one from the Neotropic (T. trifoliata clade) and three from the Old World (T. subtriphylla clade, Ctenitopsis clade, and T. crenata clade). Tectaria crenata clade is the largest one including six subclades. Of the genera previously recognized as tectarioid ferns, Ctenitis, Lastreopsis, and Pleocnemia, are confirmed to be members in Dryopteridaceae; while Pteridrys and Triplophyllum are supported in Tectariaceae. To infer morphological evolution, 13 commonly used characters are optimized on the resulting phylogenetic trees and in result, are all homoplastic in Tectaria. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. THE EXTENDED MAIN-SEQUENCE TURNOFF CLUSTERS OF THE LARGE MAGELLANIC CLOUD-MISSING LINKS IN GLOBULAR CLUSTER EVOLUTION

    International Nuclear Information System (INIS)

    Keller, Stefan C.; Mackey, A. Dougal; Da Costa, Gary S.

    2011-01-01

    Recent observations of intermediate-age (1-3 Gyr) massive star clusters in the Large Magellanic Cloud have revealed that the majority possess bifurcated or extended main-sequence turnoff (EMSTO) morphologies. This effect can be understood to arise from subsequent star formation among the stellar population with age differences between constituent stars amounting to 50-300 Myr. Age spreads of this order are similarly invoked to explain the light-element abundance variations witnessed in ancient globular clusters (GCs). In this paper, we explore the proposition that the clusters exhibiting the EMSTO phenomenon are a general phase in the evolution of massive clusters, one that naturally leads to the particular chemical properties of the ancient GC population. We show that the isolation of EMSTO clusters to intermediate ages is the consequence of observational selection effects. In our proposed scenario, the EMSTO phenomenon is identical to that which establishes the light-element abundance variations that are ubiquitous in the ancient GC population. Our scenario makes a strong prediction: EMSTO clusters will exhibit abundance variations in the light-elements characteristic of the ancient GC population.

  4. Chromosome Mapping of Repetitive Sequences in Rachycentron canadum (Perciformes: Rachycentridae: Implications for Karyotypic Evolution and Perspectives for Biotechnological Uses

    Directory of Open Access Journals (Sweden)

    Uedson Pereira Jacobina

    2011-01-01

    Full Text Available The cobia, Rachycentron canadum, a species of marine fish, has been increasingly used in aquaculture worldwide. It is the only member of the family Rachycentridae (Perciformes showing wide geographic distribution and phylogenetic patterns still not fully understood. In this study, the species was cytogenetically analyzed by different methodologies, including Ag-NOR and chromomycin A3 (CMA3/DAPI staining, C-banding, early replication banding (RGB, and in situ fluorescent hybridization with probes for 18S and 5S ribosomal genes and for telomeric sequences (TTAGGGn. The results obtained allow a detailed chromosomal characterization of the Atlantic population. The chromosome diversification found in the karyotype of the cobia is apparently related to pericentric inversions, the main mechanism associated to the karyotypic evolution of Perciformes. The differential heterochromatin replication patterns found were in part associated to functional genes. Despite maintaining conservative chromosomal characteristics in relation to the basal pattern established for Perciformes, some chromosome pairs in the analyzed population exhibit markers that may be important for cytotaxonomic, population, and biodiversity studies as well as for monitoring the species in question.

  5. Evolution of a Pathogen: A Comparative Genomics Analysis Identifies a Genetic Pathway to Pathogenesis in Acinetobacter

    Science.gov (United States)

    Sahl, Jason W.; Gillece, John D.; Schupp, James M.; Waddell, Victor G.; Driebe, Elizabeth M.; Engelthaler, David M.; Keim, Paul

    2013-01-01

    Acinetobacter baumannii is an emergent and global nosocomial pathogen. In addition to A. baumannii, other Acinetobacter species, especially those in the Acinetobacter calcoaceticus-baumannii (Acb) complex, have also been associated with serious human infection. Although mechanisms of attachment, persistence on abiotic surfaces, and pathogenesis in A. baumannii have been identified, the genetic mechanisms that explain the emergence of A. baumannii as the most widespread and virulent Acinetobacter species are not fully understood. Recent whole genome sequencing has provided insight into the phylogenetic structure of the genus Acinetobacter. However, a global comparison of genomic features between Acinetobacter spp. has not been described in the literature. In this study, 136 Acinetobacter genomes, including 67 sequenced in this study, were compared to identify the acquisition and loss of genes in the expansion of the Acinetobacter genus. A whole genome phylogeny confirmed that A. baumannii is a monophyletic clade and that the larger Acb complex is also a well-supported monophyletic group. The whole genome phylogeny provided the framework for a global genomic comparison based on a blast score ratio (BSR) analysis. The BSR analysis demonstrated that specific genes have been both lost and acquired in the evolution of A. baumannii. In addition, several genes associated with A. baumannii pathogenesis were found to be more conserved in the Acb complex, and especially in A. baumannii, than in other Acinetobacter genomes; until recently, a global analysis of the distribution and conservation of virulence factors across the genus was not possible. The results demonstrate that the acquisition of specific virulence factors has likely contributed to the widespread persistence and virulence of A. baumannii. The identification of novel features associated with transcriptional regulation and acquired by clades in the Acb complex presents targets for better understanding the

  6. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    Science.gov (United States)

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.

  7. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  8. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722

  9. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment fi le, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree fi les (with a user-defined combination of species name and/or database accession number. Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file and generation of species and accession number lists for use in supplementary materials or figure legends.

  10. [Analysis on property of meridian supramolecules by biological evolution path].

    Science.gov (United States)

    Deng, Kaiwen; Tao, Yeqin; Tang, Wenhan; He, Fuyuan; Liu, Wenlong; Shi, Jilian; Yang, Yantao; Zhou, Yiqun; Chang, Xiaorong

    2017-03-12

    With human placed in the whole nature, by following the biologic evolution path, the property of channel structure for "imprinting template" in meridian and zang-fu was explored with supramolecular chemistry. In the history of biologic evolution, each molecule in "molecule society" gradually developed into various highly-ordered supramolecular bodies based on self-identification, self-assembly, self-organization, self-replicating of"imprinting template", and thereby the original biochemical system was established, and finally evolved into human. In the forming process of supramolecular bodies, the channel structure of"imprinting template" in guest supramolecular bodies would be kept by host supramolecular bodies, and communicate with the outside to exchange materials, energy, information, otherwise life phenomenon could not continue, for which it was the chemical nature of biolo-gical supramolecular bodies for body to develop meridian. Therefore, the human was a gigantic and complicated supramolecules body in biological nature, and possessed the supramolecules "imprinting template" at each stage of evolution, for which the meridians were formed. When meridians converged, acupoints appeared; when acupointsconverged, zang-fu appeared. With the promotion of the blood from heart, according to"imprinting template", the guest supramolecular bodies and host meridian produced qi -analysis, which was the qi -phenomenon of guest in meridian. It presented as zang-fu image of physiology and pathology as well as action regularities of medication and acupuncture tolerance, by which current various meridian viewpoints could be explained and propose the hypothesis of meridian supramolecular bodies. The meridian and its phenomenon was decide by its "imprinting template" of supramolecular bodies and self-reaction regularities, which abided through the living nature. This was the substance for meridian biology.

  11. Analysis of the complement and molecular evolution of tRNA genes in cow

    Directory of Open Access Journals (Sweden)

    Barris Wesley C

    2009-04-01

    Full Text Available Abstract Background Detailed information regarding the number and organization of transfer RNA (tRNA genes at the genome level is becoming readily available with the increase of DNA sequencing of whole genomes. However the identification of functional tRNA genes is challenging for species that have large numbers of repetitive elements containing tRNA derived sequences, such as Bos taurus. Reliable identification and annotation of entire sets of tRNA genes allows the evolution of tRNA genes to be understood on a genomic scale. Results In this study, we explored the B. taurus genome using bioinformatics and comparative genomics approaches to catalogue and analyze cow tRNA genes. The initial analysis of the cow genome using tRNAscan-SE identified 31,868 putative tRNA genes and 189,183 pseudogenes, where 28,830 of the 31,868 predicted tRNA genes were classified as repetitive elements by the RepeatMasker program. We then used comparative genomics to further discriminate between functional tRNA genes and tRNA-derived sequences for the remaining set of 3,038 putative tRNA genes. For our analysis, we used the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes to predict that the number of active tRNA genes in cow lies in the vicinity of 439. Of this set, 150 tRNA genes were 100% identical in their sequences across all nine vertebrate genomes studied. Using clustering analyses, we identified a new tRNA-GlyCCC subfamily present in all analyzed mammalian genomes. We suggest that this subfamily originated from an ancestral tRNA-GlyGCC gene via a point mutation prior to the radiation of the mammalian lineages. Lastly, in a separate analysis we created phylogenetic profiles for each putative cow tRNA gene using a representative set of genomes to gain an overview of common evolutionary histories of tRNA genes. Conclusion The use of a combination of bioinformatics and comparative genomics approaches has allowed the confident identification of a

  12. Molecular cloning, sequence analysis and phylogeny of first caudata g-type lysozyme in axolotl (Ambystoma mexicanum).

    Science.gov (United States)

    Yu, Haining; Gao, Jiuxiang; Lu, Yiling; Guang, Huijuan; Cai, Shasha; Zhang, Songyan; Wang, Yipeng

    2013-11-01

    Lysozymes are key proteins that play important roles in innate immune defense in many animal phyla by breaking down the bacterial cell-walls. In this study, we report the molecular cloning, sequence analysis and phylogeny of the first caudate amphibian g-lysozyme: a full-length spleen cDNA library from axolotl (Ambystoma mexicanum). A goose-type (g-lysozyme) EST was identified and the full-length cDNA was obtained using RACE-PCR. The axolotl g-lysozyme sequence represents an open reading frame for a putative signal peptide and the mature protein composed of 184 amino acids. The calculated molecular mass and the theoretical isoelectric point (pl) of this mature protein are 21523.0 Da and 4.37, respectively. Expression of g-lysozyme mRNA is predominantly found in skin, with lower levels in spleen, liver, muscle, and lung. Phylogenetic analysis revealed that caudate amphibian g-lysozyme had distinct evolution pattern for being juxtaposed with not only anura amphibian, but also with the fish, bird and mammal. Although the first complete cDNA sequence for caudate amphibian g-lysozyme is reported in the present study, clones encoding axolotl's other functional immune molecules in the full-length cDNA library will have to be further sequenced to gain insight into the fundamental aspects of antibacterial mechanisms in caudate.

  13. Genetic Diversity and Phylogenetic Evolution of Tibetan Sheep Based on mtDNA D-Loop Sequences.

    Directory of Open Access Journals (Sweden)

    Jianbin Liu

    Full Text Available The molecular and population genetic evidence of the phylogenetic status of the Tibetan sheep (Ovis aries is not well understood, and little is known about this species' genetic diversity. This knowledge gap is partly due to the difficulty of sample collection. This is the first work to address this question. Here, the genetic diversity and phylogenetic relationship of 636 individual Tibetan sheep from fifteen populations were assessed using 642 complete sequences of the mitochondrial DNA D-loop. Samples were collected from the Qinghai-Tibetan Plateau area in China, and reference data were obtained from the six reference breed sequences available in GenBank. The length of the sequences varied considerably, between 1031 and 1259 bp. The haplotype diversity and nucleotide diversity were 0.992±0.010 and 0.019±0.001, respectively. The average number of nucleotide differences was 19.635. The mean nucleotide composition of the 350 haplotypes was 32.961% A, 29.708% T, 22.892% C, 14.439% G, 62.669% A+T, and 37.331% G+C. Phylogenetic analysis showed that all four previously defined haplogroups (A, B, C, and D were found in the 636 individuals of the fifteen Tibetan sheep populations but that only the D haplogroup was found in Linzhou sheep. Further, the clustering analysis divided the fifteen Tibetan sheep populations into at least two clusters. The estimation of the demographic parameters from the mismatch analyses showed that haplogroups A, B, and C had at least one demographic expansion in Tibetan sheep. These results contribute to the knowledge of Tibetan sheep populations and will help inform future conservation programs about the Tibetan sheep native to the Qinghai-Tibetan Plateau.

  14. Regional scale analysis of the altimetric stream network evolution

    Directory of Open Access Journals (Sweden)

    T. Ghizzoni

    2006-01-01

    Full Text Available Floods result from the limited carrying capacity of stream channels when compared to the discharge peak value. The transit of flood waves - with the associated erosion and sedimentation processes - often modifies local stream geometry. In some cases this results in a reduction of the stream carrying capacity, and consequently in an enhancement of the flooding risk. A mathematical model for the prediction of potential altimetric stream network evolution due to erosion and sedimentation processes is here formalized. It works at the regional scale, identifying the tendency of river segments to sedimentation, stability, or erosion. The model builds on geomorphologic concepts, and derives its parameters from extensive surveys. As a case study, tendencies of rivers pertaining to the Valle d'Aosta region are analyzed. Some validation is provided both at regional and local scales of analysis. Local validation is performed both through a mathematical model able to simulate the temporal evolution of the stream profile, and through comparison of the prediction with ante and post-event river surveys, where available. Overall results are strongly encouraging. Possible use of the information derived from the model in the context of flood and landslide hazard mitigation is briefly discussed.

  15. Genome-wide analysis of codon usage bias in four sequenced cotton species.

    Science.gov (United States)

    Wang, Liyuan; Xing, Huixian; Yuan, Yanchao; Wang, Xianlin; Saeed, Muhammad; Tao, Jincai; Feng, Wei; Zhang, Guihua; Song, Xianliang; Sun, Xuezhen

    2018-01-01

    Codon usage bias (CUB) is an important evolutionary feature in a genome which provides important information for studying organism evolution, gene function and exogenous gene expression. The CUB and its shaping factors in the nuclear genomes of four sequenced cotton species, G. arboreum (A2), G. raimondii (D5), G. hirsutum (AD1) and G. barbadense (AD2) were analyzed in the present study. The effective number of codons (ENC) analysis showed the CUB was weak in these four species and the four subgenomes of the two tetraploids. Codon composition analysis revealed these four species preferred to use pyrimidine-rich codons more frequently than purine-rich codons. Correlation analysis indicated that the base content at the third position of codons affect the degree of codon preference. PR2-bias plot and ENC-plot analyses revealed that the CUB patterns in these genomes and subgenomes were influenced by combined effects of translational selection, directional mutation and other factors. The translational selection (P2) analysis results, together with the non-significant correlation between GC12 and GC3, further revealed that translational selection played the dominant role over mutation pressure in the codon usage bias. Through relative synonymous codon usage (RSCU) analysis, we detected 25 high frequency codons preferred to end with T or A, and 31 low frequency codons inclined to end with C or G in these four species and four subgenomes. Finally, 19 to 26 optimal codons with 19 common ones were determined for each species and subgenomes, which preferred to end with A or T. We concluded that the codon usage bias was weak and the translation selection was the main shaping factor in nuclear genes of these four cotton genomes and four subgenomes.

  16. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution.

  17. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    International Nuclear Information System (INIS)

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study suggests the sequence tree model to analyze accident sequence systematically. Using the sequence tree model, all possible scenarios which need a specific safety action to prevent the core damage can be identified and success conditions of safety action under complicated situation such as combined accident will be also identified. Sequence tree is branch model to divide plant condition considering the plant dynamics. Since sequence tree model can reflect the plant dynamics, arising from interaction of different accident timing and plant condition and from the interaction between the operator action, mitigation system, and the indicators for operation, sequence tree model can be used to develop the dynamic event tree model easily. Target safety action for this study is a feed-and-bleed (F and B) operation. A F and B operation directly cools down the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, second accident timings were categorized according to plant condition. Indicators were selected as branch point using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution for the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  18. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    For determination of the genetic relationship of HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, DNA fragment of 1100 bp corresponding to gG from six HSV-2 strains have been isolated from human infected sera samples in Iran, it was amplified in PCR system and was sequenced for determining ...

  19. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  20. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... nucleotide alignment of both native buffalo and cattle CSRP3 cDNAs sequences ..... Exon III, Identities = 71/75 (94%), Gaps = 1/75 (1%) Strand=Plus/Plus ... Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J,.

  1. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters

  2. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  3. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  4. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  5. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    EVELYN SILVA

    2006-01-01

    Full Text Available Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contigs sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23% have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively. The lower percentages of homology were with Magnaporthe grisae and Neurospora crassa (13% and 19%, respectively. Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen

  6. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  7. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  8. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    AJL

    2012-04-19

    Apr 19, 2012 ... The primers were based on the rBAT sequences of other animals deposited in GenBank. .... fragment; M1, 2000 bp DNA ladder; M2, 1000 bp DNA ladder. spliced to obtain the ..... A traffic signal for heterodimeric amino acid.

  9. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  10. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  11. High-resolution comparative mapping among man, cattle and mouse suggests a role for repeat sequences in mammalian genome evolution

    Directory of Open Access Journals (Sweden)

    Rodolphe François

    2006-08-01

    Full Text Available Abstract Background Comparative mapping provides new insights into the evolutionary history of genomes. In particular, recent studies in mammals have suggested a role for segmental duplication in genome evolution. In some species such as Drosophila or maize, transposable elements (TEs have been shown to be involved in chromosomal rearrangements. In this work, we have explored the presence of interspersed repeats in regions of chromosomal rearrangements, using an updated high-resolution integrated comparative map among cattle, man and mouse. Results The bovine, human and mouse comparative autosomal map has been constructed using data from bovine genetic and physical maps and from FISH-mapping studies. We confirm most previous results but also reveal some discrepancies. A total of 211 conserved segments have been identified between cattle and man, of which 33 are new segments and 72 correspond to extended, previously known segments. The resulting map covers 91% and 90% of the human and bovine genomes, respectively. Analysis of breakpoint regions revealed a high density of species-specific interspersed repeats in the human and mouse genomes. Conclusion Analysis of the breakpoint regions has revealed specific repeat density patterns, suggesting that TEs may have played a significant role in chromosome evolution and genome plasticity. However, we cannot rule out that repeats and breakpoints accumulate independently in the few same regions where modifications are better tolerated. Likewise, we cannot ascertain whether increased TE density is the cause or the consequence of chromosome rearrangements. Nevertheless, the identification of high density repeat clusters combined with a well-documented repeat phylogeny should highlight probable breakpoints, and permit their precise dating. Combining new statistical models taking the present information into account should help reconstruct ancestral karyotypes.

  12. Prevalence, complete genome sequencing and phylogenetic analysis of porcine deltacoronavirus in South Korea, 2014-2016.

    Science.gov (United States)

    Jang, G; Lee, K-K; Kim, S-H; Lee, C

    2017-10-01

    Porcine deltacoronavirus (PDCoV) is a newly emerged enterotropic swine coronavirus that causes enteritis and diarrhoea in piglets. Here, a nested reverse transcription (RT)-PCR approach for the detection of PDCoV was developed to identify and characterize aetiologic agent(s) associated with diarrhoeal diseases in piglets in South Korea. A PCR-based method was applied to investigate the presence of PDCoV in 683 diarrhoeic samples collected from 449 commercial pig farms in South Korea from January 2014 to December 2016. The molecular-based survey indicated a relatively high prevalence of PDCoV (19.03%) in South Korea. Among those, the monoinfection of PDCoV (9.66%) and co-infection of PDCoV (6.30%) with porcine epidemic diarrhoea (PEDV) were predominant in diarrhoeal samples. The full-length genomes or the complete spike genes of the most recent strains identified in 2016 (KNU16-07, KNU16-08 and KNU16-11) were sequenced and analysed to characterize PDCoV currently prevalent in South Korea. We found a single insertion-deletion signature and dozens of genetic changes in the spike (S) genes of the KNU16 isolates. Phylogenetic analysis based on the entire genome and spike protein sequences of these strains indicated that they are most closely related to other Korean isolates grouped with the US strains. However, Korean PDCoV strains formed different branches within the same cluster, implying continuous evolution in the field. Our data will advance the understanding of the molecular epidemiology and evolutionary characteristics of PDCoV circulating in South Korea. © 2017 Blackwell Verlag GmbH.

  13. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  14. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  15. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  16. The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences

    Directory of Open Access Journals (Sweden)

    Ivan V. Stepanyan

    2017-01-01

    Full Text Available The article is devoted to a matrix method of comparative analysis of long nucleotide sequences by means of presenting each sequence in the form of three digital binary sequences. This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.

  17. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    Science.gov (United States)

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    microRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina Hiseq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionally conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis

  18. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a big concern as the amount of data generated is exponential in nature at several locations. Therefore, there is a need to develop techniques to store data using compression algorithm. Here we describe optimal storage algorithm (OPTSDNA) for storing large amount of DNA sequences of varying length. This paper provides performance analysis of optimal storage algorithm (OPTSDNA) of a distributed bioinformatics computing system for analysis of DNA sequences. OPTSDNA algorithm is used for storing various sizes of DNA sequences into database. DNA sequences of different lengths were stored by using this algorithm. These input DNA sequences are varied in size from very small to very large. Storage size is calculated by this algorithm. Response time is also calculated in this work. The efficiency and performance of the algorithm is high (in size calculation with percentage) when compared with other known with sequential approach.

  19. Phylogenetic analysis of bacterial and archaeal arsC gene sequences suggests an ancient, common origin for arsenate reductase

    Directory of Open Access Journals (Sweden)

    Dugas Sandra L

    2003-07-01

    Full Text Available Abstract Background The ars gene system provides arsenic resistance for a variety of microorganisms and can be chromosomal or plasmid-borne. The arsC gene, which codes for an arsenate reductase is essential for arsenate resistance and transforms arsenate into arsenite, which is extruded from the cell. A survey of GenBank shows that arsC appears to be phylogenetically widespread both in organisms with known arsenic resistance and those organisms that have been sequenced as part of whole genome projects. Results Phylogenetic analysis of aligned arsC sequences shows broad similarities to the established 16S rRNA phylogeny, with separation of bacterial, archaeal, and subsequently eukaryotic arsC genes. However, inconsistencies between arsC and 16S rRNA are apparent for some taxa. Cyanobacteria and some of the γ-Proteobacteria appear to possess arsC genes that are similar to those of Low GC Gram-positive Bacteria, and other isolated taxa possess arsC genes that would not be expected based on known evolutionary relationships. There is no clear separation of plasmid-borne and chromosomal arsC genes, although a number of the Enterobacteriales (γ-Proteobacteria possess similar plasmid-encoded arsC sequences. Conclusion The overall phylogeny of the arsenate reductases suggests a single, early origin of the arsC gene and subsequent sequence divergence to give the distinct arsC classes that exist today. Discrepancies between 16S rRNA and arsC phylogenies support the role of horizontal gene transfer (HGT in the evolution of arsenate reductases, with a number of instances of HGT early in bacterial arsC evolution. Plasmid-borne arsC genes are not monophyletic suggesting multiple cases of chromosomal-plasmid exchange and subsequent HGT. Overall, arsC phylogeny is complex and is likely the result of a number of evolutionary mechanisms.

  20. Lignin degradation: microorganisms, enzymes involved, genomes analysis and evolution.

    Science.gov (United States)

    Janusz, Grzegorz; Pawlik, Anna; Sulej, Justyna; Swiderska-Burek, Urszula; Jarosz-Wilkolazka, Anna; Paszczynski, Andrzej

    2017-11-01

    Extensive research efforts have been dedicated to describing degradation of wood, which is a complex process; hence, microorganisms have evolved different enzymatic and non-enzymatic strategies to utilize this plentiful plant material. This review describes a number of fungal and bacterial organisms which have developed both competitive and mutualistic strategies for the decomposition of wood and to thrive in different ecological niches. Through the analysis of the enzymatic machinery engaged in wood degradation, it was possible to elucidate different strategies of wood decomposition which often depend on ecological niches inhabited by given organism. Moreover, a detailed description of low molecular weight compounds is presented, which gives these organisms not only an advantage in wood degradation processes, but seems rather to be a new evolutionatory alternative to enzymatic combustion. Through analysis of genomics and secretomic data, it was possible to underline the probable importance of certain wood-degrading enzymes produced by different fungal organisms, potentially giving them advantage in their ecological niches. The paper highlights different fungal strategies of wood degradation, which possibly correlates to the number of genes coding for secretory enzymes. Furthermore, investigation of the evolution of wood-degrading organisms has been described. © FEMS 2017.

  1. Quantitative Analogue Experimental Sequence Stratigraphy : Modelling landscape evolution and sequence stratigraphy of river-shelf sedimentary systems by quantitative analogue experiments

    NARCIS (Netherlands)

    Heijst, Maximiliaan Wilhelmus Ignatius Maria van

    2000-01-01

    This thesis reports a series of flume tank experiments that were conducted to model the stratigraphic evolution of river-delta systems. Chapter 1 introduces the river-delta sedimentary system that is subject of modelling. The chapter also includes an overview of previous research and the summary and

  2. Analysis of xylem formation in pine by cDNA sequencing

    Science.gov (United States)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.; hide

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  3. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.

  4. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains were used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data. The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  5. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.).

    Science.gov (United States)

    Lee, Mi-Kyung; Zhang, Yang; Zhang, Meiping; Goebel, Mark; Kim, Hee Jin; Triplett, Barbara A; Stelly, David M; Zhang, Hong-Bin

    2013-03-28

    . raimondii contains a D genome (D5). The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii.

  6. Sequence features and phylogenetic analysis of the stress protein Hsp90α in chinook salmon Oncorhynchus tshawytscha, a poikilothermic vertebrate

    Science.gov (United States)

    Palmisano, Aldo N.; Winton, James R.; Dickhoff, Walton W.

    1999-01-01

    We cloned and sequenced a chinook salmon Hsp90 cDNA; sequence analysis shows it to be Hsp90??. Phylogenetic analysis supports the hypothesis that ?? and ?? paralogs of Hsp90 arose as a result of a gene duplication event and that they diverged early in the evolution of vertebrates, before tetrapods separated from the teleost lineage. Among several differences distinguishing poikilothermic Hsp90?? sequences from their bird and mammal orthologs, the teleost versions specifically lack a characteristic QTQDQP phosphorylation site near the N-terminus. We used the cDNA to develop an RNA (Northern) blot to quantify cellular Hsp90 mRNA levels. Chinook salmon embryonic (CHSE-214) cells responded to heat shock with a rapid rise in Hsp90 mRNA through 4 h, followed by a gradual decline over the next 20 h. Hsp90 mRNA level may be useful as a stress indicator, especially in a laboratory setting or in response to acute heat stress.

  7. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the youth (MODY), is a genetically and clinically heterogeneous group of diseasesand is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes were screened in 43 children with MODY diagnosed by clinical criterias. Studies of index cases was done with MISEQ-ILLUMINA, and family screenings and confirmation studies of mutations was done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. Very high frequency of novel mutations (42%) in our study population, supports that in heterogenous disorders like MODY sequence analysis provides rapid, cost effective and accurate genetic diagnosis.

  8. Complete genome sequence of a Chinese isolate of pepper vein yellows virus and evolutionary analysis based on the CP, MP and RdRp coding regions.

    Science.gov (United States)

    Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun

    2016-03-01

    The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.

  9. Accident sequence precursor analysis level 2/3 model development

    International Nuclear Information System (INIS)

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-01-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models

  10. In Vivo Enhancer Analysis Chromosome 16 Conserved NoncodingSequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega,Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel,Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificitiesin vertebrate genomes remains a significant challenge that is hampered bya lack of experimentally validated training sets. In this study, weleveraged extreme evolutionary sequence conservation as a filter toidentify putative gene regulatory elements and characterized the in vivoenhancer activity of human-fish conserved and ultraconserved1 noncodingelements on human chromosome 16 as well as such elements from elsewherein the genome. We initially tested 165 of these extremely conservedsequences in a transgenic mouse enhancer assay and observed that 48percent (79/165) functioned reproducibly as tissue-specific enhancers ofgene expression at embryonic day 11.5. While driving expression in abroad range of anatomical structures in the embryo, the majority of the79 enhancers drove expression in various regions of the developingnervous system. Studying a set of DNA elements that specifically droveforebrain expression, we identified DNA signatures specifically enrichedin these elements and used these parameters to rank all ~;3,400human-fugu conserved noncoding elements in the human genome. The testingof the top predictions in transgenic mice resulted in a three-foldenrichment for sequences with forebrain enhancer activity. These datadramatically expand the catalogue of in vivo-characterized human geneenhancers and illustrate the future utility of such training sets for avariety of iological applications including decoding the regulatoryvocabulary of the human genome.

  11. An x-space analysis of evolution equations: Soffer's inequality and the non-forward evolution

    International Nuclear Information System (INIS)

    Cafarella, Alessandro; Coriano, Claudio; Guzzi, Marco

    2003-01-01

    We analyze the use of algorithms based in x-space for the solution of renormalization group equations of DGLAP-type and test their consistency by studying bounds among partons distributions - in our specific case Soffer's inequality and the perturbative behaviour of the nucleon tensor charge - to next-to-leading order in QCD. A discussion of the perturbative resummation implicit in these expansions using Mellin moments is included. We also comment on the (kinetic) proof of positivity of the evolution of h1, using a kinetic analogy and illustrate the extension of the algorithm to the evolution of generalized parton distributions. We prove positivity of the non-forward evolution in a special case and illustrate a Fokker-Planck approximation to it. (author)

  12. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  13. Genomic insight into the common carp (Cyprinus carpio genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio, a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  14. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  15. A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function.

    Science.gov (United States)

    Lai, Jason; Jin, Jing; Kubelka, Jan; Liberles, David A

    2012-09-21

    Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that functional evolution can be inferred from the changes in protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced C(α) representation of the protein structure while enzymatic function is described by Enzyme Commission numbers. Similarity of the binding pocket dynamics at each branch of the protein family's phylogeny was analyzed in two ways: (1) explicitly by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end and (2) implicitly using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the α-amylase, D-isomer-specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal mode analysis added additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties and blind application of the analysis would not enable prediction of changes in enzyme specificity. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Gene repertoire evolution of Streptococcus pyogenes inferred from phylogenomic analysis with Streptococcus canis and Streptococcus dysgalactiae.

    Directory of Open Access Journals (Sweden)

    Tristan Lefébure

    Full Text Available Streptococcus pyogenes, is an important human pathogen classified within the pyogenic group of streptococci, exclusively adapted to the human host. Our goal was to employ a comparative evolutionary approach to better understand the genomic events concomitant with S. pyogenes human adaptation. As part of ascertaining these events, we sequenced the genome of one of the potential sister species, the agricultural pathogen S. canis, and combined it in a comparative genomics reconciliation analysis with two other closely related species, Streptococcus dysgalactiae and Streptococcus equi, to determine the genes that were gained and lost during S. pyogenes evolution. Genome wide phylogenetic analyses involving 15 Streptococcus species provided convincing support for a clade of S. equi, S. pyogenes, S. dysgalactiae, and S. canis and suggested that the most likely S. pyogenes sister species was S. dysgalactiae. The reconciliation analysis identified 113 genes that were gained on the lineage leading to S. pyogenes. Almost half (46% of these gained genes were phage associated and 14 showed significant matches to experimentally verified bacteria virulence factors. Subsequent to the origin of S. pyogenes, over half of the phage associated genes were involved in 90 different LGT events, mostly involving different strains of S. pyogenes, but with a high proportion involving the horse specific pathogen S. equi subsp. equi, with the directionality almost exclusively (86% in the S. pyogenes to S. equi direction. Streptococcus agalactiae appears to have played an important role in the evolution of S. pyogenes with a high proportion of LGTs originating from this species. Overall the analysis suggests that S. pyogenes adaptation to the human host was achieved in part by (i the integration of new virulence factors (e.g. speB, and the sal locus and (ii the construction of new regulation networks (e.g. rgg, and to some extent speB.

  17. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By considering a chaotic pseudo-random sequence as a symbolic sequence, authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the cases of Logistic map and one-way coupled map lattice to demonstrate how it works, and a comparison is made between it and the approximate entropy method. The results show that this method is applicable to distinguish the complexities of different chaotic pseudo-random sequences, and it is superior to the approximate entropy method

  18. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

    Science.gov (United States)

    Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

    2010-12-15

    Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  19. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi

    Directory of Open Access Journals (Sweden)

    Huynen Leon

    2010-12-01

    Full Text Available Abstract Background Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Results Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. Conclusions The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  20. Analysis of decision procedures for a sequence of inventory periods

    International Nuclear Information System (INIS)

    Avenhaus, R.

    1982-07-01

    Optimal test procedures for a sequence of inventory periods will be discussed. Starting with a game theoretical description of the conflict situation between the plant operator and the inspector, the objectives of the inspector as well as the general decision theoretical problem will be formulated. In the first part the objective of 'secure' detection will be emphasized which means that only at the end of the reference time a decision is taken by the inspector. In the second part the objective of 'timely' detection will be emphasized which will lead to sequential test procedures. At the end of the paper all procedures will be summarized, and in view of the multitude of procedures available at the moment some comments about future work will be given. (orig./HP) [de

  1. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  2. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version does only allow a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  3. Information-Theoretical Analysis of EEG Microstate Sequences in Python

    Directory of Open Access Journals (Sweden)

    Frederic von Wegner

    2018-06-01

    Full Text Available We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG measures the electrical potential generated by the cerebral cortex and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A–D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting state EEG recordings, we found new characteristics of microstate sequences such as periodicities related to EEG frequency bands. The algorithms used are here provided as an open-source package and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting state EEG recordings.

  4. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    Full Text Available The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses.A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%, proteinase inhibitors (7.8% or calcium-binding EF-hand proteins (6.7%. Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%, oxidative phosphorylation (63% and/or proteases (60%; most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins ( = SCP/TAPS or Ancylostoma-secreted proteins were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins or amino acid metabolism (e.g., asparagine t-RNA synthetase.This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  5. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    Energy Technology Data Exchange (ETDEWEB)

    Dash, Paban Kumar, E-mail: pabandash@rediffmail.com; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-07-05

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of the South East Asia including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is not available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960), from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2 respectively. Presence of unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins was observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection in few amino acid sites of the genes encoding for structural and non-structural proteins. The molecular phylogenetic analysis based on comparison of both complete coding region and envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in 1970s characterized the evolution of DENV-2 in India. Present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  6. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    International Nuclear Information System (INIS)

    Dash, Paban Kumar; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-01-01

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of the South East Asia including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is not available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960), from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2 respectively. Presence of unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins was observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection in few amino acid sites of the genes encoding for structural and non-structural proteins. The molecular phylogenetic analysis based on comparison of both complete coding region and envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in 1970s characterized the evolution of DENV-2 in India. Present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  7. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    Directory of Open Access Journals (Sweden)

    Hironobu Yanagisawa

    2016-03-01

    Full Text Available The presence of high molecular weight double-stranded RNA (dsRNA within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV, a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.

  8. Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

    Science.gov (United States)

    A training opportunity is open to a highly microbial-research-motivated student to conduct sequence analysis, explore novel genes and metabolic pathways, validate resultant findings using qPCR/RT-qPCR and summarize the findings

  9. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers, Libyan. Journal of Medicine .... For molecular modeling of NAT2 protein, visualized ..... cal clustering. .... cular dynamics simulation.

  10. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    Indian Academy of Sciences (India)

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants ... based on a Gibbs sampling strategy were done using .... SHOX (short stature homeobox) are an important cause of growth.

  11. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  12. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734

  13. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer . PRINCIPAL INVESTIGATOR...TITLE AND SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer . 5a. CONTRACT NUMBER 5b. GRANT NUMBER GRANT11489...institutional, NIH-funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer , with an

  14. Seismically induced accident sequence analysis of the advanced test reactor

    International Nuclear Information System (INIS)

    Khericha, S.T.; Henry, D.M.; Ravindra, M.K.; Hashimoto, P.S.; Griffin, M.J.; Tong, W.H.; Nafday, A.M.

    1991-01-01

    A seismic probabilistic risk assessment (PRA) was performed for the Department of Energy (DOE) Advanced Test Reactor (ATR) as part of the external events analysis. The risk from seismic events to the fuel in the core and in the fuel storage canal was evaluated. The key elements of this paper are the integration of seismically induced internal flood and internal fire, and the modeling of human error rates as a function of the magnitude of earthquake. The systems analysis was performed by EG ampersand G Idaho, Inc. and the fragility analysis and quantification were performed by EQE International, Inc. (EQE)

  15. Recent advances in nanopore-based nucleic acid analysis and sequencing

    International Nuclear Information System (INIS)

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights some recent progress on nanopore-based nucleic acid analysis and sequencing. These studies are classified into three categories, biological, solid-state, and hybrid nanopores, according to their nanoporous materials. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples are given in nanopore-based nucleic acid analysis and sequencing, with an emphasis on identifying strategies that can improve the resolution of nanopores. This review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)

  16. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The complexity and interdependence of operations on the airport surface motivate the need for a comprehensive and detailed, yet flexible and validated analysis and...

  17. BioMatriX: Sequence analysis, structure visualization, phylogenetics ...

    African Journals Online (AJOL)

    bmx-biomatrix.blogspot.com) developed for biological science community to augment scientific research regarding genomics, proteomics, phylogenetics and linkage analysis in one platform. BioMatriX offers multi-functional services to perform ...

  18. Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites.

    Science.gov (United States)

    Liu, Xia; Zhao, Bo; Zheng, Hua-Jun; Hu, Yan; Lu, Gang; Yang, Chang-Qing; Chen, Jie-Dan; Chen, Jun-Jian; Chen, Dian-Yang; Zhang, Liang; Zhou, Yan; Wang, Ling-Jian; Guo, Wang-Zhen; Bai, Yu-Lin; Ruan, Ju-Xin; Shangguan, Xiao-Xia; Mao, Ying-Bo; Shan, Chun-Min; Jiang, Jian-Ping; Zhu, Yong-Qiang; Jin, Lei; Kang, Hui; Chen, Shu-Ting; He, Xu-Lin; Wang, Rui; Wang, Yue-Zhu; Chen, Jie; Wang, Li-Jun; Yu, Shu-Ting; Wang, Bi-Yun; Wei, Jia; Song, Si-Chao; Lu, Xin-Yan; Gao, Zheng-Chao; Gu, Wen-Yi; Deng, Xiao; Ma, Dan; Wang, Sen; Liang, Wen-Hua; Fang, Lei; Cai, Cai-Ping; Zhu, Xie-Fei; Zhou, Bao-Liang; Jeffrey Chen, Z; Xu, Shu-Hua; Zhang, Yu-Gao; Wang, Sheng-Yue; Zhang, Tian-Zhen; Zhao, Guo-Ping; Chen, Xiao-Ya

    2015-09-30

    Of the two cultivated species of allopolyploid cotton, Gossypium barbadense produces extra-long fibers for the production of superior textiles. We sequenced its genome (AD)2 and performed a comparative analysis. We identified three bursts of retrotransposons from 20 million years ago (Mya) and a genome-wide uneven pseudogenization peak at 11-20 Mya, which likely contributed to genomic divergences. Among the 2,483 genes preferentially expressed in fiber, a cell elongation regulator, PRE1, is strikingly At biased and fiber specific, echoing the A-genome origin of spinnable fiber. The expansion of the PRE members implies a genetic factor that underlies fiber elongation. Mature cotton fiber consists of nearly pure cellulose. G. barbadense and G. hirsutum contain 29 and 30 cellulose synthase (CesA) genes, respectively; whereas most of these genes (>25) are expressed in fiber, genes for secondary cell wall biosynthesis exhibited a delayed and higher degree of up-regulation in G. barbadense compared with G. hirsutum, conferring an extended elongation stage and highly active secondary wall deposition during extra-long fiber development. The rapid diversification of sesquiterpene synthase genes in the gossypol pathway exemplifies the chemical diversity of lineage-specific secondary metabolites. The G. barbadense genome advances our understanding of allopolyploidy, which will help improve cotton fiber quality.

  19. Novel parvoviruses in reptiles and genome sequence of a lizard parvovirus shed light on Dependoparvovirus genus evolution.

    Science.gov (United States)

    Pénzes, Judit J; Pham, Hanh T; Benkö, Mária; Tijssen, Peter

    2015-09-01

    Here, we report the detection and partial genome characterization of two novel reptilian parvoviruses derived from a short-tailed pygmy chameleon (Rampholeon brevicaudatus) and a corn snake (Pantherophis guttatus) along with the complete genome analysis of the first lizard parvovirus, obtained from four bearded dragons (Pogona vitticeps). Both homology searches and phylogenetic tree reconstructions demonstrated that all are members of the genus Dependoparvovirus. Even though most dependoparvoviruses replicate efficiently only in co-infections with large DNA viruses, no such agents could be detected in one of the bearded dragon samples, hence the possibility of autonomous replication was explored. The alternative ORF encoding the full assembly activating protein (AAP), typical for the genus, could be obtained from reptilian parvoviruses for the first time, with a structure that appears to be more ancient than that of avian and mammalian parvoviruses. All three viruses were found to harbour short introns as previously observed for snake adeno-associated virus, shorter than that of any non-reptilian dependoparvovirus. According to the phylogenetic calculations based on full non-structural protein (Rep) and AAP sequences, the monophyletic cluster of reptilian parvoviruses seems to be the most basal out of all lineages of genus Dependoparvovirus. The suspected ability for autonomous replication, results of phylogenetic tree reconstruction, intron lengths and the structure of the AAP suggested that a single Squamata origin instead of the earlier assumed diapsid (common avian-reptilian) origin is more likely for the genus Dependoparvovirus of the family Parvoviridae.

  20. Applying Unique Molecular Identifiers in Next Generation Sequencing Reveals a Constrained Viral Quasispecies Evolution under Cross-Reactive Antibody Pressure Targeting Long Alpha Helix of Hemagglutinin

    Science.gov (United States)

    Hauck, Nastasja C.; Kirpach, Josiane; Kiefer, Christina; Farinelle, Sophie; Morris, Stephen A.; Muller, Claude P.; Lu, I-Na

    2018-01-01

    To overcome yearly efforts and costs for the production of seasonal influenza vaccines, new approaches for the induction of broadly protective and long-lasting immune responses have been developed in the past decade. To warrant safety and efficacy of the emerging crossreactive vaccine candidates, it is critical to understand the evolution of influenza viruses in response to these new immune pressures. Here we applied unique molecular identifiers in next generation sequencing to analyze the evolution of influenza quasispecies under in vivo antibody pressure targeting the hemagglutinin (HA) long alpha helix (LAH). Our vaccine targeting LAH of hemagglutinin elicited significant seroconversion and protection against homologous and heterologous influenza virus strains in mice. The vaccine not only significantly reduced lung viral titers, but also induced a well-known bottleneck effect by decreasing virus diversity. In contrast to the classical bottleneck effect, here we showed a significant increase in the frequency of viruses with amino acid sequences identical to that of vaccine targeting LAH domain. No escape mutant emerged after vaccination. These results not only support the potential of a universal influenza vaccine targeting the conserved LAH domains, but also clearly demonstrate that the well-established bottleneck effect on viral quasispecies evolution does not necessarily generate escape mutants. PMID:29587397

  1. Applying Unique Molecular Identifiers in Next Generation Sequencing Reveals a Constrained Viral Quasispecies Evolution under Cross-Reactive Antibody Pressure Targeting Long Alpha Helix of Hemagglutinin

    Directory of Open Access Journals (Sweden)

    Nastasja C. Hauck

    2018-03-01

    Full Text Available To overcome yearly efforts and costs for the production of seasonal influenza vaccines, new approaches for the induction of broadly protective and long-lasting immune responses have been developed in the past decade. To warrant safety and efficacy of the emerging crossreactive vaccine candidates, it is critical to understand the evolution of influenza viruses in response to these new immune pressures. Here we applied unique molecular identifiers in next generation sequencing to analyze the evolution of influenza quasispecies under in vivo antibody pressure targeting the hemagglutinin (HA long alpha helix (LAH. Our vaccine targeting LAH of hemagglutinin elicited significant seroconversion and protection against homologous and heterologous influenza virus strains in mice. The vaccine not only significantly reduced lung viral titers, but also induced a well-known bottleneck effect by decreasing virus diversity. In contrast to the classical bottleneck effect, here we showed a significant increase in the frequency of viruses with amino acid sequences identical to that of vaccine targeting LAH domain. No escape mutant emerged after vaccination. These results not only support the potential of a universal influenza vaccine targeting the conserved LAH domains, but also clearly demonstrate that the well-established bottleneck effect on viral quasispecies evolution does not necessarily generate escape mutants.

  2. Complete genome sequence analysis of Nocardia brasiliensis HUJEG-1 reveals a saprobic lifestyle and the genes needed for human pathogenesis.

    Science.gov (United States)

    Vera-Cabrera, Lucio; Ortiz-Lopez, Rocio; Elizondo-Gonzalez, Ramiro; Ocampo-Candiani, Jorge

    2013-01-01

    Nocardia brasiliensis is an important etiologic agent of mycetoma. These bacteria live as a saprobe in soil or organic material and enter the tissue via minor trauma. Mycetoma is characterized by tumefaction and the production of fistula and abscesses, with no spontaneous cure. By using mass sequencing, we determined the complete genomic nucleotide sequence of the bacteria. According to our data, the genome is a circular chromosome 9,436,348-bp long with 68% G+C content that encodes 8,414 proteins. We observed orthologs for virulence factors, a higher number of genes involved in lipid biosynthesis and catabolism, and gene clusters for the synthesis of bioactive compounds, such as antibiotics, terpenes, and polyketides. An in silico analysis of the sequence supports the conclusion that the bacteria acquired diverse genes by horizontal transfer from other soil bacteria, even from eukaryotic organisms. The genome composition reflects the evolution of bacteria via the acquisition of a large amount of DNA, which allows it to survive in new ecological niches, including humans.

  3. Complete genome sequence analysis of Nocardia brasiliensis HUJEG-1 reveals a saprobic lifestyle and the genes needed for human pathogenesis.

    Directory of Open Access Journals (Sweden)

    Lucio Vera-Cabrera

    Full Text Available Nocardia brasiliensis is an important etiologic agent of mycetoma. These bacteria live as a saprobe in soil or organic material and enter the tissue via minor trauma. Mycetoma is characterized by tumefaction and the production of fistula and abscesses, with no spontaneous cure. By using mass sequencing, we determined the complete genomic nucleotide sequence of the bacteria. According to our data, the genome is a circular chromosome 9,436,348-bp long with 68% G+C content that encodes 8,414 proteins. We observed orthologs for virulence factors, a higher number of genes involved in lipid biosynthesis and catabolism, and gene clusters for the synthesis of bioactive compounds, such as antibiotics, terpenes, and polyketides. An in silico analysis of the sequence supports the conclusion that the bacteria acquired diverse genes by horizontal transfer from other soil bacteria, even from eukaryotic organisms. The genome composition reflects the evolution of bacteria via the acquisition of a large amount of DNA, which allows it to survive in new ecological niches, including humans.

  4. Evolution of Rhizaria: new insights from phylogenomic analysis of uncultivated protists.

    Science.gov (United States)

    Burki, Fabien; Kudryavtsev, Alexander; Matz, Mikhail V; Aglyamova, Galina V; Bulman, Simon; Fiers, Mark; Keeling, Patrick J; Pawlowski, Jan

    2010-12-02

    Recent phylogenomic analyses have revolutionized our view of eukaryote evolution by revealing unexpected relationships between and within the eukaryotic supergroups. However, for several groups of uncultivable protists, only the ribosomal RNA genes and a handful of proteins are available, often leading to unresolved evolutionary relationships. A striking example concerns the supergroup Rhizaria, which comprises several groups of uncultivable free-living protists such as radiolarians, foraminiferans and gromiids, as well as the parasitic plasmodiophorids and haplosporids. Thus far, the relationships within this supergroup have been inferred almost exclusively from rRNA, actin, and polyubiquitin genes, and remain poorly resolved. To address this, we have generated large Expressed Sequence Tag (EST) datasets for 5 species of Rhizaria belonging to 3 important groups: Acantharea (Astrolonche sp., Phyllostaurus sp.), Phytomyxea (Spongospora subterranea, Plasmodiophora brassicae) and Gromiida (Gromia sphaerica). 167 genes were selected for phylogenetic analyses based on the representation of at least one rhizarian species for each gene. Concatenation of these genes produced a supermatrix composed of 36,735 amino acid positions, including 10 rhizarians, 9 stramenopiles, and 9 alveolates. Phylogenomic analyses of this large dataset revealed a strongly supported clade grouping Foraminifera and Acantharea. The position of this clade within Rhizaria was sensitive to the method employed and the taxon sampling: Maximum Likelihood (ML) and Bayesian analyses using empirical model of evolution favoured an early divergence, whereas the CAT model and ML analyses with fast-evolving sites or the foraminiferan species Reticulomyxa filosa removed suggested a derived position, closely related to Gromia and Phytomyxea. In contrast to what has been previously reported, our analyses also uncovered the presence of the rhizarian-specific polyubiquitin insertion in Acantharea. Finally, this

  5. Evolution of Rhizaria: new insights from phylogenomic analysis of uncultivated protists

    Directory of Open Access Journals (Sweden)

    Bulman Simon

    2010-12-01

    Full Text Available Abstract Background Recent phylogenomic analyses have revolutionized our view of eukaryote evolution by revealing unexpected relationships between and within the eukaryotic supergroups. However, for several groups of uncultivable protists, only the ribosomal RNA genes and a handful of proteins are available, often leading to unresolved evolutionary relationships. A striking example concerns the supergroup Rhizaria, which comprises several groups of uncultivable free-living protists such as radiolarians, foraminiferans and gromiids, as well as the parasitic plasmodiophorids and haplosporids. Thus far, the relationships within this s