WorldWideScience

Sample records for acid sequence analysis

  1. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  2. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    Energy Technology Data Exchange (ETDEWEB)

    Myers, G.; Foley, B.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States). Theoretical Div.; Mellors, J.W. [ed.] [Univ. of Pittsburgh, PA (United States); Jeang, K.T. [ed.] [National Institutes of Health, Bethesda, MD (United States). Molecular Virology Section; Wain-Hobson, S. [Pasteur Inst., Paris (France)] [ed.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nuclear Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  3. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    Directory of Open Access Journals (Sweden)

    Xiaoyu Wang

    Full Text Available Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  4. A Single Electrochemical Probe Used for Analysis of Multiple Nucleic Acid Sequences

    Science.gov (United States)

    Mills, Dawn M.; Calvo-Marzal, Percy; Pinzon, Jeffer M.; Armas, Stephanie; Kolpashchikov, Dmitry M.; Chumbimuni-Torres, Karin Y.

    2017-01-01

    Electrochemical hybridization sensors have been explored extensively for analysis of specific nucleic acids. However, commercialization of the platform is hindered by the need for attachment of separate oligonucleotide probes complementary to a RNA or DNA target to an electrode’s surface. Here we demonstrate that a single probe can be used to analyze several nucleic acid targets with high selectivity and low cost. The universal electrochemical four-way junction (4J)-forming (UE4J) sensor consists of a universal DNA stem-loop (USL) probe attached to the electrode’s surface and two adaptor strands (m and f) which hybridize to the USL probe and the analyte to form a 4J associate. The m adaptor strand was conjugated with a methylene blue redox marker for signal ON sensing and monitored using square wave voltammetry. We demonstrated that a single sensor can be used for detection of several different DNA/RNA sequences and can be regenerated in 30 seconds by a simple water rinse. The UE4J sensor enables a high selectivity by recognition of a single base substitution, even at room temperature. The UE4J sensor opens a venue for a re-useable universal platform that can be adopted at low cost for the analysis of DNA or RNA targets. PMID:29371782

  5. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    Science.gov (United States)

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  6. Analysis of protein function and its prediction from amino acid sequence.

    Science.gov (United States)

    Clark, Wyatt T; Radivojac, Predrag

    2011-07-01

    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, attributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30-100%). We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO. Copyright © 2011 Wiley-Liss, Inc.

  7. Genome Sequence Analysis of the Emerging Human Pathogenic Acetic Acid Bacterium Granulibacter bethesdensis▿ †

    Science.gov (United States)

    Greenberg, David E.; Porcella, Stephen F.; Zelazny, Adrian M.; Virtaneva, Kimmo; Sturdevant, Dan E.; Kupko, John J.; Barbian, Kent D.; Babar, Amenah; Dorward, David W.; Holland, Steven M.

    2007-01-01

    Chronic granulomatous disease (CGD) is an inherited immune deficiency characterized by increased susceptibility to infection with Staphylococcus, certain gram-negative bacteria, and fungi. Granulibacter bethesdensis, a newly described genus and species within the family Acetobacteraceae, was recently isolated from four CGD patients residing in geographically distinct locales who presented with fever and lymphadenitis. We sequenced the genome of the reference strain of Granulibacter bethesdensis, which was isolated from lymph nodes of the original patient. The genome contains 2,708,355 base pairs in a single circular chromosome, in which 2,437 putative open reading frames (ORFs) were identified, 1,470 of which share sequence similarity with ORFs in the nonpathogenic but related Gluconobacter oxydans genome. Included in the 967 ORFs that are unique to G. bethesdensis are ORFs potentially important for virulence, adherence, DNA uptake, and methanol utilization. GC% values and best BLAST analysis suggested that some of these unique ORFs were recently acquired. Comparison of G. bethesdensis to other known CGD pathogens demonstrated conservation of some putative virulence factors, suggesting possible common mechanisms involved in pathogenesis in CGD. Genotyping of the four patient isolates by use of a custom microarray demonstrated genome-wide variations in regions encoding DNA uptake systems and transcriptional regulators and in hypothetical ORFs. G. bethesdensis is a genetically diverse emerging human pathogen that may have recently acquired virulence factors new to this family of organisms. PMID:17827295

  8. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    Science.gov (United States)

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-07

    In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute the distance between sequences by their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and get better results compared with previous approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. RNA Sequencing and Coexpression Analysis Reveal Key Genes Involved in α-Linolenic Acid Biosynthesis in Perilla frutescens Seed

    Directory of Open Access Journals (Sweden)

    Tianyuan Zhang

    2017-11-01

    Full Text Available Perilla frutescen is used as traditional food and medicine in East Asia. Its seeds contain high levels of α-linolenic acid (ALA, which is important for health, but is scarce in our daily meals. Previous reports on RNA-seq of perilla seed had identified fatty acid (FA and triacylglycerol (TAG synthesis genes, but the underlying mechanism of ALA biosynthesis and its regulation still need to be further explored. So we conducted Illumina RNA-sequencing in seven temporal developmental stages of perilla seeds. Sequencing generated a total of 127 million clean reads, containing 15.88 Gb of valid data. The de novo assembly of sequence reads yielded 64,156 unigenes with an average length of 777 bp. A total of 39,760 unigenes were annotated and 11,693 unigenes were found to be differentially expressed in all samples. According to Kyoto Encyclopedia of Genes and Genomes (KEGG pathway analysis, 486 unigenes were annotated in the “lipid metabolism” pathway. Of these, 150 unigenes were found to be involved in fatty acid (FA biosynthesis and triacylglycerol (TAG assembly in perilla seeds. A coexpression analysis showed that a total of 104 genes were highly coexpressed (r > 0.95. The coexpression network could be divided into two main subnetworks showing over expression in the medium or earlier and late phases, respectively. In order to identify the putative regulatory genes, a transcription factor (TF analysis was performed. This led to the identification of 45 gene families, mainly including the AP2-EREBP, bHLH, MYB, and NAC families, etc. After coexpression analysis of TFs with highly expression of FAD2 and FAD3 genes, 162 TFs were found to be significantly associated with two FAD genes (r > 0.95. Those TFs were predicted to be the key regulatory factors in ALA biosynthesis in perilla seed. The qRT-PCR analysis also verified the relevance of expression pattern between two FAD genes and partial candidate TFs. Although it has been reported that some TFs

  10. Cloning and sequence analysis of putative type II fatty acid synthase ...

    Indian Academy of Sciences (India)

    Prakash

    pathway for improving oil quality and increasing oil content of peanut through biotechnology-based approaches. Fatty acid biosynthesis is catalysed by two types of fatty acid synthase (FAS). Type I FAS, as found in vertebrates, yeast and some bacteria, contains all the active sites on one or two multidomain polypeptides.

  11. A New Approach to Sequence Analysis Exemplified by Identification of cis-Elements in Abscisic Acid Inducible Promoters

    DEFF Research Database (Denmark)

    Busk, Peter Kamp; Hallin, Peter Fischer; Salomon, Jesper

    the sequences of known cis-elements for promoter prediction4. This is a computational alternative to genome-wide DNA array analysis of gene expression and can be used to confirm the importance of known promoter elements on a genome level. Unfortunately, this approach does not yield information about new cis....... These promoters were compared to 28000 promoters that are not induced by abscisic acid. The analysis identified previously described ABA-inducible promoter elements such as ABRE, CE3 and CRT1 but also new cis-elements were found. Furthermore, the list of DNA elements could be used to predict ABA......-inducible promoter with more than 70 % accuracy for the top 40 predictions, which is better than previously described methods4. In conclusion, identification of short, conserved motifs is a useful computational approach for identification of cis-elements in DNA promoters and for promoter prediction in genome...

  12. The structural analysis of protein sequences based on the quasi-amino acids code

    International Nuclear Information System (INIS)

    Ping, Zhu; Xu-Qing, Tang; Zhen-Yuan, Xu

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, it comes the postgenome era when the proteomics technology is emerging. This paper studies protein molecule from the algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acids code classification method is introduced and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relation is revealed about quasi-amino acids. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codon. All these correlation relationships may play an important part in establishing the logic relationship between codons and the quasi-amino acids during the course of life origination. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties about amino acids code are very difficult to observe. The present paper shows that (ZU, ⊕, ) is a field. Furthermore, the operational results display that the codon tga has different property from other stop codons. In fact, in the mitochondrion from human and ox genomic codon, tga is just tryptophane, is not the stop codon like in other genetic code, it is the case of the Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable events of the 20 kinds of amino acids code, in other words it solves the problem of 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3). (cross-disciplinary physics and related areas of science and technology)

  13. Sequencing and transcriptional analysis of the biosynthesis gene cluster of abscisic acid-producing Botrytis cinerea.

    Science.gov (United States)

    Gong, Tao; Shu, Dan; Yang, Jie; Ding, Zhong-Tao; Tan, Hong

    2014-09-29

    Botrytis cinerea is a model species with great importance as a pathogen of plants and has become used for biotechnological production of ABA. The ABA cluster of B. cinerea is composed of an open reading frame without significant similarities (bcaba3), followed by the genes (bcaba1 and bcaba2) encoding P450 monooxygenases and a gene probably coding for a short-chain dehydrogenase/reductase (bcaba4). In B. cinerea ATCC58025, targeted inactivation of the genes in the cluster suggested at least three genes responsible for the hydroxylation at carbon atom C-1' and C-4' or oxidation at C-4' of ABA. Our group has identified an ABA-overproducing strain, B. cinerea TB-3-H8. To differentiate TB-3-H8 from other B. cinerea strains with the functional ABA cluster, the DNA sequence of the 12.11-kb region containing the cluster of B. cinerea TB-3-H8 was determined. Full-length cDNAs were also isolated for bcaba1, bcaba2, bcaba3 and bcaba4 from B. cinerea TB-3-H8. Sequence comparison of the four genes and their flanking regions respectively derived from B. cinerea TB-3-H8, B05.10 and T4 revealed that major variations were located in intergenic sequences. In B. cinerea TB-3-H8, the expression profiles of the four function genes under ABA high-yield conditions were also analyzed by real-time PCR.

  14. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene......This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis...

  15. Bioinformatics analysis of the oxidosqualene cyclase gene and the amino acid sequence in mangrove plants

    Science.gov (United States)

    Basyuni, M.; Wati, R.

    2017-01-01

    This study described the bioinformatics methods to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants on DDBJ/EMBL/GenBank as well as predicted the structure, composition, similarity, subcellular localization and phylogenetic. The physical and chemical properties of seven mangrove OSC showed variation among the genes. The percentage of the secondary structure of seven mangrove OSC genes followed the order of a helix > random coil > extended chain structure. The values of chloroplast or signal peptide were too low, indicated that no chloroplast transit peptide or signal peptide of secretion pathway in mangrove OSC genes. The target peptide value of mitochondria varied from 0.163 to 0.430, indicated it was possible to exist. These results suggested the importance of understanding the diversity and functional of properties of the different amino acids in mangrove OSC genes. To clarify the relationship among the mangrove OSC gene, a phylogenetic tree was constructed. The phylogenetic tree shows that there are three clusters, Kandelia KcMS join with Bruguiera BgLUS, Rhizophora RsM1 was close to Bruguiera BgbAS, and Rhizophora RcCAS join with Kandelia KcCAS. The present study, therefore, supported the previous results that plant OSC genes form distinct clusters in the tree.

  16. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  17. Image analysis for DNA sequencing

    International Nuclear Information System (INIS)

    Palaniappan, K.; Huang, T.S.

    1991-01-01

    This paper reports that there is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information

  18. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    Energy Technology Data Exchange (ETDEWEB)

    Myers, G.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States); Wain-Hobson, S. [ed.] [Laboratory of Molecular Retrovirology, Pasteur Inst.; Smith, R.F. [ed.] [Baylor Coll. of Medicine, Houston, TX (United States). Dept. of Pharmacology; Pavlakis, G.N. [ed.] [National Cancer Inst., Frederick, MD (United States). Cancer Research Facility

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  19. Barley polyamine oxidase: Characterisation and analysis of the cofactor and the N-terminal amino acid sequence

    DEFF Research Database (Denmark)

    Radova, A.; Sebela, M.; Galuszka, P.

    2001-01-01

    was further purified to a final homogeneity (by the criteria of isoelectric focusing and SDS-PAGE) using techniques of low pressure chromatography followed by two FPLC steps. The purified yellow enzyme showed visible absorption maxima of a flavoprotein at 380 and 450 nm: the presence of FAD as the cofactor...... was further confirmed by measuring the fluorescence spectra, Barley PAO is an acidic protein (pI 5.4) containing 3% of neutral sugars: its molecular mass determined by SDS-PAGE was 56 kDa, whilst gel permeation chromatography revealed the higher value of 76 kDa. The N-terminal amino acid sequence of barley...

  20. Analysis of Draft Genome Sequence of Pseudomonas sp. QTF5 Reveals Its Benzoic Acid Degradation Ability and Heavy Metal Tolerance

    Directory of Open Access Journals (Sweden)

    Yang Li

    2017-01-01

    Full Text Available Pseudomonas sp. QTF5 was isolated from the continuous permafrost near the bitumen layers in the Qiangtang basin of Qinghai-Tibetan Plateau in China (5,111 m above sea level. It is psychrotolerant and highly and widely tolerant to heavy metals and has the ability to metabolize benzoic acid and salicylic acid. To gain insight into the genetic basis for its adaptation, we performed whole genome sequencing and analyzed the resistant genes and metabolic pathways. Based on 120 published and annotated genomes representing 31 species in the genus Pseudomonas, in silico genomic DNA-DNA hybridization (<54% and average nucleotide identity calculation (<94% revealed that QTF5 is closest to Pseudomonas lini and should be classified into a novel species. This study provides the genetic basis to identify the genes linked to its specific mechanisms for adaptation to extreme environment and application of this microorganism in environmental conservation.

  1. Mass Spectrometry Analysis Coupled with de novo Sequencing Reveals Amino Acid Substitutions in Nucleocapsid Protein from Influenza A Virus

    Directory of Open Access Journals (Sweden)

    Zijian Li

    2014-02-01

    Full Text Available Amino acid substitutions in influenza A virus are the main reasons for both antigenic shift and virulence change, which result from non-synonymous mutations in the viral genome. Nucleocapsid protein (NP, one of the major structural proteins of influenza virus, is responsible for regulation of viral RNA synthesis and replication. In this report we used LC-MS/MS to analyze tryptic digestion of nucleocapsid protein of influenza virus (A/Puerto Rico/8/1934 H1N1, which was isolated and purified by SDS poly-acrylamide gel electrophoresis. Thus, LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three substituted amino acid residues R452K, T423A and N430T in two tryptic peptides. The obtained results provided experimental evidence that amino acid substitutions resulted from non-synonymous gene mutations could be directly characterized by mass spectrometry in proteins of RNA viruses such as influenza A virus.

  2. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applica­ tions including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Anal­ ysis and Machine Intelligence was published in November 1980. The IEEE Com­ puter magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  3. The amino acid sequence of hypertensin. II.

    Science.gov (United States)

    SKEGGS, L T; LENTZ, K E; KAHN, J R; SHUMWAY, N P; WOODS, K R

    1956-08-01

    The amino acid sequence of horse hypertensin II has been determined by the use of chymotrypsin, the fluorodinitrobenzene method, and stepwise phenylisothiocyanate degradation. The results indicate that the amino acids of hypertensin II are arranged in the following order: asp-arg-val-tyr-iso-hist-pro-phe.

  4. Sequence analysis on microcomputers.

    Science.gov (United States)

    Cannon, G C

    1987-10-02

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. MicroGenie and IBI-Pustell further

  5. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    Science.gov (United States)

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  6. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    Science.gov (United States)

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442

  7. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    Science.gov (United States)

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation necessarily develop AA amyloidosis. Furthermore, many diseases in cats are associated with the elevated production of SAA, whereas only a small number actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination of 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against the three different parts of the feline SAA protein-i.e., the N-terminal, central, and C-terminal regions-revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that the cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  8. Shared Segment Analysis and Next-Generation Sequencing Implicates the Retinoic Acid Signaling Pathway in Total Anomalous Pulmonary Venous Return (TAPVR.

    Directory of Open Access Journals (Sweden)

    Dustin Nash

    Full Text Available Most isolated congenital heart defects are thought to be sporadic and are often ascribed to multifactorial mechanisms with poorly understood genetics. Total Anomalous Pulmonary Venous Return (TAPVR occurs in 1 in 15,000 live-born infants and occurs either in isolation or as part of a syndrome involving aberrant left-right development. Previously, we reported causative links between TAVPR and the PDGFRA gene. TAPVR has also been linked to the ANKRD1/CARP genes. However, these genes only explain a small fraction of the heritability of the condition. By examination of phased single nucleotide polymorphism genotype data from 5 distantly related TAPVR patients we identified a single 25 cM shared, Identical by Descent genomic segment on the short arm of chromosome 12 shared by 3 of the patients and their obligate-carrier parents. Whole genome sequence (WGS analysis identified a non-synonymous variant within the shared segment in the retinol binding protein 5 (RBP5 gene. The RBP5 variant is predicted to be deleterious and is overrepresented in the TAPVR population. Gene expression and functional analysis of the zebrafish orthologue, rbp7, supports the notion that RBP5 is a TAPVR susceptibility gene. Additional sequence analysis also uncovered deleterious variants in genes associated with retinoic acid signaling, including NODAL and retinol dehydrogenase 10. These data indicate that genetic variation in the retinoic acid signaling pathway confers, in part, susceptibility to TAPVR.

  9. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  10. Hyaluronan-binding region of aggrecan from pig laryngeal cartilage. Amino acid sequence, analysis of N-linked oligosaccharides and location of the keratan sulphate.

    OpenAIRE

    Barry, F P; Gaw, J U; Young, C N; Neame, P J

    1992-01-01

    The hyaluronan-binding region (HABR) was prepared from pig laryngeal cartilage aggrecan and the amino acid sequence was determined. The HABR had two N-termini: one N-terminal sequence was Val-Glu-Val-Ser-Glu-Pro (367 amino acids in total), and a second N-terminal sequence (Ala-Ile-Ser-Val-Glu-Val; 370 amino acids in total) was found to arise due to alternate cleavage by the signal peptidase. The N-linked oligosaccharides were analysed by examining their reactivity with a series of lectins. It...

  11. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP experiments.

    Directory of Open Access Journals (Sweden)

    Nicholas D Youngblut

    Full Text Available Combining high throughput sequencing with stable isotope probing (HTS-SIP is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP, multi-window high-resolution stable isotope probing (MW-HR-SIP, quantitative stable isotope probing (qSIP, and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.

  12. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequences classification. The data processed are 9-symbols string representations of amino acid sequences, divided into 49 data sets - each one containing samples labeled as reacting or not with given enzyme. The goal of the classification is to determine for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with best performance measures values.

  13. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subprojet 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  14. Amino acid sequence of tyrosinase from Neurospora crassa.

    Science.gov (United States)

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  15. Hyaluronan-binding region of aggrecan from pig laryngeal cartilage. Amino acid sequence, analysis of N-linked oligosaccharides and location of the keratan sulphate.

    Science.gov (United States)

    Barry, F P; Gaw, J U; Young, C N; Neame, P J

    1992-09-15

    The hyaluronan-binding region (HABR) was prepared from pig laryngeal cartilage aggrecan and the amino acid sequence was determined. The HABR had two N-termini: one N-terminal sequence was Val-Glu-Val-Ser-Glu-Pro (367 amino acids in total), and a second N-terminal sequence (Ala-Ile-Ser-Val-Glu-Val; 370 amino acids in total) was found to arise due to alternate cleavage by the signal peptidase. The N-linked oligosaccharides were analysed by examining their reactivity with a series of lectins. It was found that the N-linked oligosaccharide on loop A was of the mannose type, while that on loop B was of the complex type. No reactivity was detected between the N-linked oligosaccharide on loop B' and any of the lectins. The location of keratan sulphate (KS) in the HABR was determined by Edman degradation of the immobilized KS-containing peptide. The released amino acid derivatives were collected and tested for the presence of epitope to antibody 5-D-4. On the basis of 5-D-4 reactivity and sequencing yields, the KS chains are attached to threonine residues 352 and 357. There is no KS at threonine-355. This site is not in fact in G1, but about 16 amino acid residues into the interglobular domain. Comparison of the structure of the KS chain from the HABR and from the KS domain of pig laryngeal cartilage aggrecan was made by separation on polyacrylamide gels of the oligosaccharides arising from digestion with keratanase. Comparison of the oligosaccharide maps suggests that the KS chains from both parts of the aggrecan molecule have the same structure.

  16. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subprojet 3 `integrated sequence analysis` (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term `methodology` denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  17. Sequencing nucleic acids: from chemistry to medicine.

    Science.gov (United States)

    Balasubramanian, Shankar

    2011-07-14

    Chemistry has played a vital role in making routine, affordable sequencing of human genomes a reality. This article focuses on the genesis and development of Solexa sequencing that originated in Cambridge, UK. This sequencing approach is helping transform science and offers intriguing prospects for the future of medicine.

  18. Amino acid analysis

    Science.gov (United States)

    Winitz, M.; Graff, J. (Inventor)

    1974-01-01

    The process and apparatus for qualitative and quantitative analysis of the amino acid content of a biological sample are presented. The sample is deposited on a cation exchange resin and then is washed with suitable solvents. The amino acids and various cations and organic material with a basic function remain on the resin. The resin is eluted with an acid eluant, and the eluate containing the amino acids is transferred to a reaction vessel where the eluant is removed. Final analysis of the purified acylated amino acid esters is accomplished by gas-liquid chromatographic techniques.

  19. Peptide Nucleic Acids Having Enhanced Binding Affinity and Sequence Specificity

    DEFF Research Database (Denmark)

    1998-01-01

    A novel class of compounds, known as peptide nucleic acids, bind complementary DNA and RNA strands more strongly than a corresponding DNA strand, and exhibit increased sequence specificity and binding affinity. Methods of increasing binding affinity and sequence specificity of peptide nucleic acids...

  20. Methods and compositions for efficient nucleic acid sequencing

    Science.gov (United States)

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  1. Sequence Handling by Sequence Analysis Toolbox v1.0

    DEFF Research Database (Denmark)

    Ingrell, Christian Ravnsborg; Matthiesen, Rune; Jensen, Ole Nørregaard

    2006-01-01

    analysis toolbox v1.0 was to have a general purpose sequence analyzing tool that can import sequences obtained by high-throughput sequencing methods. The program includes algorithms for calculation or prediction of isoelectric point, hydropathicity index, transmembrane segments, and glycosylphosphatidyl......The fact that mass spectrometry have become a high-throughput method calls for bioinformatic tools for automated sequence handling and prediction. For efficient use of bioinformatic tools, it is important that these tools are integrated or interfaced with each other. The purpose of sequence...... inositol-anchored proteins....

  2. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology”s paradigm is that this order of amino acids determines the protein”s architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  3. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Amino acid sequence analysis corresponding to the PPE proteins in H37Rv and CDC1551 strains of the Mycobacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acidresidue common region in 22 proteins. The pairwise sequence identities were as low as 18%.

  4. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    Molecular cloning, sequence analysis and structure prediction of the related to b 0,+ amino acid transporter (rBAT) in Cyprinus carpio L. ... The amplified product was 2370 bp, including a 42 bp 5'-untranslated region, a 288 bp 3'-untranslated region, and a 2040 bp open reading frame (ORF), which encoded 679 amino acids ...

  5. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    Amino acid sequence analysis corresponding to the PE proteins resulted in the identification of tandem repeats comprising 41–43 amino acid ..... Q9RAP8. Leptospira interrogans serovar ictero- haemorrhagiae-B. ORFC protein. 5. P71003. Bacillus subtilis-B. Hypothetical 49⋅4 kDa protein. 5. ALL2941. Anabaena sp.

  6. Amino acid sequence of Japanese quail (Coturnix japonica) and northern bobwhite (Colinus virginianus) myoglobin.

    Science.gov (United States)

    Goodson, John; Beckstead, Robert B; Payne, Jason; Singh, Rakesh K; Mohan, Anand

    2015-08-15

    Myoglobin has an important physiological role in vertebrates, and as the primary sarcoplasmic pigment in meat, influences quality perception and consumer acceptability. In this study, the amino acid sequences of Japanese quail and northern bobwhite myoglobin were deduced by cDNA cloning of the coding sequence from mRNA. Japanese quail myoglobin was isolated from quail cardiac muscles, purified using ammonium sulphate precipitation and gel-filtration, and subjected to multiple enzymatic digestions. Mass spectrometry corroborated the deduced protein amino acid sequence at the protein level. Sequence analysis revealed both species' myoglobin structures consist of 153 amino acids, differing at only three positions. When compared with chicken myoglobin, Japanese quail showed 98% sequence identity, and northern bobwhite 97% sequence identity. The myoglobin in both quail species contained eight histidine residues instead of the nine present in chicken and turkey. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Representation of protein-sequence information by amino acid subalphabets

    DEFF Research Database (Denmark)

    Andersen, C.A.F.; Brunak, Søren

    2004-01-01

    -sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, it is also a secondary goal to discover how the current selection of amino acids...

  8. MEANS AND METHODS FOR CLONING NUCLEIC ACID SEQUENCES

    NARCIS (Netherlands)

    Geertsma, Eric Robin; Poolman, Berend

    2008-01-01

    The invention provides means and methods for efficiently cloning nucleic acid sequences of interest in micro-organisms that are less amenable to conventional nucleic acid manipulations, as compared to, for instance, E.coli. The present invention enables high-throughput cloning (and, preferably,

  9. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th‐century pandemics

    Science.gov (United States)

    Pasricha, Gunisha; Mishra, Akhilesh C.; Chakrabarti, Alok K.

    2012-01-01

    Please cite this paper as: Pasricha et al. (2012) Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the Influenza A virus subtypes responsible for the 20th‐century pandemics. Influenza and Other Respiratory Viruses 7(4), 497–505. Background  PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Methods  Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Results  Analysis showed that 96·4% of the H5N1 influenza viruses harbored full‐length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th‐century pandemic influenza viruses contained full‐length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human‐ and avian host‐specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Conclusions  Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host‐specific evolution of the virus

  10. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  11. Purification and amino acid sequence of a fern (Gleichenia japonica) ferredoxin.

    Science.gov (United States)

    Hase, T; Yamanashi, H; Matsubara, H

    1982-01-01

    A chloroplast-type ferredoxin was purified from a fern, Gleichenia japonica, and its amino acid sequence was determined. The conventional method for soft leaves proved to be unsuitable for the extraction of ferredoxin from G. japonica, but a good yield was obtained by blending the leaves in cold acetone. The analysis of 8 tryptic peptides of Cm-ferredoxin gave the complete amino acid sequence. The molecule consisted of a single polypeptide chain of 95 amino acid sequence. The molecule consisted of a single polypeptide chain of 95 amino acid residues and lacked tryptophan. Relatively high contents of phenylalanine and arginine were noted, some of which had unique locations in comparison with other ferredoxins. G. japonica ferredoxin did not show a close sequence homology with the ferredoxins from horsetails, which, like ferns, belong to Pteridophyta, or with those from plants of different taxonomical groups. The fern ferredoxins were suggested to form a unique group in the chloroplast-type ferredoxins.

  12. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    Science.gov (United States)

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates) and predicted amino acid sequence by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared to other members within the aMPV/C subtype or 21% for the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes were greater than one in the G gene when comparing the more recent Minnesota isolates to the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171

  13. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  14. Chymotryptic inhibitor I from potatoes. The amino acid sequence of subunit A*

    Science.gov (United States)

    Richardson, M.

    1974-01-01

    The amino acid sequence of subunit A of the potato chymotryptic inhibitor I was determined. The sequence was deduced from analysis of fragments and peptides derived from the protein by cleavage with cyanogen bromide, N-bromosuccinimide and dilute acid, and by digestion with trypsin, thermolysin, pepsin and papain. The molecule consists of a single polypeptide chain of 84 residues, which contains two homologous regions each of 13 amino acids. The protein does not appear to be homologous with any other known proteinase inhibitors. PMID:4595280

  15. Lectin from sainfoin (Onobrychis viciifolia scop.). Complete amino acid sequence.

    Science.gov (United States)

    Kouchalakos, R N; Bates, O J; Bradshaw, R A; Hapner, K D

    1984-04-10

    The complete amino acid sequence of a lectin from sainfoin ( Onobrychis viciifolia Scop . var. Eski ) has been determined by sequential Edman analyses of the intact protein and peptides derived from digests with trypsin and thermolysin. Peptides were purified by pH fractionation, by gel filtration, and by cation-exchange and reverse-phase high-performance liquid chromatography. Seven segments of continuous sequence, accounting for the entire protein, were aligned through sequence comparison with several homologous leguminous lectins to give the final structure. Sainfoin lectin monomer, a glycoprotein which contains a single polypeptide chain of 236 amino acid residues with a molecular weight of 26 509, has amino- and carboxyl-terminal residues of alanine and threonine, respectively. A single residue of cysteine, located at position 33, is the only sulfur-containing amino acid present. Asparagine-118 is the single oligosaccharide attachment site. At least two apparent allelomorphic forms of the protein, having valine or isoleucine at position 49 in equal amounts, were detected. The amino acid sequence of sainfoin lectin exhibits circular permutation relative to that of the homologous protein concanavalin A.

  16. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  17. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th-century pandemics.

    Science.gov (United States)

    Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K

    2013-07-01

    PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Analysis showed that 96·4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host-specific evolution of the virus. However, studies are required to correlate this sequence variability with the virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.

  18. cDNA sequence and tissue expression analysis of glucokinase from ...

    African Journals Online (AJOL)

    Yomi

    2012-01-10

    Jan 10, 2012 ... (Rattus norvegicus) were 98.1, 96.8, 80.3 and 79.8%, respectively. Phylogenetic analysis based on GK amino acid sequences. Phylogenetic analysis among eight fish species, eleven endothermic species and one amphibian species based on glucokinase amino acid sequences is shown in Figure. 3.

  19. The amino acid sequence of the light chain of Acanthamoeba myosin IC.

    Science.gov (United States)

    Wang, Z Y; Sakai, J; Matsudaira, P T; Baines, I C; Sellers, J R; Hammer, J A; Korn, E D

    1997-06-01

    The amino acid sequence of the light chain of Acanthamoeba myosin IC deduced from the cDNA sequence comprises 149 amino acids with a calculated molecular weight of 16,739. All but the 3 N-terminal residues were also determined by amino acid sequencing of the purified protein, which also showed the N-terminus to be blocked. Phylogenetic analysis shows Acanthamoeba myosin IC light chain to be more similar to the calmodulin subfamily of EF-hand calcium-modulated proteins than to the myosin II essential light chain or regulatory light chain subfamilies. In pairwise comparisons, the myosin IC light chain sequence is most similar to sequences of calmodulins (approximately 50% identical) and a squid calcium-binding protein (approximately 43% identical); the sequence is approximately 37% identical to the calcium-binding essential light chain of Physarum myosin II and approximately 30% identical to the essential light chain of Acanthamoeba myosin II, and the essential light chain and regulatory light chain of Dictyostelium myosin II. The sequence predicts four helix-loop-helix domains with possible calcium-binding sites in domains I and III, suggesting that calcium may affect the activity of this unconventional myosin. This is the first report of the sequence of an unconventional myosin light chain other than calmodulin.

  20. Functional analysis of sequences adjacent to dapE of Corynebacterium glutamicum reveals the presence of aroP, which encodes the aromatic amino acid transporter.

    Science.gov (United States)

    Wehrmann, A; Morakkabati, S; Krämer, R; Sahm, H; Eggeling, L

    1995-10-01

    An initially nonclonable DNA locus close to a gene of L-lysine biosynthesis in Corynebacterium glutamicum was analyzed in detail. Its stepwise cloning and its functional identification by monitoring the amino acid uptakes of defined mutants, together with mechanistic studies, identified the corresponding structure as aroP, the general aromatic amino acid uptake system.

  1. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plant is commonly employed in the heating, ventilating and air-conditioning system to increase operational feasibility and energy-efficiency under part load condition. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency while not sacrifices the cooling sufficiency for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control fail to achieve the expected control and/or energy performance; and (ii) in current literature few studies have systematically addressed this issue, this paper therefore presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  2. Sequences Of Amino Acids For Human Serum Albumin

    Science.gov (United States)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  3. Automated genome sequence analysis and annotation.

    Science.gov (United States)

    Andrade, M A; Brown, N P; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C

    1999-05-01

    Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. The GeneQuiz system

  4. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties defines a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where large number of sequences generated by the model can be compared to actually observed sequences.

  5. cDNA, amino acid carbohydrate sequence of barley seed-specific peroxidase BP 1

    DEFF Research Database (Denmark)

    Johansson, A.; Rasmussen, Søren Kjærsgård; Harthill, J.E.

    1992-01-01

    of 69% in the translated region, a 90% G or C preference in the wobble position of the codons and a typical signal peptide sequence. N-terminal amino acid sequencing and sequence analysis of tryptic peptides verified 98% of the sequence of the mature BP 1 which contains 309 amino acid residues. BP 1...... biological role of this enzyme. The barley peroxidase is processed at the C-terminus and might be targeted to the vacuole. The single site of glycosylation is located near the C-terminus in the N-glycosylation sequon -Asn-Cys-Ser- in which Cys forms part of a disulphide bridge. The major glycan is a typical...

  6. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  7. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  8. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.

  9. Sequence analysis of diamine oxidase gene from fava bean and its expression related to γ-aminobutyric acid accumulation in seeds germinating under hypoxia-NaCl stress.

    Science.gov (United States)

    Yang, Runqiang; Yin, Yongqi; Guo, Liping; Han, Yongbin; Gu, Zhenxin

    2014-06-01

    γ-Aminobutyric acid (GABA) is synthesized via the polyamine degradation pathway in plants, with diamine oxidase (DAO) being the key enzyme. In this study the cDNA of DAO in fava bean was cloned and its expression in seeds germinating under hypoxia-NaCl stress was investigated. Fava bean DAO cDNA is 2199 bp long and contains 2025 bp of open reading frame that encodes 675 amino acid peptides with a calculated molecular weight of 76.31 kDa and a pI of 5.41. Hypoxia and hypoxia-NaCl stress enhanced DAO activity and resulted in GABA accumulation in germinating fava bean. However, DAO gene expression was down-regulated under hypoxia compared with non-stress condition, while its expression in the cotyledon and shoot was up-regulated under hypoxia-NaCl. In addition, DAO expression could be promoted to enhance GABA accumulation after increasing the stress intensity using NaCl. DAO gene expression was significantly inhibited by aminoguanidine treatment under hypoxia but increased under hypoxia-NaCl. Under hypoxia, GABA accumulation due to NaCl was mainly concentrated in the cotyledon. The GABA content increase under hypoxia did not result from DAO gene expression, but DAO existing in seeds was activated under hypoxia. DAO gene expression was up-regulated to enhance GABA accumulation after increasing the stress intensity. © 2013 Society of Chemical Industry.

  10. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using sequence tree method was developed to perform Safety Related digital I and C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique, the analysis results strongly depend on the experts' subjective opinions. The analysis quality cannot be appropriately controlled. Thereby, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I and C system, such as RPS. The second phase uses sequence tree to recognize what I and C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I and C systems, include digital system and the analog back-up systems are allocated in their specific echelon. By this system centric sequence tree based analysis, not only preliminary hazard can be identified systematically, the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  11. CcMP-II, a new hemorrhagic metalloproteinase from Cerastes cerastes snake venom: purification, biochemical characterization and amino acid sequence analysis.

    Science.gov (United States)

    Boukhalfa-Abib, Hinda; Laraba-Djebari, Fatima

    2015-01-01

    Snake venom metalloproteinases (SVMPs) are the most abundant components in snake venoms. They are important in the induction of systemic alterations and local tissue damage after envenomation. CcMP-II, which is a metalloproteinase purified from Cerastes cerastes snake venom, was obtained by a combination of gel filtration, ion-exchange and affinity chromatographies. It was homogeneous on SDS-PAGE, with a molecular mass estimated to 35kDa and presents a pI of 5.6. CcMP-II has an N-terminal sequence of EDRHINLVSVADHRMXTKY, with high levels of homology with those of the members of class P-II of SVMPs, which comprises metalloproteinase and disintegrin-like domains together. This proteinase displayed a fibrinogenolytic and hemorrhagic activities. The proteolytic and hemorrhagic activities of CcMP-II were inhibited by EDTA and 1,10-phenanthroline. However, these activities were not affected by aprotinine and PMSF, suggesting that CcMP-II is a zinc-dependent hemorrhagic metalloproteinase with an α-fibrinogenase activity. The hemorrhagic metalloproteinase CcMP-II was also able to hydrolyze extracellular matrix components, such as type IV collagen and laminin. These results indicate that CcMP-II is implicated in the local and systemic bleeding, contributing thus in the toxicity of C. cerastes venom. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Sequence analysis and structure prediction of enoyl-CoA hydratase from Avicennia marina: implication of various amino acid residues on substrate-enzyme interactions.

    Science.gov (United States)

    Jabeen, Uzma; Salim, Asmat

    2013-10-01

    Enoyl-CoA hydratase catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA. The present study focuses on the correlation between the functional and structural aspects of enoyl-CoA hydratase from Avicennia marina. We have used bioinformatics tools to construct and analyze 3D homology models of A. marina enoyl-CoA hydratase (AMECH) bound to different substrates and inhibitors and studied the residues involved in the ligand-enzyme interaction. Structural information obtained from the models was compared with those of the reported crystal structures. We observed that the overall folds were similar; however, AMECH showed few distinct structural changes which include structural variation in the mobile loop, formation and loss of certain interactions between the active site residues and substrates. Some changes were also observed within specific regions of the enzyme. Glu106 is almost completely conserved in sequences of the isomerases/hydratases including AMECH while Glu86 which is the other catalytic residue in most of the isomerases/hydratases is replaced by Gly and shows no interaction with the substrate. Asp114 is located within 4Å distance of the catalytic water which makes it a probable candidate for the second catalytic residue in AMECH. Another prominent feature of AMECH is the presence of structurally distinct mobile loop having a completely different coordination with the hydrophobic binding pocket of acyl portion of the substrate. Copyright © 2013 Elsevier Ltd. All rights reserved.

  13. [Analysis of the Neuraminidase Amino Acid Sequences of Influenza A/H1N1pdm09, A/H3N2, and B Viruses Isolated from Influenza Patients in the 2013/14 Japanese Influenza Season].

    Science.gov (United States)

    Ikematsu, Hideyuki; Chong, Yong; Shirane, Kenjiro; Toh, Hidehiro; Sasaki, Hiroyuki; Koga, Yui; Matsumoto, Shinya; Hotta, Taeko; Uchiumi, Takeshi; Kang, Donchon

    2015-08-01

    Neuraminidase (NA) is an essential surface protein for influenza virus replication. NA inhibitors are commonly used for the treatment of influenza patients in Japan. Several mutations that reduce the effect of NA inhibitors have been reported. We sequenced the whole NA segment of isolated virus from influenza patients and investigated the relation between the NA amino acid sequence and the 50% inhibitory concentration (IC50) of four NA inhibitors. A total of 20 viruses that showed high or low IC50 of NA inhibitors were selected from A/H1N1pdm09, A/H3N2, and B isolates from the viruses isolated from patients in the 2013-14 influenza season. Viral RNA was extracted and RT-PCR was done. The amplified genome was sequenced using a next generation sequencer", and the deduced amino acid sequences were analyzed. Two A/H1N1pdm09 viruses that showed very high IC50 for oseltamivir (150 nM and 130 nM) contained the H275Y mutation. Otherwise, no significant relation was found between the NA amino acids and the IC50 of the four NA inhibitors. There was no significant relation between the NA amino acids and the IC50 of the four NA inhibitors for A/H3N2 viruses. The B viruses that showed a high IC50 for oseltamivir and laninamivir shared some amino acids. The B viruses that showed a high IC50 of zanamivir and peramivir also shared some amino acids. They were different from the shared amino acids found for oseltamivir and laninamivir. The previously reported H275Y mutation that causes oseltamivir resistance was found in the two A/H1N1pdm09 viruses that showed a very high IC50 for oseltamivir. No additional NA amino acid sequences related to the IC50 of the four NA inhibitors was found. The meaning of the shared amino acids among B viruses that showed a high IC50 would be an interesting target for further investigation.

  14. Complete genome sequence of Lactococcus lactis IO-1, a lactic acid bacterium that utilizes xylose and produces high levels of L-lactic acid.

    Science.gov (United States)

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-04-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium, produces nisin Z, ferments xylose, and produces predominantly L-lactic acid at high xylose concentrations. From ortholog analysis with other five L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  15. Information theory applications for biological sequence analysis.

    Science.gov (United States)

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.

  16. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    IntroductionTodd R. ReedCONTENT-BASED IMAGE SEQUENCE REPRESENTATIONPedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, andCharnchai PluempitiwiriyawejTHE COMPUTATION OF MOTIONChristoph Stiller, Sören Kammel, Jan Horn, and Thao DangMOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAINLuca Lucchese and Guido Maria CortelazzoQUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONSGaetano GiuntaERROR CONCEALMENT IN DIGITAL VIDEOFrancesco G.B. De NataleIMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVEAnil KokaramVIDEO SUMMARIZATIONCuneyt M. Taskiran and Edward

  17. Amino acid sequences and structures of chicken and turkey beta 2-microglobulin

    DEFF Research Database (Denmark)

    Welinder, K G; Jespersen, H M; Walther-Rasmussen, J

    1991-01-01

    The complete amino acid sequences of chicken and turkey beta 2-microglobulins have been determined by analyses of tryptic, V8-proteolytic and cyanogen bromide fragments, and by N-terminal sequencing. Mass spectrometric analysis of chicken beta 2-microglobulin supports the sequence-derived Mr of 11......,048. The higher apparent Mr obtained for the avian beta 2-microglobulins as compared to human beta 2-microglobulin by SDS-PAGE is not understood. Chicken and turkey beta 2-microglobulin consist of 98 residues and deviate at seven positions: 60, 66, 74-76, 78 and 82. The chicken and turkey sequences are identical...... to human beta 2-microglobulin at 46 and 47 positions, respectively, and to bovine beta 2-microglobulin at 47 positions, i.e. there is about 47% identity between avian and mammalian beta 2-microglobulins. The known X-ray crystallographic structures of bovine beta 2-microglobulin and human HLA-A2 complex...

  18. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs. Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  19. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Full Text Available Complex informational spectrum analysis for protein sequences (CISAPS and its web-based server are developed and presented. As recent studies show, only the use of the absolute spectrum in the analysis of protein sequences using the informational spectrum analysis is proven to be insufficient. Therefore, CISAPS is developed to consider and provide results in three forms including absolute, real, and imaginary spectrum. Biologically related features to the analysis of influenza A subtypes as presented as a case study in this study can also appear individually either in the real or imaginary spectrum. As the results presented, protein classes can present similarities or differences according to the features extracted from CISAPS web server. These associations are probable to be related with the protein feature that the specific amino acid index represents. In addition, various technical issues such as zero-padding and windowing that may affect the analysis are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices where each one represents a different property to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply and include complex informational spectrum analysis to their work.

  20. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids.

    Directory of Open Access Journals (Sweden)

    Jayanta Kumar Das

    Full Text Available Comparison of amino acid sequence similarity is the fundamental concept behind the protein phylogenetic tree formation. By virtue of this method, we can explain the evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize the protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequence of conventional myosin II head domain of 14 family members are taken to illustrate this new approach. We find two blocks of maximum length 6 aa as 'FPKATD' and 'Y/FTNEKL' without repeating the same chemical nature and one block of maximum length 20 aa with the repetition of chemical nature which are common among all 14 members. We also check commonality with another motor protein sub-family kinesin, KIF1A. Based on our analysis we find a common block of length 8 aa both in myosin II and KIF1A. This motif is located in the neck linker region which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK and Rho-associated protein kinase (ROCK, Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating the conserved amino acids' pattern in different proteins.

  1. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Full Text Available Many organizations apply information technologies to support their business processes. Using the information technologies, the actual events are recorded and utilized to conform with predefined model. Conformance checking is an approach to measure the fitness and appropriateness between process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach unfit to result such measures. This study attempts to develop a sequence matching analysis. Considering conformance checking as the basis of this approach, this proposed approach utilizes the current control flow technique in process mining domain. A case study in the field of educational process has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it results some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  2. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to data. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appear slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) or regeneration. Images PMID:1714599

  3. Homology and conservation of amino acids in E-protein sequences of dengue serotypes

    Directory of Open Access Journals (Sweden)

    Ramesh Venkatachalam

    2014-09-01

    Full Text Available Objective: To identify the homology and phylogenetic relationship among the four dengue virus (DENV serotypes, and conservation of amino acid in E-proteins and to find out the phylogenetic relationship among the strains of four DENV serotypes. Methods: Clustal W analysis for homology and phylogram, European molecular biology open software suite for pairwise alignment of amino acid sequences and BLAST-P analysis for various strains of four DENV serotypes were carried out. Results: Homology of E-protein sequences of four DENV serotypes indicated a close relationship of DENV-1 with DENV-3. DENV-2 showed close relationship with DENV-1 and -3 forming a single cluster whereas DENV-4 alone formed group with a single serotype. In the multiple sequence alignment, 19 amino acid conserved groups were observed. BLAST-P analysis showed more number of 100% similarity among DENV-1 and -3 strains whereas only few strains showed 100% similarity in DENV-4. However, 100% similarity was absent among the DENV-3 strains. Conclusions: From the present study, phylogenetically all the four DENV serotypes were related but DENV-1, -2 and -3 were very closely related whereas DENV-4 was somewhat distant from the other three serotypes.

  4. Main: Amino acid Analysis [KOME

    Lifescience Database Archive (English)

    Full Text Available Amino acid Analysis GO classification InterPro Result of GO classification by Inter...Pro motif search result kome_go_classification_interpro.zip kome_go_classification_interpro ...

  5. Main: Amino acid Analysis [KOME

    Lifescience Database Archive (English)

    Full Text Available Amino acid Analysis GO classification GenBank Result of GO classification by GenBan...k homology search result kome_go_classification_genbank.zip kome_go_classification_genbank ...

  6. Isolation and characterization of isopimaric acid-degrading bacteria from a sequencing batch reactor.

    Science.gov (United States)

    Wilson, A E; Moore, E R; Mohn, W W

    1996-09-01

    We isolated two aerobic, gram-negative bacteria which grew on the diterpene resin acid isopimaric acid (IpA) as the sole carbon source and electron donor. The source of the isolates was a sequencing batch reactor treating a high-strength process stream from a paper mill. The isolates, IpA-1 and IpA-2, also grew on pimaric and dehydroabietic acids, and IpA-1 grew on abietic acid. Both strains used fatty acids, but neither strain used camphor, sitosterol, or betulin. Strain IpA-1 grew anaerobically with nitrate as an electron acceptor. Strains IpA-1 and IpA-2 had growth yields of 0.19 and 0.23 g of protein per g of IpA, respectively. During growth, both strains transformed IpA carbon to approximately equal amounts of biomass, carbon dioxide, and dissolved organic carbon. In both strains, growth on IpA induced an enzymatic system which caused cell suspensions to transform all four of the above resin acids. Cell suspensions of IpA-1 and IpA-2 removed IpA at rates of 0.56 and 0.13 mumol mg of protein-1 h-1, respectively. Cultures and cell suspensions of both strains failed to completely consume pimaric acid and yielded small amounts of an apparent metabolite from this acid. Cultures and cell suspensions of both strains yielded large amounts of three apparent metabolites from dehydroabietic acid. Analysis of 16S rDNA sequences indicated that the isolates are distinct members of the genus Pseudomonas sensu stricto.

  7. Protein sequence analysis using Hewlett-Packard biphasic sequencing cartridges in an applied biosystems 473A protein sequencer.

    Science.gov (United States)

    Tang, S; Mozdzanowski, J; Anumula, K R

    1999-01-01

    Protein sequence analysis using an adsorptive biphasic sequencing cartridge, a set of two coupled columns introduced by Hewlett-Packard for protein sequencing by Edman degradation, in an Applied Biosystems 473A protein sequencer has been demonstrated. Samples containing salts, detergents, excipients, etc. (e.g., formulated protein drugs) can be easily analyzed using the ABI sequencer. Simple modifications to the ABI sequencer to accommodate the cartridge extend its utility in the analysis of difficult samples. The ABI sequencer solvents and reagents were compatible with the HP cartridge for sequencing. Sequence information up to ten residues can be easily generated by this nonoptimized procedure, and it is sufficient for identifying proteins by database search and for preparing a DNA probe for cloning novel proteins.

  8. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola eCremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract.As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  9. Metazoan Remaining Genes for Essential Amino Acid Biosynthesis: Sequence Conservation and Evolutionary Analyses

    Directory of Open Access Journals (Sweden)

    Igor R. Costa

    2014-12-01

    Full Text Available Essential amino acids (EAA consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS and betaine-homocysteine S-methyltransferase (BHMT diverged from the expected Tree of Life (ToL relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  10. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  11. Strategy for the sequence analysis of heparin.

    Science.gov (United States)

    Liu, J; Desai, U R; Han, X J; Toida, T; Linhardt, R J

    1995-12-01

    The versatile biological activities of proteoglycans are mainly mediated by their glycosaminoglycan (GAG) components. Unlike proteins and nucleic acids, no satisfactory method for sequencing GAGs has been developed. This paper describes a strategy to sequence the GAG chains of heparin. Heparin, prepared from animal tissue, and processed by proteinases and endoglucuronidases, is 90% GAG heparin and 10% peptidoglycan heparin (containing small remnants of core protein). Raw porcine mucosal heparin was labelled on the amino termini of these core protein remnants with a hydrophobic, fluorescent tag [N-4-(6-dimethylamino-2-benzofuranyl) phenyl (NDBP)-isothiocyanate]. Enrichment of the NDBP-heparin using phenyl-Sepharose chromatography, followed by treatment with a mixture of heparin lyase I and III, resulted in a single NDBP-linkage region tetrasaccharide, which was characterized as deltaUAp(1-->3)-beta-D-Galp(1-->3)-beta-D-Galp(1-->4)-beta-Xylp -(1-->O-Ser-NDBP (deltaUAp is 4-deoxy-alpha-L-threo-hex-4-enopyranosyl uronic acid). Several NDBP-octasaccharides were isolated when NDBP-heparin was treated with only heparin lyase I. The structure of one of these NDBP-octasaccharides, deltaUAp2S(1-->4)-alpha-D-GlcNpAc(1-->4)-alpha-L-IdoAp (1-->4)-alpha-D-GlcNpAc6S(1-->4)-beta-D-GlcAp(1-->3)-beta-D- Galp(1-->3)-beta-D-Galp(1-->4)-beta-Xylp-(1-->O-Ser NDBP (S is sulphate, Ac is acetate), was determined by 1H-NMR and enzymatic methods. Enriched NDBP-heparin was treated with lithium hydroxide to release heparin, and the GAG chain was then labelled at xylose with 7-amino-1,3-naphthalene disulphonic acid (AGA). The resulting AGA-Xyl-heparin was sequenced on gradient PAGE using heparin lyase I and heparin lyase III. A predominant sequence in heparin at the protein core attachment site was deduced to be -D-GlcNp2S6S(or 6OH)(1-->4)-alpha-L-IdoAp2S-(1-->4)-alpha-D-GlcNp2S6S (or60H) (1-->4)-alpha-L-IdoAp2S(1-->4)-alpha-D-GlcNp2S6S( or 6OH)(1-->4)-alpha-L-IdoAp2S(1-->4)-alpha-D-GlcNpAc (1

  12. Complete genome sequence analysis of chicken astrovirus isolate from India.

    Science.gov (United States)

    Patel, Amrutlal K; Pandit, Ramesh J; Thakkar, Jalpa R; Hinsu, Ankit T; Pandey, Vinod C; Pal, Joy K; Prajapati, Kantilal S; Jakhesara, Subhash J; Joshi, Chaitanya G

    2017-03-01

    Chicken astroviruses have been known to cause severe disease in chickens leading to increased mortality and "white chicks" condition. Here we aim to characterize the causative agent of visceral gout suspected for astrovirus infection in broiler breeder chickens. Total RNA isolated from allantoic fluid of SPF embryo passaged with infected chicken sample was sequenced by whole genome shotgun sequencing using ion-torrent PGM platform. The sequence was analysed for the presence of coding and non-coding features, its similarity with reported isolates and epitope analysis of capsid structural protein. The consensus length of 7513 bp genome sequence of Indian isolate of chicken astrovirus was obtained after assembly of 14,121 high quality reads. The genome was comprised of 13 bp 5'-UTR, three open reading frames (ORFs) including ORF1a encoding serine protease, ORF1b encoding RNA dependent RNA polymerase (RdRp) and ORF2 encoding capsid protein, and 298 bp of 3'-UTR which harboured two corona virus stem loop II like "s2m" motifs and a poly A stretch of 19 nucleotides. The genetic analysis of CAstV/INDIA/ANAND/2016 suggested highest sequence similarity of 86.94% with the chicken astrovirus isolate CAstV/GA2011 followed by 84.76% with CAstV/4175 and 74.48%% with CAstV/Poland/G059/2014 isolates. The capsid structural protein of CAstV/INDIA/ANAND/2016 showed 84.67% similarity with chicken astrovirus isolate CAstV/GA2011, 81.06% with CAstV/4175 and 41.18% with CAstV/Poland/G059/2014 isolates. However, the capsid protein sequence showed high degree of sequence identity at nucleotide level (98.64-99.32%) and at amino acids level (97.74-98.69%) with reported sequences of Indian isolates suggesting their common origin and limited sequence divergence. The epitope analysis by SVMTriP identified two unique epitopes in our isolate, seven shared epitopes among Indian isolates and two shared epitopes among all isolates except Poland isolate which carried all distinct epitopes.

  13. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    . All but one sequence mapped to the MCP gene while the last sequence mapped to the Neurofilament gene. Approx. half of the sequences contained no errors while the rest differed with 88-99 percent similarity with most having 99% similarity. One sequence, when BLASTed, showed most similarity to European...... Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...

  14. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low-especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  15. Nanopore sensors for nucleic acid analysis

    International Nuclear Information System (INIS)

    Nakane, Jonathan J; Akeson, Mark; Marziali, Andre

    2003-01-01

    In the past decade, nanometre-scale pores have been explored as the basis for technologies to analyse and sequence single nucleic acid molecules. Most approaches involve using such a pore to localize single macromolecules and interact with them to garner some information on their composition. Though nanopore sensors cannot yet claim success at deoxyribonucleic acid (DNA) sequencing, nanopore-based technologies offer one of the most promising approaches to single molecule detection and analysis. The majority of experimental work with nanopore detection of nucleic acids has involved the α-haemolysin (alpha-HL) ion channel a heptameric protein with a ∼2 nm diameter inner pore which allows translocation of single-stranded DNA. Analysis of externally induced ion current through the pore during its interaction with DNA can provide information about the DNA molecule, including length and base composition. This review focuses on alpha-HL and its applications to single-molecule detection. Modified alpha-HL and other biological and synthetic pores for macromolecule detection are also discussed, along with a brief summary of relevant theoretical work and numerical modelling of polymer-pore interaction. (topical review)

  16. Complete Genome Sequence of the Probiotic Lactic Acid Bacterium Lactobacillus Rhamnosus

    Directory of Open Access Journals (Sweden)

    Samat Kozhakhmetov

    2014-01-01

    Full Text Available Introduction: Lactobacilli are a bacteria commonly found in the gastrointestinal tract. Some species of this genus have probiotic properties. The most common of these is Lactobacillus rhamnosus, a microoganism, generally regarded as safe (GRAS. It is also a homofermentative L-(+-lactic acid producer. The genus Lactobacillus is characterized by an extraordinary degree of the phenotypic and genotypic diversity. However, the studies of the genus were conducted mostly with the unequally distributed, non-random choice of species for sequencing; thus, there is only one representative genome from the Lactobacillus rhamnosus clade available to date. The aim of this study was to characterize the genome sequencing of selected strains of Lactobacilli. Methods: 109 samples were isolated from national domestic dairy products in the laboratory of Center for life sciences. After screaning isolates for probiotic properties, a highly active Lactobacillus spp strain was chosen. Genomic DNA was extracted according to the manufacturing protocol (Wizard® Genomic DNA Purification Kit. The Lactobacillus rhamnosus strain was identified as the highly active Lactobacillus strain accoridng to its morphological, cultural, physiological, and biochemical properties, and a genotypic analysis. Results: The genome of Lactobacillus rhamnosus was sequenced using the Roche 454 GS FLX (454 GS FLX platforms. The initial draft assembly was prepared from 14 large contigs (20 all contigs by the Newbler gsAssembler 2.3 (454 Life Sciences, Branford, CT. Conclusion: A full genome-sequencing of selected strains of lactic acid bacteria was made during the study.

  17. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.

  18. Whole-Genome Sequence Analysis of Bombella intestini LMG 28161T, a Novel Acetic Acid Bacterium Isolated from the Crop of a Red-Tailed Bumble Bee, Bombus lapidarius.

    Directory of Open Access Journals (Sweden)

    Leilei Li

    Full Text Available The whole-genome sequence of Bombella intestini LMG 28161T, an endosymbiotic acetic acid bacterium (AAB occurring in bumble bees, was determined to investigate the molecular mechanisms underlying its metabolic capabilities. The draft genome sequence of B. intestini LMG 28161T was 2.02 Mb. Metabolic carbohydrate pathways were in agreement with the metabolite analyses of fermentation experiments and revealed its oxidative capacity towards sucrose, D-glucose, D-fructose and D-mannitol, but not ethanol and glycerol. The results of the fermentation experiments also demonstrated that the lack of effective aeration in small-scale carbohydrate consumption experiments may be responsible for the lack of reproducibility of such results in taxonomic studies of AAB. Finally, compared to the genome sequences of its nearest phylogenetic neighbor and of three other insect associated AAB strains, the B. intestini LMG 28161T genome lost 69 orthologs and included 89 unique genes. Although many of the latter were hypothetical they also included several type IV secretion system proteins, amino acid transporter/permeases and membrane proteins which might play a role in the interaction with the bumble bee host.

  19. An analysis of sequence alignment: heuristic algorithms.

    Science.gov (United States)

    Bucak, I Ö; Uslan, V

    2010-01-01

    Sequence alignment becomes challenging with an increase in size and number of sequences. Finding optimal or near optimal solutions for sequence alignment is one of the most important operations in bioinformatics. This study aims to survey heuristics applied for the sequence alignment problem summarized in a time line.

  20. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  1. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    Science.gov (United States)

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  2. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  3. Multilocus sequence analysis of the family Halomonadaceae.

    Science.gov (United States)

    de la Haba, Rafael R; Márquez, M Carmen; Papke, R Thane; Ventosa, Antonio

    2012-03-01

    Multilocus sequence analysis (MLSA) protocols have been developed for species circumscription for many taxa. However, at present, no studies based on MLSA have been performed within any moderately halophilic bacterial group. To test the usefulness of MLSA with these kinds of micro-organisms, the family Halomonadaceae, which includes mainly halophilic bacteria, was chosen as a model. This family comprises ten genera with validly published names and 85 species of environmental, biotechnological and clinical interest. In some cases, the phylogenetic relationships between members of this family, based on 16S rRNA gene sequence comparisons, are not clear and a deep phylogenetic analysis using several housekeeping genes seemed appropriate. Here, MLSA was applied using the 16S rRNA, 23S rRNA, atpA, gyrB, rpoD and secA genes for species of the family Halomonadaceae. Phylogenetic trees based on the individual and concatenated gene sequences revealed that the family Halomonadaceae formed a monophyletic group of micro-organisms within the order Oceanospirillales. With the exception of the genera Halomonas and Modicisalibacter, all other genera within this family were phylogenetically coherent. Five of the six studied genes (16S rRNA, 23S rRNA, gyrB, rpoD and secA) showed a consistent evolutionary history. However, the results obtained with the atpA gene were different; thus, this gene may not be considered useful as an individual gene phylogenetic marker within this family. The phylogenetic methods produced variable results, with those generated from the maximum-likelihood and neighbour-joining algorithms being more similar than those obtained by maximum-parsimony methods. Horizontal gene transfer (HGT) plays an important evolutionary role in the family Halomonadaceae; however, the impact of recombination events in the phylogenetic analysis was minimized by concatenating the six loci, which agreed with the current taxonomic scheme for this family. Finally, the findings of

  4. Amino Acid Sequence - KOME | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ino Acid Sequence Data detail Data name Amino Acid Sequence DOI 10.18908/lsdba.nbdc00... '[ Japanese | English ]'; } else if ( url.search(//(dbmeta|datameta)-(list|searc...switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us KOME Am...120-031 Description of data contents Amino acid sequences of the full-length cDNA clones deduced from the lo...ngest ORFs. Data file File name: CSV: kome_ine_full_sequence_amino_db.zip File URL: ftp://ftp.biosciencedbc.

  5. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  6. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  7. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima\\'s D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  8. N-terminal N-myristoylation of proteins: prediction of substrate proteins from amino acid sequence.

    Science.gov (United States)

    Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank

    2002-04-05

    Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects but it requires a low rate of false positive hits in addition to a sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures has revealed motif properties in addition to the known PROSITE motif that are utilized in a new predictor described here. The composite prediction function (with separate ad hoc parameterization (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequences positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlation within the motif sequence. The algorithm has been validated with a self-consistency and two jack-knife tests for the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95 % for both taxon-specific versions. The corresponding rate of false positive prediction (for sequences with an N-terminal glycine residue) is close to 0.5 %; thus, the technique is applicable for large-scale automated sequence database annotation. The predictor is available as public WWW-server with the URL http://mendel.imp.univie.ac.at/myristate/. Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the protein fragments.A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for

  9. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies...... of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  10. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning.

    Science.gov (United States)

    Takahashi, N; Takahashi, Y; Blumberg, B S; Putnam, F W

    1987-01-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene. Images PMID:3474609

  11. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    International Nuclear Information System (INIS)

    Deutscher, J.; Pevec, B.; Beyreuther, K.; Kiltz, H.H.; Hengstenberg, W.

    1986-01-01

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolyptic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S faecalis has now been determined. [ 32 P]P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr

  12. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    Energy Technology Data Exchange (ETDEWEB)

    Deutscher, J.; Pevec, B.; Beyreuther, K.; Kiltz, H.H.; Hengstenberg, W.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolyptic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S faecalis has now been determined. (/sup 32/P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.

  13. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC, a type of calculus that represents qualitative data on moving point objects (MPOs, and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI is used, which enables to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  14. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process. The RevTrans web server is freely available at http://www.cbs.dtu.dk/services/RevTrans/....

  15. Characterization of the murine fatty acid transport protein gene and its insulin response sequence.

    Science.gov (United States)

    Hui, T Y; Frohnert, B I; Smith, A J; Schaffer, J E; Bernlohr, D A

    1998-10-16

    Fatty acid transport protein (FATP) was identified by expression cloning strategies (Schaffer, J. E., and Lodish, H. F. (1994) Cell 79, 427-436) and shown by transfection analysis to catalyze the transfer of long-chain fatty acids across the plasma membrane of cells. It is expressed highly in tissues exhibiting rapid fatty acid metabolism such as skeletal muscle, heart, and adipose. FATP mRNA levels are down-regulated by insulin in cultured 3T3-L1 adipocytes and up-regulated by nutrient depletion in murine adipose tissue (Man, M. Z., Hui, T. Y., Schaffer, J. E., Lodish, H. F., and Bernlohr, D. A. (1996) Mol. Endocrinol. 10, 1021-1028). To determine the molecular mechanism of insulin regulation of FATP transcription, we have isolated the murine FATP gene and its 5'-flanking sequences. The FATP gene spans approximately 16 kilobases and contains 13 exons, of which exon 2 is alternatively spliced. S1 nuclease and RNase protection assays revealed the presence of multiple transcription start sites; the DNA sequence upstream of the predominant transcription start sites lacks a typical TATA box. By transient transfection assays in 3T3-L1 adipocytes, the inhibitory action of insulin on FATP transcription was localized to a cis-acting element with the sequence 5'-TGTTTTC-3' from -1347 to -1353. This sequence is very similar to the insulin response sequence found in the regulatory region of other genes negatively regulated by insulin such as those encoding phosphoenolpyruvate carboxykinase, tyrosine aminotransferase, and insulin-like growth factor-binding protein 1. Fluorescence in situ hybridization analysis revealed that the murine FATP gene is localized to chromosome 8, band 8B3.3. Interestingly, this region of chromosome 8 contains a cluster of three other genes important for fatty acid homeostasis, lipoprotein lipase, the mitochondrial uncoupling protein 1 (UCP1) and sterol regulatory element-binding protein 1. These results characterize the murine FATP gene and its

  16. Efficient computational methods for sequence analysis of small RNAs

    OpenAIRE

    Cozen, Gozde

    2007-01-01

    With the discovery of small regulatory RNAs, there has been a tremendous increase in the number of RNA sequencing projects. Meanwhile, novel high-throughput sequencing technologies, which can sequence as much as 500000 small RNA sequences in one run, have emerged. The challenge of processing this rapidly growing data can be addressed by optimizing current analysis approaches for small RNA sequences. We present fast register-level methods for small RNA pairwise alignment and small RNA to genom...

  17. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  18. Modular probes for enriching and detecting complex nucleic acid sequences

    Science.gov (United States)

    Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu

    2017-12-01

    Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold. As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's gene of genomic DNA using quantitative PCR.

  19. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment fi le, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree fi les (with a user-defined combination of species name and/or database accession number. Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file and generation of species and accession number lists for use in supplementary materials or figure legends.

  20. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-05-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment fi le, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree fi les (with a user-defined combination of species name and/or database accession number. Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file and generation of species and accession number lists for use in supplementary materials or figure legends.

  1. Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

    Science.gov (United States)

    Singh, Aditya; Bhatia, Prateek

    2016-12-01

    Sanger sequencing platforms, such as applied biosystems instruments, generate chromatogram files. Generally, for 1 region of a sequence, we use both forward and reverse primers to sequence that area, in that way, we have 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. In full capabilities of Automated Mutation Analysis Pipeline (ASAP), it is able to read "*.ab1" chromatogram files through command line interface, convert it to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.

  2. The Role of HIV-1 gp41 Glycoprotein in Infectious Tropism Inferred from Physico-Chemical Properties of its Amino Acid Sequence

    Science.gov (United States)

    Figueroa, E.; Villarreal, C.; Huerta, L.; Cocho, G.

    2006-09-01

    We performed a statistical analysis of the amino acid sequence of the gp41 ectodomain of the Human Immunodeficiency Virus type 1. We found strong correlations between physicochemical properties of highly variable residues and the viral infectious tropism.

  3. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    ajl yemi

    2011-11-28

    Nov 28, 2011 ... Besides, one basic leucine zipper domain (bZIP) in amino acid area from 274 to 337 was found, concurring with the main characteristic of C/EBPs. Homologous comparison of the amino acid sequences from C/EBPβ cloned in this study and those from different species indicated C/EBPβ gene of Qinchuan ...

  4. Sequencing small RNA: introduction and data analysis fundamentals.

    Science.gov (United States)

    Mehta, Jai Prakash

    2014-01-01

    Small RNAs are important transcriptional regulators within cells. With the advent of powerful Next Generation Sequencing platforms, sequencing small RNAs seems to be an obvious choice to understand their expression and its downstream effect. Additionally, sequencing provides an opportunity to identify novel and polymorphic miRNA. However, the biggest challenge is the appropriate data analysis pipeline, which is still in phase of active development by various academic groups. This chapter describes basic and advanced steps for small RNA sequencing analysis including quality control, small RNA alignment and quantification, differential expression analysis, novel small RNA identification, target prediction, and downstream analysis. We also provide a list of various resources for small RNA analysis.

  5. Project Report: Automatic Sequence Processor Software Analysis

    Science.gov (United States)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently is flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system that is comprised of scripts written in perl, c-shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  6. Cloning and sequence analysis of lily and tobacco guanylate kinases.

    Science.gov (United States)

    Kumar, V

    2000-03-01

    Guanylate kinase is an essential enzyme in the nucleotide biosynthetic pathway, catalyzing the reversible transfer of the terminal phosphoryl group of ATP to GMP or dGMP. This enzyme has been well studied from several organisms and many structural and functional details have been characterized. Animal GMP kinases have also been implicated in signal transduction pathways. However, the corresponding role by plant derived GMP kinases remains to be elucidated. Full-length cDNA clones encoding enzymatically active guanylate kinases were isolated from cDNA libraries of lily and tobacco. Lily cDNA is predicted to encode a 392-amino acid protein with a molecular mass of 43.1 kDa and carries amino- and carboxy- terminal extensions of the guanylate kinase (GK)-like domain. But tobacco cDNA is predicted to encode a smaller protein of 297-amino acids with a molecular mass of 32.7 kDa. The amino acid residues known to participate in the catalytic activity of functionally characterized GMP kinases, are also conserved in GK domains of LGK-1 and NGK-1. The GK domains of NGK-1, LGK-1 and previously characterized AGK-1 from Arabidopsis exhibit 74-84% identity, whereas their N- and C-terminal domains are more divergent with amino acid conservation in the order of 48-55%. Phylogenetic analysis on the deduced amino acid sequences reveals that NGK-1 and LGK-1 form one distinct subgroup along with AGK-1 and AGK-2 homologues from Arabidopsis. Isolation of GMP kinases from diverse plant species like lily and tobacco adds a new dimension in understanding their role in cell signaling pathways that are associated with plant growth and development.

  7. Analysis and prediction of baculovirus promoter sequences.

    Science.gov (United States)

    Xing, Ke; Deng, Riqiang; Wang, Jinwen; Feng, Jinghua; Huang, Mingsong; Wang, Xunzhang

    2005-10-01

    Consensus patterns of baculovirus sequences upstream from the translational initiation sites have been analyzed and a web tool, Local Alignment Promoter Predictor (LAPP), for the prediction of baculovirus promoter sequences has also been developed. Potential consensus sequences, i.e., TCATTGT, TCTTGTA, CTCGTAA, TCCATTT and TCATT plus TCGT in approximately 30 bp spacing context, have been found in baculovirus promoter regions, in addition to well-characterized late and early promoter elements G/T/ATAAG and TATAA, which is accompanied about 30-bp downstream by a transcriptional initiation sequence CAGT or CATT. Promoter prediction is performed by a dynamic programming algorithm based on maximal segment pair measure with scores above some cutoff against each sequence in a refined promoter database. The algorithm was able to discriminate between promoter and non-promoter sequences in a test set of baculovirus sequences with prediction specificity and sensitivity superior to that using five other eukaryotic promoter recognition programs available on the Internet. A web server that implements the LAPP with continually updated promoter database is freely available at http://life.zsu.edu.cn/LAPP/.

  8. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  9. Contig sequences and their annotation (amino acid sequence and results of homology search), and expression profile - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Dicty_cDB Contig sequences and their annotation (amino acid sequence and results of homology... search), and expression profile Data detail Data name Contig sequences and their annotation (amino acid seq...f data contents Contig sequences of cDNA sequences of Dictyostelium discoideum and the...EST-3'EST-ligated sequence and full-length cDNA sequence by the assembly program ...Phrap ( http://www.phrap.org/index.html ). Link to the list of clones constituting the contig, the information on its mapping to the

  10. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly

  11. BONSAI Garden: Parallel knowledge discovery system for amino acid sequences

    Energy Technology Data Exchange (ETDEWEB)

    Shoudai, T.; Miyano, S.; Shinohara, A.; Okazaki, T.; Arikawa, S. [Kyushu Univ., Fukuoka (Japan)] [and others

    1995-12-31

    We have developed a machine discovery system BON-SAI which receives positive and negative examples as inputs and produces as a hypothesis a pair of a decision tree over regular patterns and an alphabet indexing. This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences by computer experiments. However, when several kinds of sequences axe mixed in the data, it does not seem reasonable for a single BONSAI system to find a hypothesis of a reasonably small size with high accuracy. For this purpose, we have designed a system BONSAI Garden, in which several BONSAI`s and a program called Gardener run over a network in parallel, to partition the data into some number of classes together with hypotheses explaining these classes accurately.

  12. BONSAI Garden: parallel knowledge discovery system for amino acid sequences.

    Science.gov (United States)

    Shoudai, T; Lappe, M; Miyano, S; Shinohara, A; Okazaki, T; Arikawa, S; Uchida, T; Shimozono, S; Shinohara, T; Kuhara, S

    1995-01-01

    We have developed a machine discovery system BONSAI which receives positive and negative examples as inputs and produces as a hypothesis a pair of a decision tree over regular patterns and an alphabet indexing. This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences by computer experiments. However, when several kinds of sequences are mixed in the data, it does not seem reasonable for a single BONSAI system to find a hypothesis of a reasonably small size with high accuracy. For this purpose, we have designed a system BONSAI Garden, in which several BONSAI's and a program called Gardener run over a network in parallel, to partition the data into some number of classes together with hypotheses explaining these classes accurately.

  13. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order...1 and 2 precede all other spikes in both s and s�). Many other sequences share this property with s and s�; in fact, we can completely characterize

  14. Layered materials with coexisting acidic and basic sites for catalytic one-pot reaction sequences.

    Science.gov (United States)

    Motokura, Ken; Tada, Mizuki; Iwasawa, Yasuhiro

    2009-06-17

    Acidic montmorillonite-immobilized primary amines (H-mont-NH(2)) were found to be excellent acid-base bifunctional catalysts for one-pot reaction sequences, which are the first materials with coexisting acid and base sites active for acid-base tamdem reactions. For example, tandem deacetalization-Knoevenagel condensation proceeded successfully with the H-mont-NH(2), affording the corresponding condensation product in a quantitative yield. The acidity of the H-mont-NH(2) was strongly influenced by the preparation solvent, and the base-catalyzed reactions were enhanced by interlayer acid sites.

  15. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    Science.gov (United States)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  16. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    Science.gov (United States)

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)

  17. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Science.gov (United States)

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Symbols and format to be used... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  18. Peptide Nucleic Acids Having Enhanced Binding Affinity, Sequence Specificity and Solubility

    DEFF Research Database (Denmark)

    1998-01-01

    A novel class of compounds, known as peptide nucleic acids, bind complementary DNA and RNA strands more strongly than a corresponding DNA strand, and exhibit increased sequence specificity and solubility. The peptide nucleic acids comprise ligands selected from a group consisting of naturally......-occurring nucleobases and non-naturally-occurring nucleobases attached to a polyamide backbone, and contain C1-C8 alkylamine side chains. Methods of enhancing the solubility, binding affinity and sequence specificity of PNAs are provided....

  19. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    Science.gov (United States)

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  20. Probing the outstanding local hydrophobicity increase of peptide sequences induced by trifluoromethylated amino acids incorporation.

    Science.gov (United States)

    Gadais, Charlène; Devillers, Emmanuelle; Gasparik, Vincent; Chelain, Evelyne; Pytkowicz, Julien; Brigaud, Thierry

    2018-03-07

    In order to accurately probe the local hydrophobicity increase of peptide sequences by trifluoromethylated amino acids (TfmAAs) incorporation, the chromatographic hydrophobicity indexes (φ0) of three series of tripeptides containing three unnatural trifluoromethylated amino acids have been measured and compared to the indexes of their non-fluorinated analogs. The hydrophobic contribution of each fluorinated amino acids was quantified by varying the position and the protection of (R) and (S)-α-trifluoromethyl alanine (TfmAla), (S)-trifluoromethyl cysteine (TfmCys) and (L)-trifluoromethionine (TFM) in a short peptide sequence. As a general trend, a strong increase of hydrophobicity was precisely measured even exceeding the high hydrophobic contribution of the natural isoleucine amino acid. This study validates the incorporation of trifluoromethylated amino acids into peptide sequences as a rational strategy for the fine tuning of hydrophobic peptide-protein interactions. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360, cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.

  2. Comparison of the Amino Acid Sequences of lectins from the seeds of

    African Journals Online (AJOL)

    Dr GATSING

    The amino acid sequence of a glucose/mannose specific lectin from the seeds of Dioclea reflexa. (Dioclea reflexa agglutinin II, DRA-II) was determined by sequential Edman analyses of the intact subunit, and of the peptides derived from the protein by enzymatic digestion with trypsin. This sequence was found to be.

  3. Comparison of the amino acid sequences of lectins from the seeds ...

    African Journals Online (AJOL)

    The amino acid sequence of a glucose/mannose specific lectin from the seeds of Dioclea reflexa (Dioclea reflexa agglutinin II, DRA-II) was determined by sequential Edman analyses of the intact subunit, and of the peptides derived from the protein by enzymatic digestion with trypsin. This sequence was found to be very ...

  4. Human heart cytochrome c oxidase subunit VIII. Purification and determination of the complete amino acid sequence

    NARCIS (Netherlands)

    van Kuilenburg, A. B.; Muijsers, A. O.; Demol, H.; Dekker, H. L.; van Beeumen, J. J.

    1988-01-01

    Subunit VIII was purified from a preparation of the human heart cytochrome c oxidase and its complete amino acid sequence was determined. The sequence proved to be much more related to that of the bovine liver oxidase subunit VIII than to that found in bovine heart. Our finding of a 'liver-type'

  5. Amino acid sequence requirements for the association of apocytochrome c with mitochondria

    NARCIS (Netherlands)

    Sprinkle, J. R.; Hakvoort, T. B.; Koshy, T. I.; Miller, D. D.; Margoliash, E.

    1990-01-01

    To examine the amino acid sequence requirements for the biphasic association of Drosophila melanogaster apocytochrome c with mouse liver mitochondria in vitro, recombinant constructs of the protein were prepared. Removal of the C-terminal sequence to residue 58 had little influence, but truncation

  6. Amino-Acid Sequences of Light Chains of a Rabbit Anti-p-Azobenzoate Antibody

    Science.gov (United States)

    Appella, E.; Chersi, A.; Roholt, O. A.; Pressman, D.

    1971-01-01

    Half-cystine peptic peptides representing the intrachain disulfide bonds of the light chain from an apparently homogeneous rabbit anti-p-azobenzoate antibody were isolated in good yields after reduction and alkylation with [14C]iodoacetate, and their sequences were determined. The sequence of the first 21 amino-terminal residues was also determined. The good yields of these peptides, the fact that no variants of any of them were found, and the cleanness of the amino-terminal sequence determination confirm the high degree of homogeneity of this light-chain preparation. Previous evidence for the homogeneity of the light chain includes the appearance of only a single band upon analysis by disc electrophoresis, a relatively unique amino-acid composition, and a simple tryptic peptide map. The whole antibody shows a homogeneity of the hapten-binding constant. The antigen used in the present work is complex, since the attached hapten groups are in a large variety of environments, particularly since the carrier is a heterogeneous mixture of globulins. The very limited heterogeneity of the antibodies found in this case would appear to depend on the stimulation of only a few of the cells capable of producing antibody against a given hapten, rather than on a structural identity of the environment around each individual hapten group that is located on the antigen molecule. Images PMID:5289890

  7. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  8. Bioinformatic analysis of whole genome sequencing data

    OpenAIRE

    Maqbool, Khurram

    2014-01-01

    Evolution has shaped the life forms for billion of years. Domestication is an accelerated process that can be used as a model for evolutionary changes. The aim of this thesis project has been to carry out extensive bioinformatic analyses of whole genome sequencing data to reveal SNPs, InDels and selective sweeps in the chicken, pig and dog genome. Pig genome sequencing revealed loci under selection for elongation of back and increased number of vertebrae, associated with the NR6A1, PLAG1,...

  9. [Tabular excel editor for analysis of aligned nucleotide sequences].

    Science.gov (United States)

    Demkin, V V

    2010-01-01

    Excel platform was used for transition of results of multiple aligned nucleotide sequences obtained using the BLAST network service to the form appropriate for visual analysis and editing. Two macros operators for MS Excel 2007 were constructed. The array of aligned sequences transformed into Excel table and processed using macros operators is more appropriate for analysis than initial html data.

  10. Prediction of the Occurrence of the ADP-binding βαβ-fold in Proteins, Using an Amino Acid Sequence Fingerprint

    NARCIS (Netherlands)

    Wierenga, Rik K.; Terpstra, Peter; Hol, Wim G.J.

    1986-01-01

    An amino acid sequence "fingerprint” has been derived that can be used to test if a particular sequence will fold into a βαβ-unit with ADP-binding properties. It was deduced from a careful analysis of the known three-dimensional structures of ADP-binding βαβ-folds. This fingerprint is in fact a set

  11. SEQUENCE ANALYSIS OF MATURASE K (MATK): A ...

    African Journals Online (AJOL)

    Global Journal

    The application and utilization of sequence data has been found very informative in the characterization and phylogenetic relationship of different crops species. This study aimed to use bioinformatics tools to characterize the. matK gene in some selected legumes with special reference to pigeon pea [cajanus cajan ...

  12. Complete amino acid sequence of human intestinal aminopeptidase N as deduced from cloned cDNA

    DEFF Research Database (Denmark)

    Cowell, G M; Kønigshøfer, E; Danielsen, E M

    1988-01-01

    The complete primary structure (967 amino acids) of an intestinal human aminopeptidase N (EC 3.4.11.2) was deduced from the sequence of a cDNA clone. Aminopeptidase N is anchored to the microvillar membrane via an uncleaved signal for membrane insertion. A domain constituting amino acid 250...

  13. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    OpenAIRE

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.; Almstrand, Robert

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome.

  14. Isolation and characterization of isopimaric acid-degrading bacteria from a sequencing batch reactor.

    OpenAIRE

    Wilson, A E; Moore, E R; Mohn, W W

    1996-01-01

    We isolated two aerobic, gram-negative bacteria which grew on the diterpene resin acid isopimaric acid (IpA) as the sole carbon source and electron donor. The source of the isolates was a sequencing batch reactor treating a high-strength process stream from a paper mill. The isolates, IpA-1 and IpA-2, also grew on pimaric and dehydroabietic acids, and IpA-1 grew on abietic acid. Both strains used fatty acids, but neither strain used camphor, sitosterol, or betulin. Strain IpA-1 grew anaerobic...

  15. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    Science.gov (United States)

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  16. Clostridium sticklandii, a specialist in amino acid degradation:revisiting its metabolism through its genome sequence

    Directory of Open Access Journals (Sweden)

    Pelletier Eric

    2010-10-01

    Full Text Available Abstract Background Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions Analysis of the C

  17. Clostridium sticklandii, a specialist in amino acid degradation:revisiting its metabolism through its genome sequence.

    Science.gov (United States)

    Fonknechten, Nuria; Chaussonnerie, Sébastien; Tricot, Sabine; Lajus, Aurélie; Andreesen, Jan R; Perchat, Nadia; Pelletier, Eric; Gouyvenoux, Michel; Barbe, Valérie; Salanoubat, Marcel; Le Paslier, Denis; Weissenbach, Jean; Cohen, Georges N; Kreimeyer, Annett

    2010-10-11

    Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Analysis of the C. sticklandii genome and additional experimental procedures

  18. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    Science.gov (United States)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  19. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    Science.gov (United States)

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants.

  20. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    Science.gov (United States)

    Kharrazi, M A S; Bobojonov, V

    2012-08-16

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobicity and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye.

  1. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  2. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  3. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands.

    Science.gov (United States)

    Wolfe, Brian R; Pierce, Niles A

    2015-10-16

    We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.

  4. Monomer sequence in PLGA microparticles: Effects on acidic microclimates and in vivo inflammatory response.

    Science.gov (United States)

    Washington, Michael A; Balmert, Stephen C; Fedorchak, Morgan V; Little, Steven R; Watkins, Simon C; Meyer, Tara Y

    2018-01-01

    Controlling the backbone architecture of poly(lactic-co-glycolic acid)s (PLGAs) is demonstrated to have a strong influence on the production and release of acidic degradation by-products in microparticle matrices. Previous efforts for controlling the internal and external accumulation of acidity for PLGA microparticles have focused on the addition of excipients including neutralization and anti-inflammatory agents. In this report, we utilize a sequence-control strategy to tailor the microstructure of PLGA. The internal acidic microclimate distributions within sequence-defined and random PLGA microparticles were monitored in vitro using a non-invasive ratiometric two-photon microscopy (TPM) methodology. Sequence-defined PLGAs were found to have minimal changes in pH distribution and lower amounts of percolating acidic by-products. A parallel scanning electron microscopy study further linked external morphological events to internal degradation-induced structural changes. The properties of the sequenced and random copolymers characterized in vitro translated to differences in in vivo behavior. The sequence alternating copolymer, poly LG, had lower granulomatous foreign-body reactions compared to random racemic PLGA with a 50:50 ratio of lactic to glycolic acid. This paper demonstrates that changing the monomer sequence in poly(lactic-co-glycolic acid)s (PLGAs) leads to dramatic differences in the rate of degradation and the internal acidic microclimate of microparticles degrading in vitro. We note that the acidic microclimates within these particles were imaged for the first time with two-photon microscopy, which gives an extremely clear and detailed picture of the degradation process. Importantly, we also document that the observed sequence-controlled in vitro processes translate into differences in the in vivo behavior of polymers which have the same L to G composition but differing microstructures. These data, placed in the context of our prior studies on swelling

  5. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm

    NARCIS (Netherlands)

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to

  6. RNAblueprint: flexible multiple target nucleic acid sequence design.

    Science.gov (United States)

    Hammer, Stefan; Tschiatschek, Birgit; Flamm, Christoph; Hofacker, Ivo L; Findeiß, Sven

    2017-09-15

    Realizing the value of synthetic biology in biotechnology and medicine requires the design of molecules with specialized functions. Due to its close structure to function relationship, and the availability of good structure prediction methods and energy models, RNA is perfectly suited to be synthetically engineered with predefined properties. However, currently available RNA design tools cannot be easily adapted to accommodate new design specifications. Furthermore, complicated sampling and optimization methods are often developed to suit a specific RNA design goal, adding to their inflexibility. We developed a C ++  library implementing a graph coloring approach to stochastically sample sequences compatible with structural and sequence constraints from the typically very large solution space. The approach allows to specify and explore the solution space in a well defined way. Our library also guarantees uniform sampling, which makes optimization runs performant by not only avoiding re-evaluation of already found solutions, but also by raising the probability of finding better solutions for long optimization runs. We show that our software can be combined with any other software package to allow diverse RNA design applications. Scripting interfaces allow the easy adaption of existing code to accommodate new scenarios, making the whole design process very flexible. We implemented example design approaches written in Python to demonstrate these advantages. RNAblueprint , Python implementations and benchmark datasets are available at github: https://github.com/ViennaRNA . s.hammer@univie.ac.at, ivo@tbi.univie.ac.at or sven@tbi.univie.ac.at. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  7. Amino acids analysis during lactic acid fermentation by single strain ...

    African Journals Online (AJOL)

    SAM

    2014-07-09

    Jul 9, 2014 ... Amino acids analysis during lactic acid fermentation by single strain cultures of lactobacilli and mixed culture starter made from them. KiBeom Lee1*, Ho-Jin Kim1 and Sang-Kyu Park2. 1Bio Center Technopark, 7-50 Songdo, Yeonsu-Gu, Incheon 406-840, Republic of Korea. 2Nambu University, Chumdan ...

  8. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N×1 frequency vector n=ni, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N×N matrix and a stochastic sampling operator (Sa. The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq. Sequencing without any bias and errors is Seq=Sa IN, where IN is a N×N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN, which describes elimination or statistically significant downsampling, of specific reads during the sequencing process.

  9. EST sequences and their annotation (amino acid sequence and results of homology search) - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available esource Project ( http://www.nbrp.jp/ ). The link to the National BioResource Pro...List Contact us Dicty_cDB EST sequences and their annotation (amino acid sequence and results of homology se...arch) Data detail Data name EST sequences and their annotation (amino acid sequence and results of homology ...search) DOI 10.18908/lsdba.nbdc00419-001 Description of data contents Sequences of cDNA clones of Dictyostelium discoideum and the...s (with target DBs: dicty EST-DB, DNA-DB and protein-DB)). Links to the Atlas database ( http://dictycdb.bio

  10. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    Science.gov (United States)

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  11. Analysis and functional annotation of an expressed sequence tag collection for tropical crop sugarcane.

    Science.gov (United States)

    Vettore, André L; da Silva, Felipe R; Kemper, Edson L; Souza, Glaucia M; da Silva, Aline M; Ferro, Maria Inês T; Henrique-Silva, Flavio; Giglioti, Eder A; Lemos, Manoel V F; Coutinho, Luiz L; Nobrega, Marina P; Carrer, Helaine; França, Suzelei C; Bacci Júnior, Mauricio; Goldman, Maria Helena S; Gomes, Suely L; Nunes, Luiz R; Camargo, Luis E A; Siqueira, Walter J; Van Sluys, Marie-Anne; Thiemann, Otavio H; Kuramae, Eiko E; Santelli, Roberto V; Marino, Celso L; Targon, Maria L P N; Ferro, Jesus A; Silveira, Henrique C S; Marini, Danyelle C; Lemos, Eliana G M; Monteiro-Vitorello, Claudia B; Tambor, José H M; Carraro, Dirce M; Roberto, Patrícia G; Martins, Vanderlei G; Goldman, Gustavo H; de Oliveira, Regina C; Truffi, Daniela; Colombo, Carlos A; Rossi, Magdalena; de Araujo, Paula G; Sculaccio, Susana A; Angella, Aline; Lima, Marleide M A; de Rosa Júnior, Vicente E; Siviero, Fábio; Coscrato, Virginia E; Machado, Marcos A; Grivet, Laurent; Di Mauro, Sonia M Z; Nobrega, Francisco G; Menck, Carlos F M; Braga, Marilia D V; Telles, Guilherme P; Cara, Frank A A; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-12-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged.

  12. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  13. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    In Pakistan, more than 10 million people are living with hepatitis C virus (HCV) with high morbidity and mortality. The aims of the present study are to report HCV core gene sequences from Pakistani population and perform their sequence comparison/phylogenetic analysis. The core gene of HCV has been cloned from six ...

  14. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    A genomic library was generated using HindIII and the positive clones were sequenced and analysed. The gp64 gene, encoding the baculovirus envelope protein GP64, was found in an insert. The nucleotide sequence analysis indicated that the AnpeNPV gp64 gene consists of a 1530 nucleotide open reading frame ...

  15. RNA Sequencing Analysis of Salivary Extracellular RNA.

    Science.gov (United States)

    Majem, Blanca; Li, Feng; Sun, Jie; Wong, David T W

    2017-01-01

    Salivary biomarkers for disease detection, diagnostic and prognostic assessments have become increasingly well established in recent years. In this chapter we explain the current leading technology that has been used to characterize salivary non-coding RNAs (ncRNAs) from the extracellular RNA (exRNA) fraction: HiSeq from Illumina® platform for RNA sequencing. Therefore, the chapter is divided into two main sections regarding the type of the library constructed (small and long ncRNA libraries), from saliva collection, RNA extraction and quantification to cDNA library generation and corresponding QCs. Using these invaluable technical tools, one can identify thousands of ncRNA species in saliva. These methods indicate that salivary exRNA provides an efficient medium for biomarker discovery of oral and systemic diseases.

  16. Sequence analysis of Schmallenberg virus genomes detected in Hungary.

    Science.gov (United States)

    Fehér, Enikő; Marton, Szilvia; Tóth, Ádám György; Ursu, Krisztina; Wernike, Kerstin; Beer, Martin; Dán, Ádám; Bányai, Krisztián

    2017-12-01

    Since its emergence near the German-Dutch border in 2011, Schmallenberg virus (SBV) has been identified in many European countries. In this study, we determined the complete coding sequence of seven Hungarian SBV genomes to expand our knowledge about the genetic diversity of circulating field strains. The samples originated from the first case, an aborted cattle fetus without malformation collected in 2012, and from the blood samples of six adult cattle in 2014. The Hungarian SBV sequences shared ≥99.3% nucleotide (nt) and ≥97.8% amino acid (aa) identity with each other, and ≥98.9 nt and ≥96.7% aa identity with reference strains. Although phylogenetic analyses showed low resolution in general, the M sequences of cattle and sheep origin SBV strains seemed to cluster on different branches. Both common and unique mutation sites were observed in different groups of sequences that might help understanding the evolution of emerging SBV strains.

  17. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences

    Science.gov (United States)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  18. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  19. Initial sequencing and comparative analysis of the mouse genome.

    Science.gov (United States)

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  20. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    Science.gov (United States)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-extensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS for the detection of DNA probes for hybridization LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis such as cystic fibrosis caused by base deletion and point mutation have also been demonstrated. Experimental details will be presented in the meeting. abstract.

  1. Irritable bowel syndrome-diarrhea: characterization of genotype by exome sequencing, and phenotypes of bile acid synthesis and colonic transit

    Science.gov (United States)

    Klee, Eric W.; Shin, Andrea; Carlson, Paula; Li, Ying; Grover, Madhusudan; Zinsmeister, Alan R.

    2013-01-01

    The study objectives were: to mine the complete exome to identify putative rare single nucleotide variants (SNVs) associated with irritable bowel syndrome (IBS)-diarrhea (IBS-D) phenotype, to assess genes that regulate bile acids in IBS-D, and to explore univariate associations of SNVs with symptom phenotype and quantitative traits in an independent IBS cohort. Using principal components analysis, we identified two groups of IBS-D (n = 16) with increased fecal bile acids: rapid colonic transit or high bile acids synthesis. DNA was sequenced in depth, analyzing SNVs in bile acid genes (ASBT, FXR, OSTα/β, FGF19, FGFR4, KLB, SHP, CYP7A1, LRH-1, and FABP6). Exome findings were compared with those of 50 similar ethnicity controls. We assessed univariate associations of each SNV with quantitative traits and a principal components analysis and associations between SNVs in KLB and FGFR4 and symptom phenotype in 405 IBS, 228 controls and colonic transit in 70 IBS-D, 71 IBS-constipation. Mining the complete exome did not reveal significant associations with IBS-D over controls. There were 54 SNVs in 10 of 11 bile acid-regulating genes, with no SNVs in FGF19; 15 nonsynonymous SNVs were identified in similar proportions of IBS-D and controls. Variations in KLB (rs1015450, downstream) and FGFR4 [rs434434 (intronic), rs1966265, and rs351855 (nonsynonymous)] were associated with colonic transit (rs1966265; P = 0.043), fecal bile acids (rs1015450; P = 0.064), and principal components analysis groups (all 3 FGFR4 SNVs; P transit (P = 0.066). Thus exome sequencing identified additional variants in KLB and FGFR4 associated with bile acids or colonic transit in IBS-D. PMID:24200957

  2. Detection of COL III in Parchment by Amino Acid Analysis

    DEFF Research Database (Denmark)

    Vestergaard Poulsen Sommer, Dorte; Larsen, René

    2016-01-01

    Cultural heritage parchments made from the reticular dermis of animals have been subject to studies of deterioration and conservation by amino acid analysis. The reticular dermis contains a varying mixture of collagen I and III (COL I and III). When dealing with the results of the amino acid...... analyses, till now the COL III content has not been taken into account. Based on the available amino acid sequences we present a method for determining the amount of COL III in the reticular dermis of new and historical parchments calculated from the ratio of Ile/Val. We find COL III contents between 7...... and 32 % in new parchments and between 0.2 and 40 % in the historical parchments. This is consistent with results in the literature. The varying content of COL III has a significant influence on the uncertainty of the amino acid analysis. Although we have not found a simple correlation between the COL...

  3. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit; (c can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  4. Categorizing accident sequences in the external radiotherapy for risk analysis

    OpenAIRE

    Kim, Jonghyun

    2013-01-01

    Purpose This study identifies accident sequences from the past accidents in order to help the risk analysis application to the external radiotherapy. Materials and Methods This study reviews 59 accidental cases in two retrospective safety analyses that have collected the incidents in the external radiotherapy extensively. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences including initiating events, failure of safety measures, and co...

  5. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomalsequences using a metagenomic approach reveals that modern humans andNeanderthals split ~;400,000 years ago, without significant evidence ofsubsequent admixture.

  6. DNA Sequence Analysis in Clinical Medicine, Proceeding Cautiously

    Directory of Open Access Journals (Sweden)

    Moyra Smith

    2017-05-01

    Full Text Available Delineation of underlying genomic and genetic factors in a specific disease may be valuable in establishing a definitive diagnosis and may guide patient management and counseling. In addition, genetic information may be useful in identification of at risk family members. Gene mapping and initial genome sequencing data enabled the development of microarrays to analyze genomic variants. The goal of this review is to consider different generations of sequencing techniques and their application to exome sequencing and whole genome sequencing and their clinical applications. In recent decades, exome sequencing has primarily been used in patient studies. Discussed in some detail, are important measures that have been developed to standardize variant calling and to assess pathogenicity of variants. Examples of cases where exome sequencing has facilitated diagnosis and led to improved medical management are presented. Whole genome sequencing and its clinical relevance are presented particularly in the context of analysis of nucleotide and structural genomic variants in large population studies and in certain patient cohorts. Applications involving analysis of cell free DNA in maternal blood for prenatal diagnosis of specific autosomal trisomies are reviewed. Applications of DNA sequencing to diagnosis and therapeutics of cancer are presented. Also discussed are important recent diagnostic applications of DNA sequencing in cancer, including analysis of tumor derived cell free DNA and exosomes that are present in body fluids. Insights gained into underlying pathogenetic mechanisms of certain complex common diseases, including schizophrenia, macular degeneration, neurodegenerative disease are presented. The relevance of different types of variants, rare, uncommon, and common to disease pathogenesis, and the continuum of causality, are addressed. Pharmogenetic variants detected by DNA sequence analysis are gaining in importance and are particularly relevant

  7. Protein chemotaxonomy of the solanaceae. VI. Amino acid sequence of ferredoxin from Nicotian tabacum.

    Science.gov (United States)

    Mino, Y; Iwao, M

    1999-01-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Nicotiana tabacum has been determined by automated Edman degradation of the entire Cm-protein and of the peptides obtained by trypsin and Asp-N endoproteinase digestions. This ferredoxin exhibited 9, 10, 8, and 10 differences respectively in its amino acid sequence, when compared with the ferredoxins of Datura stramonium, D. metel, D. arborea, and Physalis alkekengi var.francheti but 17-28 differences for other angiosperms, and 34-37 differences for fern and horsetails. These results are in agreement with the taxonomic position for these plants.

  8. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human, lagomorphs (rabbit, rodents (rat and mouse, artiodactyls (cow, carnivorans (dog and proboscideans (elephant. METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 mega bases from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  9. Automated sequence analysis and editing software for HIV drug resistance testing.

    Science.gov (United States)

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited-settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drugs resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. To develop an automated sequence analysis and editing software to support high throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Sequence-targeted chemical modifications of nucleic acids by complementary oligonucleotides covalently linked to porphyrins.

    OpenAIRE

    Trung Le Doan; Perrouault, L; Chassignol, M; Nguyen, T T; Hélène, C

    1987-01-01

    Oligo-heptathymidylates covalently linked to porphyrins bind to complementary sequences and can induce local damages on the target molecule. In dark reactions, iron porphyrin derivatives exhibited various chemical reactivities resulting in base oxidation, crosslinking and chain scission reactions. Reactions induced by reductants, such as ascorbic acid, dithiothreitol or mercapto-propionic acid, led to very localised reactions. A single base was the target for more than 50% of the damages. Oxi...

  11. Genome sequencing and analysis conference grant

    Energy Technology Data Exchange (ETDEWEB)

    Venter, J.C. [ed.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  12. Detection of multiple, novel reverse transcriptase coding sequences in human nucleic acids: relation to primate retroviruses

    International Nuclear Information System (INIS)

    Shih, A.; Misra, R.; Rush, M.G.

    1989-01-01

    A variety of chemically synthesized oligonucleotides designed on the basis of amino acid and/or nucleotide sequence data were used to detect a large number of novel reverse transcriptase coding sequences in human and mouse DNAs. Procedures involving Southern blotting, library screening, and the polymerase chain reaction were all used to detect such sequences; the polymerase chain reaction was the most rapid and productive approach. In the polymerase chain reaction, oligonucleotide mixtures based on consensus sequence homologies to reverse transcriptase coding sequences and unique oligonucleotides containing perfect homology to the coding sequences of human T-cell leukemia virus types I and II were both effective in amplifying reverse transcriptase-related DNA. It is shown that human DNA contains a wide spectrum of retrovirus-related reverse transcriptase coding sequences, including some that are clearly related to human T-cell leukemia virus types I and II, some that are related to the L-1 family of long interspersed nucleotide sequences, and others that are related to previously described human endogenous proviral DNAs. In addition, human T-cell leukemia virus type I-related sequences appear to be transcribed in both normal human T cells and in a cell line derived from a human teratocarcinoma

  13. Categorizing accident sequences in the external radiotherapy for risk analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jong Hyun [KEPCO International Nuclear Graduate School (KINGS), Ulsan (Korea, Republic of)

    2013-06-15

    This study identifies accident sequences from the past accidents in order to help the risk analysis application to the external radiotherapy. This study reviews 59 accidental cases in two retrospective safety analyses that have collected the incidents in the external radiotherapy extensively. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences including initiating events, failure of safety measures, and consequences. This study classifies the accidents by the treatments stages and sources of errors for initiating events, types of failures in the safety measures, and types of undesirable consequences and the number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequence. The result indicates that risk analysis needs to pay attention to not only the planning stage, but also the calibration stage that is committed prior to the main treatment process. It also shows that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration. This study is expected to provide sights into the accident sequences for the prospective risk analysis through the review of experiences.

  14. Categorizing accident sequences in the external radiotherapy for risk analysis.

    Science.gov (United States)

    Kim, Jonghyun

    2013-06-01

    This study identifies accident sequences from the past accidents in order to help the risk analysis application to the external radiotherapy. This study reviews 59 accidental cases in two retrospective safety analyses that have collected the incidents in the external radiotherapy extensively. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences including initiating events, failure of safety measures, and consequences. This study classifies the accidents by the treatments stages and sources of errors for initiating events, types of failures in the safety measures, and types of undesirable consequences and the number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequence. The result indicates that risk analysis needs to pay attention to not only the planning stage, but also the calibration stage that is committed prior to the main treatment process. It also shows that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration. This study is expected to provide sights into the accident sequences for the prospective risk analysis through the review of experiences.

  15. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Full Text Available Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  16. Isolation and sequence analysis of the gene encoding triose phosphate isomerase from Zygosaccharomyces bailii.

    Science.gov (United States)

    Merico, A; Rodrigues, F; Côrte-Real, M; Porro, D; Ranzi, B M; Compagno, C

    2001-06-30

    The ZbTPI1 gene encoding triose phosphate isomerase (TIM) was cloned from a Zygosaccharomyces bailii genomic library by complementation of the Saccharomyces cerevisiae tpi1 mutant strain. The nucleotide sequence of a 1.5 kb fragment showed an open reading frame (ORF) of 746 bp, encoding a protein of 248 amino acid residues. The deduced amino acid sequence shares a high degree of homology with TIMs from other yeast species, including some highly conserved regions. The analysis of the promoter sequence of the ZbTPI1 revealed the presence of putative motifs known to have regulatory functions in S. cerevisiae. The GenBank Accession No. of ZbTPI1 is AF325852. Copyright 2001 John Wiley & Sons, Ltd.

  17. Nucleic acid sequence-based amplification with oligochromatography for detection of Trypanosoma brucei in clinical samples

    NARCIS (Netherlands)

    Mugasa, Claire M.; Laurent, Thierry; Schoone, Gerard J.; Kager, Piet A.; Lubega, George W.; Schallig, Henk D. F. H.

    2009-01-01

    Molecular tools, such as real-time nucleic acid sequence-based amplification (NASBA) and PCR, have been developed to detect Trypanosoma brucei parasites in blood for the diagnosis of human African trypanosomiasis (HAT). Despite good sensitivity, these techniques are not implemented in HAT control

  18. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid.

    Science.gov (United States)

    Oliveira, Rodrigo C; Davenport, Karen W; Hovde, Blake; Silva, Danielle; Chain, Patrick S G; Correa, Benedito; Rodrigues, Debora F

    2017-01-26

    The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E. sorghinum (USPMTOX48). Copyright © 2017 Oliveira et al.

  19. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid

    OpenAIRE

    Oliveira, Rodrigo C.; Davenport, Karen W.; Hovde, Blake; Silva, Danielle; Chain, Patrick S. G.; Correa, Benedito; Rodrigues, Debora F.

    2017-01-01

    ABSTRACT The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E.?sorghinum (USPMTOX48).

  20. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.

  1. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    Science.gov (United States)

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance will all inform to the small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments.

  2. Amino-acid sequence of two trypsin isoinhibitors, ITD I and ITD III from squash seeds (Cucurbita maxima).

    Science.gov (United States)

    Wilusz, T; Wieczorek, M; Polanowski, A; Denton, A; Cook, J; Laskowski, M

    1983-01-01

    The amino-acid sequences of two trypsin isoinhibitors, ITD I and ITD III, from squash seeds (Cucurbita maxima) were determined. Both isoinhibitors contain 29 amino-acid residues, including 6 half cystine residues. They differ only by one amino acid. Lysine in position 9 of ITD III is substituted by glutamic acid in ITD I. Arginine in position 5 is present at the reactive site of both isoinhibitors. The previously published sequence of ITD III has been shown to be incorrect.

  3. Molecular cloning and sequence analysis of the cat myostatin gene ...

    African Journals Online (AJOL)

    ... MEF3, MTBF, PAX3, SMAD, HBOX, HOMF and TEAF motifs. Comparative analysis for some motifs showed both conservations and differences among cat, horse, porcine and human. Key words: Cat, myostatin 5'-regulatory region, molecular cloning, sequence analysis and comparison, transcription factor binding sites.

  4. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd and 4th y undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  5. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ 2 criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  6. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudis Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library -based on evaluated nuclear data- of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems originated by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced {chi}{sup 2} criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. Particularly, the combination of second derivative peak locate with simple peak area integration algorithms provides the greater accuracy. Lower accuracy comes from the combination of library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  7. Sequence analysis and genetic diversity of five new Indian isolates of cucumber mosaic virus.

    Science.gov (United States)

    Kumar, S; Gautam, K K; Raj, S K

    2015-12-01

    Cucumber mosaic virus (CMV) is an important virus since it causes severe losses to many economically important crops worldwide. Five new isolates of CMV were isolated from naturally infected Hippeastrum hybridum, Dahlia pinnata, Hemerocallis fulva, Acorus calamus and Typhonium trilobatum plants, all exhibiting severe leaf mosaic symptoms. For molecular identification and sequence analyses, the complete coat protein (CP) gene of these isolates was amplified by RT-PCR. The resulting amplicons were cloned and sequenced and isolates were designated as HH (KP698590), DP (JF682239), HF (KP698589), AC (KP698588) and TT (JX570732). For study of genetic diversity among these isolates, the sequence data were analysed by BLASTn, multiple alignment and generating phylogenetic trees along with the respective sequences of other CMV isolates available in GenBank Database were done. The isolates under study showed 82-99% sequence diversity among them at nucleotide and amino acid levels; however they showed close relationships with CMV isolates of subgroup IB. In alignment analysis of amino acid sequences of HH and AC isolates, we have found fifteen and twelve unique substitutions, compared to HF, DP and TT isolates, suggesting the cause of high genetic diversity.

  8. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM. Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  9. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  10. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    Science.gov (United States)

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  11. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    Science.gov (United States)

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly

  12. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  13. Widespread occurrence of the tfd-II genes in soil bacteria revealed by nucleotide sequence analysis of 2,4-dichlorophenoxyacetic acid degradative plasmids pDB1 and p712.

    Science.gov (United States)

    Kim, Dong-Uk; Kim, Min-Sun; Lim, Jong-Sung; Ka, Jong-Ok

    2013-05-01

    Variovorax sp. strain DB1 and Pseudomonas pickettii strain 712 are 2,4-dicholorophenoxy-acetic acid (2,4-D)-degrading bacteria, which were isolated from agricultural soils in Republic of Korea and USA, respectively. Each strain harbors a 2,4-D degradative plasmid and is able to utilize 2,4-D as the sole source of carbon for its growth. The 2,4-D degradative plasmid pDB1 of strain DB1 consisted of a 65,269-bp circular molecule with a G+C content of 66.23% and had 68 ORFs. The 2,4-D degradative plasmid p712 of strain 712 was composed of a 62,798-bp circular molecule with a 62.11% G+C content and had 62 ORFs. The plasmids pDB1 and p712 share significantly homologous 2,4-D degradative genes with high similarity to the tfdR, tfdB-II, tfdC-II, tfdD-II, tfdE-II, tfdF-II, tfdK and tfdA genes of plasmid pJP4 of Alcaligenes eutrophus isolated from Australia. In a phylogenetic analysis with trfA, traL, and trbA genes, pDB1 belonged to IncP-1β with pJP4, while p712 belonged to IncP-1ε with pKJK5 and pEMT3. The results indicated that, in spite of the differences in their backbone regions, the 2,4-D catabolic genes of the two plasmids were closely related and also related to the well-known 2,4-D degradative plasmid pJP4 even though all were isolated from different geographic regions. Other similarities in the genetic organization and the presence of IS1071 suggested that these catabolic genes may be on a transposable element, leading to widespread occurrence in soil bacteria. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Genome sequence of the highly weak-acid-tolerant Zygosaccharomyces bailii IST302, amenable to genetic manipulations and physiological studies.

    Science.gov (United States)

    Palma, Margarida; Münsterkötter, Martin; Peça, João; Güldener, Ulrich; Sá-Correia, Isabel

    2017-06-01

    Zygosaccharomyces bailii is one of the most problematic spoilage yeast species found in the food and beverage industry particularly in acidic products, due to its exceptional resistance to weak acid stress. This article describes the annotation of the genome sequence of Z. bailii IST302, a strain recently proven to be amenable to genetic manipulations and physiological studies. The work was based on the annotated genomes of strain ISA1307, an interspecies hybrid between Z. bailii and a closely related species, and the Z. bailii reference strain CLIB 213T. The resulting genome sequence of Z. bailii IST302 is distributed through 105 scaffolds, comprising a total of 5142 genes and a size of 10.8 Mb. Contrasting with CLIB 213T, strain IST302 does not form cell aggregates, allowing its manipulation in the laboratory for genetic and physiological studies. Comparative cell cycle analysis with the haploid and diploid Saccharomyces cerevisiae strains BY4741 and BY4743, respectively, suggests that Z. bailii IST302 is haploid. This is an additional trait that makes this strain attractive for the functional analysis of non-essential genes envisaging the elucidation of mechanisms underlying its high tolerance to weak acid food preservatives, or the investigation and exploitation of the potential of this resilient yeast species as cell factory. © FEMS 2017.

  15. Identification of metal ion binding sites based on amino acid sequences

    Science.gov (United States)

    Cao, Xiaoyong; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html. PMID:28854211

  16. Carnivora: the amino-acid sequence of the adult Sumatran tiger (Panthera tigris sumatrae) hemoglobins.

    Science.gov (United States)

    Jahan, M; Ahmed, A; Braunitzer, G; Göltenboth, R

    1989-01-01

    The complete amino-acid sequences of the hemoglobins from the adult Sumatran tiger (Panthera tigris sumatrae) have been determined on automatic liquid- and gas-phase sequenators. The globin chains were isolated by reverse phase HPLC on a column of Nucleosil-C4. N-Acetylserine was detected by FAB-mass spectroscopy as N-terminal aminoacid residue of the beta I chain. Comparing the sequences of the globin chains of the tiger with that of human Hb-A, 23 substitutions were recognized in the alpha, 29 in beta I and 28 in the beta II chain.

  17. The amino acid sequence of cytochrome c from Cucurbita maxima L. (pumpkin)

    Science.gov (United States)

    Thompson, E. W.; Richardson, M.; Boulter, D.

    1971-01-01

    The amino acid sequence of pumpkin cytochrome c was determined on 2μmol of protein. Some evidence was found for the occurrence of two forms of cytochrome c, whose sequences differed in three positions. Pumpkin cytochrome c consists of 111 residues and is homologous with mitochondrial cytochromes c from other plants. Experimental details are given in a supplementary paper that has been deposited as Supplementary Publication SUP 50005 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1971), 121, 7. PMID:5131733

  18. Application of Ammonium Persulfate for Selective Oxidation of Guanines for Nucleic Acid Sequencing

    Directory of Open Access Journals (Sweden)

    Yafen Wang

    2017-07-01

    Full Text Available Nucleic acids can be sequenced by a chemical procedure that partially damages the nucleotide positions at their base repetition. Many methods have been reported for the selective recognition of guanine. The accurate identification of guanine in both single and double regions of DNA and RNA remains a challenging task. Herein, we present a new, non-toxic and simple method for the selective recognition of guanine in both DNA and RNA sequences via ammonium persulfate modification. This strategy can be further successfully applied to the detection of 5-methylcytosine by using PCR.

  19. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    Science.gov (United States)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  20. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    Full Text Available The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2 of five species: horse (89%, human (90%, chimpanzee (89%, rhesus monkey (89% and mouse (85%; thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  1. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    Science.gov (United States)

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  2. Sequence-specific thermodynamic properties of nucleic acids influence both transcriptional pausing and backtracking in yeast

    Science.gov (United States)

    2017-01-01

    RNA Polymerase II pauses and backtracks during transcription, with many consequences for gene expression and cellular physiology. Here, we show that the energy required to melt double-stranded nucleic acids in the transcription bubble predicts pausing in Saccharomyces cerevisiae far more accurately than nucleosome roadblocks do. In addition, the same energy difference also determines when the RNA polymerase backtracks instead of continuing to move forward. This data-driven model corroborates—in a genome wide and quantitative manner—previous evidence that sequence-dependent thermodynamic features of nucleic acids influence both transcriptional pausing and backtracking. PMID:28301878

  3. The human receptor for urokinase plasminogen activator. NH2-terminal amino acid sequence and glycosylation variants

    DEFF Research Database (Denmark)

    Behrendt, N; Rønne, E; Ploug, M

    1990-01-01

    , but no N-acetyl-D-galactosamine. Glycosylation is responsible for substantial heterogeneity in the receptor on phorbol ester-stimulated U937 cells, and also for molecular weight variations among various cell lines. The amino acid composition and the NH2-terminal amino acid sequence are reported...... amino-terminal fragment; 2) the identical electrophoretic mobilities observed for cross-linked conjugates, formed between either the purified protein or the u-PA receptor on intact U937 cells and the above ligands; 3) the identity of the apparent molecular weight of the purified protein...

  4. Complementary DNA and derived amino acid sequence of the α subunit of human complement protein C8: evidence for the existence of a separate α subunit messenger RNA

    International Nuclear Information System (INIS)

    Rao, A.G.; Howard, O.M.Z.; Ng, S.C.; Whitehead, A.S.; Colten, H.R.; Sodetz, J.M.

    1987-01-01

    The entire amino acid sequence of the α subunit (M/sub r/ 64,000) of the eight component of complement (C8) was determined by characterizing cDNA clones isolated from a human liver cDNA library. Two clones with overlapping inserts of net length 2.44 kilobases (kb) were isolated and found to contain the entire α coding region [1659 base pairs (bp)]. The 5' end consists of an untranslated region and a leader sequence of 30 amino acids. This sequence contains an apparent initiation Met, signal peptide, and propeptide which ends with an arginine-rich sequence that is characteristic of proteolytic processing sites found in the pro form of protein precursors. The 3' untranslated region contains two polyadenylation signals and a poly(A)sequence. RNA blot analysis of total cellular RNA from the human hepatoma cell line HepG2 revealed a message size of ∼2.5 kb. Features of the 5' and 3' sequences and the message size suggest that a separate mRNA codes for α and argues against the occurrence of a single-chain precursor form of the disulfide-linked α-λ subunit found in mature C8. Analysis of the derived amino acid sequence revealed several membrane surface seeking domains and a possible transmembrane domain. Analysis of the carbohydrate composition indicates 1 or 2 asparagine-linked but no O-linked oligosaccharide chains, a result consistent with predictions from the amino acid sequence. Most significantly, it exhibits a striking overall homology to human C9, with values of 24% on the basis of identity and 46% when conserved substitutions are allowed. As described in an accompanying report this homology also extends to the β subunit of C8

  5. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  6. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  7. Inter simple sequence repeat analysis of genetic diversity of five ...

    African Journals Online (AJOL)

    This paper studied the genetic diversity of five cultivated pepper species using inter simple sequence repeat (ISSR) analysis. The amplicons of 13 out of 15 designed primers were stable polymorphic and therefore were used as genetic biomarkers. 135 total clear bands were obtained, of which 102 were polymorphic bands ...

  8. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta as an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  9. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    Sequence analysis of mitochondrial 16S ribosomal RNA gene fragment from seven mosquito species. YOGESH S SHOUCHE* and MILIND S PATOLE. National Center for Cell Science, Pune University Campus, Pune 411 007, India. *Corresponding author (Fax, 91-20-5672259; Email, yogesh@nccs.res.in). Mosquitoes are ...

  10. Phylogenetic relationships of Malassezia species based on multilocus sequence analysis.

    Science.gov (United States)

    Castellá, Gemma; Coutinho, Selene Dall' Acqua; Cabañes, F Javier

    2014-01-01

    Members of the genus Malassezia are lipophilic basidiomycetous yeasts, which are part of the normal cutaneous microbiota of humans and other warm-blooded animals. Currently, this genus consists of 14 species that have been characterized by phenetic and molecular methods. Although several molecular methods have been used to identify and/or differentiate Malassezia species, the sequencing of the rRNA genes and the chitin synthase-2 gene (CHS2) are the most widely employed. There is little information about the β-tubulin gene in the genus Malassezia, a gene has been used for the analysis of complex species groups. The aim of the present study was to sequence a fragment of the β-tubulin gene of Malassezia species and analyze their phylogenetic relationship using a multilocus sequence approach based on two rRNA genes (ITS including 5.8S rRNA and D1/D2 region of 26S rRNA) together with two protein encoding genes (CHS2 and β-tubulin). The phylogenetic study of the partial β-tubulin gene sequences indicated that this molecular marker can be used to assess diversity and identify new species. The multilocus sequence analysis of the four loci provides robust support to delineate species at the terminal nodes and could help to estimate divergence times for the origin and diversification of Malassezia species.

  11. Sequence analysis of the aminoacylase-1 family. A new proposed signature for metalloexopeptidases.

    Science.gov (United States)

    Biagini, A; Puigserver, A

    2001-03-01

    The amino acid sequence analysis of the human and porcine aminoacylases-1, the carboxypeptidase S precursor from Saccharomyces cerevisiae, the succinyl-diaminopimelate desuccinylase from Escherichia coli, Haemophilus influenzae and Corynebacterium glutamicum, the acetylornithine deacetylase from Escherichia coli and Dictyostelium discoideum and the carboxypeptidase G(2) precursor from Pseudomonas strain, using the Basic Local Alignment Search Tool (BLAST) and the Position-Specific Iterated BLAST (PSI-BLAST), allowed us to suggest that all these enzymes, which share common functional and biochemical features, belong to the same structural family. The three amino acid blocks which were found to be highly conserved, using the CLUSTAL W program, could be assigned to the catalytic active site, based on the general three-dimensional structure of the carboxypeptidase G(2) from the Pseudomonas strain precursor. Six additional proteins with the same signature have been retrieved after performing two successive PSI-BLAST iterations using the sequence of the conserved motif, namely Lactobacillus delbrueckii aminoacyl-histidine dipeptidase, Streptomyces griseus aminopeptidase, Saccharomyces cerevisiae aminopeptidase Y precursor, two Bacillus stearothermophilus N-carbamyl-L-amino acid amidohydrolases and Pseudomonas sp. hydantoin utilization protein C. The three conserved amino acid motifs corresponded to the following blocks: (i) [S, G, A]-H-x-D-x-V; (ii) G-x-x-D; and (iii) x-E-E. This new sequence signature is clearly different from that commonly reported in the literature for proteins belonging to the ArgE/DapE/CPG2/YscS family.

  12. Analysis and comparison of fragrant gene sequence in some rice cultivars

    Directory of Open Access Journals (Sweden)

    Karami Noushafarin

    2016-01-01

    Full Text Available It is known that the fragrant trait in rice (Oryza sativa L. is largely controlled by fgr gene on chromosome 8 and it has been specified that the existence of an 8 bp deletion and three single nucleotide polymorphism (SNP in exon 7 is effective on this trait. In this study, sequence alignment analysis of fgr exon7 on chromosome 8 for 11 different fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried 3 SNPs and 8 bp deletion in exon7 which terminates prematurely at a TAA stop codon. However, 5 of the non-aromatics showed a sequence identical to the published Nipponbare, being non-fragrant Japonica variety sequence. An exception among them was Bejar, which had 8 bp deletion and 3SNPs but it was non-aromatic. Sequencing can determine nucleotide alignment of a gene and give beneficial information about gene function. In silico prediction showed proteins sequences alignment of fgr gene for Khazar and Domsiah genotypes were different. Betaine aldehyde dehydrogenase complete enzyme belongs to Khazar non-fragrant genotype that has complete length and 503 amino acids while non-functional BADH2 enzyme for Domsiah fragrant genotype has 251 amino acids that result in accumulate 2-acetyl-1-pyrroline (2AP and produces aroma in fragrant genotypes.

  13. Amino acid sequences mediating vascular cell adhesion molecule 1 binding to integrin alpha 4: homologous DSP sequence found for JC polyoma VP1 coat protein

    Directory of Open Access Journals (Sweden)

    Michael Andrew Meyer

    2013-07-01

    Full Text Available The JC polyoma viral coat protein VP1 was analyzed for amino acid sequences homologies to the IDSP sequence which mediates binding of VLA-4 (integrin alpha 4 to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and binding to sialic acid containing molecules such as integrins (3. For the JC polyoma virus, a DSP sequence was found at residues 70, 71 and 72 with homology also noted for the mouse polyoma virus and SV40 virus. Three dimensional modeling of the VP1 molecule suggests that the DSP loop has an accessible site for interaction from the external side of the assembled viral capsid pentamer.

  14. The SCALE criticality safety analysis sequences: Status and future directions

    International Nuclear Information System (INIS)

    Parks, C.V.

    1993-01-01

    The Standardized Computer Analyses for Licensing Evaluation (SCALE) code system. Was originally conceived and developed in the late 1970s for the US Nuclear Regulatory Commission. The goal was to provide easy-to-use, yet accurate, analysis capabilities for use in evaluating the criticality safety, shielding, and heat transfer aspects of transportation packages for radioactive material. The Criticality Safety Analysis Sequences (CSAS) for SCALE were developed to ''automate'' problem-dependent cross-section and material processing prior to execution of the wellestablished XSDRNPM or KENO codes for calculation of k eff . The criticality analysis sequences provided in SCALE-4 are summarized. The SCALE system continues to be maintained and enhanced by staff of the Computing Applications Division at Oak Ridge National Laboratory (ORNL). The purpose of this paper is to discuss recent work to improve system portability and user interfaces and to provide information on ongoing work to enhance the analysis capabilities

  15. Nucleic and amino acid sequences support structure-based viral classification

    OpenAIRE

    Robert M., Sinclair; Janne J., Ravantti; Dennis H., Bamford

    2017-01-01

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome, capsid and between individual capsid proteins (i.e. “capsid architecture”) are intimate and expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made...

  16. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    OpenAIRE

    Sinclair, Robert M.; Ravantti, Janne J.; Bamford, Dennis H.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classifi...

  17. Genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia

    Directory of Open Access Journals (Sweden)

    Anastasiia Kovaliova

    2017-03-01

    Full Text Available Here we report the draft genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia. The draft genome has a size of 4.9 Mb and encodes multiple K+-transporters and proton-consuming decarboxylases. The phylogenetic analysis based on concatenated ribosomal proteins revealed that strain DV clusters together with the acid-tolerant Desulfovibrio sp. TomC and Desulfovibrio magneticus. The draft genome sequence and annotation have been deposited at GenBank under the accession number MLBG00000000.

  18. Molecular cloning and sequence analysis of growth hormone cDNA of Neotropical freshwater fish Pacu (Piaractus mesopotamicus

    Directory of Open Access Journals (Sweden)

    Janeth Silva Pinheiro

    2008-01-01

    Full Text Available RT-PCR was used for amplifying Piaractus mesopotamicus growth hormone (GH cDNA obtained from mRNA extracted from pituitary cells. The amplified fragment was cloned and the complete cDNA sequence was determined. The cloned cDNA encompassed a sequence of 543 nucleotides that encoded a polypeptide of 178 amino acids corresponding to mature P. mesopotamicus GH. Comparison with other GH sequences showed a gap of 10 amino acids localized in the N terminus of the putative polypeptide of P. mesopotamicus. This same gap was also observed in other members of the family. Neighbor-joining tree analysis with GH sequences from fishes belonging to different taxonomic groups placed the P. mesopotamicus GH within the Otophysi group. To our knowledge, this is the first GH sequence of a Neotropical characiform fish deposited in GenBank.

  19. Phylogeny and classification of Dickeya based on multilocus sequence analysis.

    Science.gov (United States)

    Marrero, Glorimar; Schneider, Kevin L; Jenkins, Daniel M; Alvarez, Anne M

    2013-09-01

    Bacterial heart rot of pineapple reported in Hawaii in 2003 and reoccurring in 2006 was caused by an undetermined species of Dickeya. Classification of the bacterial strains isolated from infected pineapple to one of the recognized Dickeya species and their phylogenetic relationships with Dickeya were determined by a multilocus sequence analysis (MLSA), based on the partial gene sequences of dnaA, dnaJ, dnaX, gyrB and recN. Individual and concatenated gene phylogenies revealed that the strains form a clade with reference Dickeya sp. isolated from pineapple in Malaysia and are closely related to D. zeae; however, previous DNA-DNA reassociation values suggest that these strains do not meet the genomic threshold for consideration in D. zeae, and require further taxonomic analysis. An analysis of the markers used in this MLSA determined that recN was the best overall marker for resolution of species within Dickeya. Differential intraspecies resolution was observed with the other markers, suggesting that marker selection is important for defining relationships within a clade. Phylogenies produced with gene sequences from the sequenced genomes of strains D. dadantii Ech586, D. dadantii Ech703 and D. zeae Ech1591 did not place the sequenced strains with members of other well-characterized members of their respective species. The average nucleotide identity (ANI) and tetranucleotide frequencies determined for the sequenced strains corroborated the results of the MLSA that D. dadantii Ech586 and D. dadantii Ech703 should be reclassified as Dickeya zeae Ech586 and Dickeya paradisiaca Ech703, respectively, whereas D. zeae Ech1591 should be reclassified as Dickeya chrysanthemi Ech1591.

  20. Fatty acid profile and unigene-derived simple sequence repeat markers in tung tree (Vernicia fordii.

    Directory of Open Access Journals (Sweden)

    Lin Zhang

    Full Text Available Tung tree (Vernicia fordii provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3 consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies.

  1. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  3. Draft Genome Sequence of Acid-Tolerant Clostridium drakei SL1T, a Potential Chemical Producer through Syngas Fermentation

    Science.gov (United States)

    Jeong, Yujin; Song, Yoseb; Shin, Hyeon Seok

    2014-01-01

    Clostridium drakei SL1T is a strictly anaerobic, H2-utilizing, and acid-tolerant acetogen isolated from an acidic sediment that is a potential platform for commodity chemical production from syngas fermentation. The draft genome sequence of this strain will enable determination of the acid resistance and autotrophic pathway of the acetogen. PMID:24831144

  4. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    Full Text Available System modelling using Unified Modelling Language (UML is the task that should be solved for software development. The more complex software becomes the higher requirements are stated to demonstrate the system to be developed, especially in its dynamic aspect, which in UML is offered by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram due to its specific structure is selected for a deeper analysis on the elements’ layout. The authors research represents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyse them according to criteria required for the diagram perception.

  5. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  6. IS1222: analysis and distribution of a new insertion sequence in Enterobacter agglomerans 339.

    Science.gov (United States)

    Steibl, H D; Lewecke, F M

    1995-04-14

    With a length of 1221 bp and 44-bp inverted repeats with ten mismatches, IS1222 was identified as an endogenous insertion sequence in Enterobacter agglomerans 339. In this host strain, four copies were located, three on the nif plasmid pEA9 and one at the chromosome. Sequence analysis showed two consecutive open reading frames, orfA and orfB, encoding putative polypeptides of 87 and 276 amino acids. In-between both reading frames, a potential frameshift window of the homonucleotide type was postulated, followed by a pseudoknot structure and a ribosome-binding site. Based on significant homology at the sequence level and similarity of the features discussed, IS1222 was placed among the group of IS3 elements with IS407, IS476 and ISR1 being the most closely related IS. Hybridization experiments suggest that the distribution of IS1222 is limited to a group of related bacterial strains among Enterobacteriaceae.

  7. A general sequence processing and analysis program for protein engineering.

    Science.gov (United States)

    Stafford, Ryan L; Zimmerman, Erik S; Hallam, Trevor J; Sato, Aaron K

    2014-10-27

    Protein engineering projects often amass numerous raw DNA sequences, but no readily available software combines sequence processing and activity correlation required for efficient lead identification. XLibraryDisplay is an open source program integrated into Microsoft Excel for Windows that automates batch sequence processing via a simple step-by-step, menu-driven graphical user interface. XLibraryDisplay accepts any DNA template which is used as a basis for trimming, filtering, translating, and aligning hundreds to thousands of sequences (raw, FASTA, or Phred PHD file formats). Key steps for library characterization through lead discovery are available including library composition analysis, filtering by experimental data, graphing and correlating to experimental data, alignment to structural data extracted from PDB files, and generation of PyMOL visualization scripts. Though larger data sets can be handled, the program is best suited for analyzing approximately 10 000 or fewer leads or naïve clones which have been characterized using Sanger sequencing and other experimental approaches. XLibraryDisplay can be downloaded for free from sourceforge.net/projects/xlibrarydisplay/ .

  8. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    Science.gov (United States)

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  9. Isolation and complete amino acid sequence of human thymopoietin and splenin

    International Nuclear Information System (INIS)

    Audhya, T.; Schlesinger, D.H.; Goldstein, G.

    1987-01-01

    Human thymopoietin and splenin were isolated from human thymus and spleen, respectively, by monitoring tissue fractionation with a bovine thymopoietin RIA cross-reactive with human thymopoietin and splenin. Bovine thymopoietin and splenin are 49-amino acid polypeptides that differ by only 2 amino acids at positions 34 and 43; the change at position 34 in the active-site region changes the receptor specificities and biological activities. The complete amino acid sequences of purified human thymopoietin and splenin were determined and shown to be 48-amino acid polypeptides differing at four positions. Ten amino acids, constant within each species for thymopoietin and splenin, differ between the human and bovine polypeptides. The pentapeptide active side of thymopoietin (residues 32-36) is constant between the human and bovine thymopoietins, but position 34 in the active site of splenin has changed from glutamic acid in bovine splenin to alanine in human splenin, accounting for the biological activity of the human but not the bovine splenin on the human T-cell line MOLT-4

  10. Characterization of squid enolase mRNA: sequence analysis, tissue distribution, and axonal localization.

    Science.gov (United States)

    Chun, J T; Gioio, A E; Crispino, M; Giuditta, A; Kaplan, B B

    1995-08-01

    Enolase is a glycolytic enzyme whose amino acid sequence is highly conserved across a wide range of animal species. In mammals, enolase is known to be a dimeric protein composed of distinct but closely related subunits: alpha (non-neuronal), beta (muscle-specific), and gamma (neuron-specific). However, little information is available on the primary sequence of enolase in invertebrates. Here we report the isolation of two overlapping cDNA clones and the putative primary structure of the enzyme from the squid (Loligo pealii) nervous system. The composite sequence of those cDNA clones is 1575 bp and contains the entire coding region (1302 bp), as well as 66 and 207 bp of 5' and 3' untranslated sequence, respectively. Cross-species comparison of enolase primary structure reveals that squid enolase shares over 70% sequence identity to vertebrate forms of the enzyme. The greatest degree of sequence similarity was manifest to the alpha isoform of the human homologue. Results of Northern analysis revealed a single 1.6 kb mRNA species, the relative abundance of which differs approximately 10-fold between various tissues. Interestingly, evidence derived from in situ hybridization and polymerase chain reaction experiments indicate that the mRNA encoding enolase is present in the squid giant axon.

  11. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  12. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Full Text Available Advancements in the field of sequencing techniques resulted in the huge sequenced data to be produced at a very faster rate. It is going cumbersome for the datacenter to maintain the databases. Data mining and sequence analysis approaches needs to analyze the databases several times to reach any efficient conclusion. To cope with such overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of the resources and computation on pay as you go concept was introduced and termed as cloud computing. The datacenter’s hardware and software is collectively known as cloud which when available publicly is termed as public cloud. The datacenter’s resources are provided in a virtual mode to the clients via a service provider like Amazon, Google and Joyent which charges on pay as you go manner. The workload is shifted to the provider which is maintained by the required hardware and software upgradation. The service provider manages it by upgrading the requirements in the virtual mode. Basically a virtual environment is created according to the need of the user by taking permission from datacenter via internet, the task is performed and the environment is deleted after the task is over. In this discussion, we are focusing on the basics of cloud computing, the prerequisites and overall working of clouds. Furthermore, briefly the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection with reference to traditional workflow are discussed.

  13. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI1, M. REISSMANN2 AND H. J. SCHWARTZ3

    2006-10-01

    Full Text Available Myostatin, also called growth differentiation factor-8 (GDF-8, is a member of the mammalian growth transforming family (TGF-beta superfamily, which is expressed specifically in developing an adult skeletal muscle. Muscular hypertrophy allele (mh allele in the double muscle breeds involved mutation within the myostatin gene. Genomic DNA was isolated from the camel hair using NucleoSpin Tissue kit. Two animals of each of the six breeds namely, Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homolog regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed separate cluster from the pig in spite of having high homology (98% and showed 94% homology with cattle and sheep as reported in literature. Sequence analysis of the PCR amplified part of exon 1 (256 bp of the camel myostatin was identical among six camel breeds.

  14. Infrared thermal facial image sequence registration analysis and verification

    Science.gov (United States)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  15. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Science.gov (United States)

    2013-01-01

    Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic analysis of T

  16. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Full Text Available Abstract Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA of 20 of the most highly studied reference strains and clinical isolates of T. denticola; which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA to 8.9% (dnaN. Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established; but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  17. Automated sequence analysis of atmospheric oxidation pathways: SEQUENCE version 1.0

    Directory of Open Access Journals (Sweden)

    T. M. Butler

    2009-10-01

    Full Text Available An algorithm for the sequential analysis of the atmospheric oxidation of chemical species using output from a photochemical model is presented. Starting at a "root species", the algorithm traverses all possible reaction sequences which consume this species, and lead, via intermediate products, to final products. The algorithm keeps track of the effects of all of these reactions on their respective reactants and products. Upon completion, the algorithm has built a detailed picture of the effects of the oxidation of the root species on its chemical surroundings. The output of the algorithm can be used to determine product yields, radical recycling fractions, and ozone production potentials of arbitrary chemical species.

  18. A stochastic model for EEG microstate sequence analysis.

    Science.gov (United States)

    Gärtner, Matthias; Brodbeck, Verena; Laufs, Helmut; Schneider, Gaby

    2015-01-01

    The analysis of spontaneous resting state neuronal activity is assumed to give insight into the brain function. One noninvasive technique to study resting state activity is electroencephalography (EEG) with a subsequent microstate analysis. This technique reduces the recorded EEG signal to a sequence of prototypical topographical maps, which is hypothesized to capture important spatio-temporal properties of the signal. In a statistical EEG microstate analysis of healthy subjects in wakefulness and three stages of sleep, we observed a simple structure in the microstate transition matrix. It can be described with a first order Markov chain in which the transition probability from the current state (i.e., map) to a different map does not depend on the current map. The resulting transition matrix shows a high agreement with the observed transition matrix, requiring only about 2% of mass transport (1/2 L1-distance). In the second part, we introduce an extended framework in which the simple Markov chain is used to make inferences on a potential underlying time continuous process. This process cannot be directly observed and is therefore usually estimated from discrete sampling points of the EEG signal given by the local maxima of the global field power. Therefore, we propose a simple stochastic model called sampled marked intervals (SMI) model that relates the observed sequence of microstates to an assumed underlying process of background intervals and thus, complements approaches that focus on the analysis of observable microstate sequences. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Congruence analysis of point clouds from unstable stereo image sequences

    Directory of Open Access Journals (Sweden)

    C. Jepping

    2014-06-01

    Full Text Available This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research focusses on the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis. For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.

  20. The amino acid sequence of pike-whale (lesser-rorqual) pancreatic ribonuclease.

    Science.gov (United States)

    Emmens, M; Welling, G W; Beintema, J J

    1976-08-01

    Pancreatic RNAase (ribonuclease) from the pike whale (lesser rorqual, Balaenoptera acutorostrata) was isolated by affinity chromatography. The protein was digested with different proteolytic enzymes. Peptides were isolated by gel filtration, preparative high-voltage paper electrophoresis and paper chromatography. The amino acid sequence of peptides was determined by the dansyl-Edman method. Although we do not have an amino acid composition for the whole protein, all peptide bonds were overlapped by one or more peptides. Residues 85-96 are bridged by a peptide of unstaisfactory composition and the sequence here depends, at least in part, on homology for its confirmation. Another region in which a similar situation obtains is residues 39-40. This pancreatic RNAase differs at 24-33% of the positions from all other mammalian pancreatic RNAases sequenced to date, except for pig RNAase, from which it differs by 19%. This indicates that whale RNAase has evolved independently during the larger part of the evolution of the mammals. Lesser-rorqual pancreatic RNAase is partially glycosidated (30%) at asparagine-76 in an Asn-Ser-Thr sequence (residues 76-78). Pig RNAase also has carbohydrate attached to asparagine-76 and is identical with lesser-rorqual RNAase in residues 76-98. Detailed evidence for the sequence has been deposited as Supplementary Publication SUP 50066 (11 pages) at the British Library Lending Division, Boston Spa, Wetherby, W. Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms ginen in Biochem. J. (1976) 135, 5.

  1. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Science.gov (United States)

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline-sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Frame to Frame Diffeomorphic Motion Analysis from Echocardiographic Sequences

    OpenAIRE

    Zhang, Zhijun; Sahn, David; Song, Xubo

    2011-01-01

    International audience; Quantitative motion analysis from echocardiography is an important yet challenging problem. We develop a motion estimation algorithm for echocardiographic image sequences based on diffeomorphic image registration in which the velocity field is spatiotemporally smooth. The novelty of this work is that instead of optimizing a functional of velocity field which consists of similarity metrics between a reference image to each of the following images (\\textitfirst-to-follow...

  3. The sequence and analysis of Trypanosoma brucei chromosome II

    OpenAIRE

    El-Sayed, Najib M. A.; Ghedin, Elodie; Song, Jinming; MacLeod, Annette; Bringaud, Frederic; Larkin, Christopher; Wanless, David; Peterson, Jeremy; Hou, Lihua; Taylor, Sonya; Tweedie, Alison; Biteau, Nicolas; Khalak, Hanif G.; Lin, Xiaoying; Mason, Tanya

    2003-01-01

    We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-couple...

  4. Antibody V and C domain sequence, structure, and interaction analysis with special reference to IMGT®.

    Science.gov (United States)

    Alamyar, Eltaf; Giudicelli, Véronique; Duroux, Patrice; Lefranc, Marie-Paule

    2014-01-01

    IMGT(®), the international ImMunoGeneTics information system(®) (http://www.imgt.org), created in 1989 (Centre National de la Recherche Scientifique, Montpellier University), is acknowledged as the global reference in immunogenetics and immunoinformatics. The accuracy and the consistency of the IMGT(®) data are based on IMGT-ONTOLOGY which bridges the gap between genes, sequences, and three-dimensional (3D) structures. Thus, receptors, chains, and domains are characterized with the same IMGT(®) rules and standards (IMGT standardized labels, IMGT gene and allele nomenclature, IMGT unique numbering, IMGT Collier de Perles), independently from the molecule type (genomic DNA, complementary DNA, transcript, or protein) or from the species. More particularly, IMGT(®) tools and databases provide a highly standardized analysis of the immunoglobulin (IG) or antibody and T cell receptor (TR) V and C domains. IMGT/V-QUEST analyzes the V domains of IG or TR rearranged nucleotide sequences, integrates the IMGT/JunctionAnalysis and IMGT/Automat tools, and provides IMGT Collier de Perles. IMGT/HighV-QUEST analyzes sequences from high-throughput sequencing (HTS) (up to 150,000 sequences per batch) and performs statistical analysis on up to 450,000 results, with the same resolution and high quality as IMGT/V-QUEST online. IMGT/DomainGapAlign analyzes amino acid sequences of V and C domains and IMGT/3Dstructure-DB and associated tools provide information on 3D structures, contact analysis, and paratope/epitope interactions. These IMGT(®) tools and databases, and the IMGT/mAb-DB interface with access to therapeutical antibody data, provide an invaluable help for antibody engineering and antibody humanization.

  5. Method for predicting homology modeling accuracy from amino acid sequence alignment: the power function.

    Science.gov (United States)

    Iwadate, Mitsuo; Kanou, Kazuhiko; Terashi, Genki; Umeyama, Hideaki; Takeda-Shitaka, Mayuko

    2010-01-01

    We have devised a power function (PF) that can predict the accuracy of a three-dimensional (3D) structure model of a protein using only amino acid sequence alignments. This Power Function (PF) consists of three parts; (1) the length of a model, (2) a homology identity percent value and (3) the agreement rate between PSI-PRED secondary structure prediction and the secondary structure judgment of a reference protein. The PF value is mathematically computed from the execution process of homology search tools, such as FASTA or various BLAST programs, to obtain the amino acid sequence alignments. There is a high correlation between the global distance test-total score (GDT_TS) value of the protein model based upon the PF score and the GDT_TS(MAX) value used as an index of protein modeling accuracy in the international contest Critical Assessment of Techniques for Protein Structure Prediction (CASP). Accordingly, the PF method is valuable for constructing a highly accurate model without wasteful calculations of homology modeling that is normally performed by an iterative method to move the main chain and side chains in the modeling process. Moreover, a model with higher accuracy can be obtained by combining the models ordered by the PF score with models sorted by the size of the CIRCLE score. The CIRCLE software is a 3D-1D program, in which energetic stabilization is estimated based upon the experimental environment of each amino acid residue in the protein solution or protein crystals.

  6. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Directory of Open Access Journals (Sweden)

    Ruan Jishou

    2007-04-01

    Full Text Available Abstract Background Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP; the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are

  7. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    Energy Technology Data Exchange (ETDEWEB)

    Croteau, Rodney Bruce (Pullman, WA); Wildung, Mark Raymond (Colfax, WA); Lange, Bernd Markus (Pullman, WA); McCaskill, David G. (Pullman, WA)

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (BP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  8. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  9. Analysis on Response of Dynamic Systems to Pulse Sequences Excitation

    Directory of Open Access Journals (Sweden)

    Xie Lili

    2009-07-01

    Full Text Available Near-fault ground motions with long-period pulses can place severe demands on structures near an active fault. These pulse-type ground motions can be represented by pulse sequences with simple shapes. Half-sinusoidal pulse sequences are used to approximate recorded ground motions and dynamic responses of SDOF system under the excitation of these pulse sequences are studied. Four cases are considered: (1 variation in duration of successor sub-pulse; (2 variation in duration of predecessor sub-pulse; (3 variation in amplitude of successor sub-pulse; and (4 variation in amplitude of predecessor sub-pulse. The corresponding acceleration, velocity and displacement response spectra of these pulse sequences are studied. The analysis on SDOF system shows that in some cases the responses are strongly affected by the changes of duration and/or amplitude of the sub-pulse. The study can be useful to understand the influences of sub-pulse in the near-fault pulse-type ground motions.

  10. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria : research communication

    Directory of Open Access Journals (Sweden)

    D.O. Oluwayelu

    2008-09-01

    Full Text Available This work reports the first molecular analysis study of chicken anaemia virus (CAV in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6 % and 4 % nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2 % amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/Cl-8 and NGR/Cl-9 were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  11. Identities among actin-encoding cDNAs of the Nile tilapia (Oreochromis niloticus and other eukaryote species revealed by nucleotide and amino acid sequence analyses

    Directory of Open Access Journals (Sweden)

    Andréia B. Poletto

    2008-01-01

    Full Text Available Actin-encoding cDNAs of Nile tilapia (Oreochromis niloticus were isolated by RT-PCR using total RNA samples of different tissues and further characterized by nucleotide sequencing and in silico amino acid (aa sequence analysis. Comparisons among the actin gene sequences of O. niloticus and those of other species evidenced that the isolated genes present a high similarity to other fish and other vertebrate actin genes. The highest nucleotide resemblance was observed between O. niloticus and O. mossambicus a-actin and b-actin genes. Analysis of the predicted aa sequences revealed two distinct types of cytoplasmic actins, one cardiac muscle actin type and one skeletal muscle actin type that were expressed in different tissues of Nile tilapia. The evolutionary relationships between the Nile tilapia actin genes and diverse other organisms is discussed.

  12. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    International Nuclear Information System (INIS)

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-01-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD 50 0.6 μg/kg) but without effect upon mice at 15,000 μg/kg (i.p. injection). The reduced, 3 H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and r. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel

  13. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low enviromental impact of the machine. 13 refs., 1 fig., 3 tabs

  14. Sequencing and phylogenetic analysis of neurotoxin gene from an environmental isolate of Clostridium sp.: comparison with other clostridial neurotoxins.

    Science.gov (United States)

    Dixit, Aparna; Alam, Syed Imteyaz; Singh, Lokendra

    2006-07-01

    A Clostridium sp. isolated from intestine of decaying fish exhibited 99% sequence identity with C. tetani at 16S rRNA level. It produced a neurotoxin that was neutralized by botulinum antitoxin (A+B+E) as well as tetanus antitoxin. The gene fragments for light chain, C-terminal and N-terminal regions of the heavy chain of the toxin were amplified using three reported primer sets for tetanus neurotoxin (TeNT). The neurotoxin gene fragments were cloned in Escherichia coli and sequenced. The sequences obtained exhibited approximately 98, 99 and 98% sequence identity with reported gene sequences of TeNT/LC, TeNT/HC and TeNT/HN, respectively. The phylogenetic interrelationship between the neurotoxin gene of Clostridium sp. with previously reported gene sequences of Clostridium botulinum A to G and C. tetani was examined by analysis of differences in the nucleotide sequences. Six amino acids were substituted at four different positions in the light chain of neurotoxin from the isolate when compared with the reported closest sequence of TeNT. Of these, four were located in the beta15 motif at a solvent inaccessible, buried region of the protein molecule. One of these substitutions were on the solvent accessible surface residue of alpha1 motif, previously shown to have strong sequence conservation. A substitution of two amino acids observed in N-terminal region of heavy chain were buried residues, located in the beta21 and beta37 motifs showing variability in other related sequences. The C-terminal region responsible for binding to receptor was conserved, showing no changes in the amino acid sequence.

  15. Comparison of nucleic acid sequence-based amplification and loop-mediated isothermal amplification for diagnosis of human African trypanosomiasis

    NARCIS (Netherlands)

    Mugasa, Claire M.; Katiti, Diana; Boobo, Alex; Lubega, George W.; Schallig, Henk D. F. H.; Matovu, Enock

    2014-01-01

    Diagnosis of human African trypanosomiasis (HAT) using molecular tests should ideally achieve high sensitivity without compromising specificity. This study compared 2 simplified tests, nucleic acid sequence-based amplification (NASBA) combined with oligochromatography (OC) and loop-mediated

  16. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    Science.gov (United States)

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. © The Author(s) 2015. Published by Oxford University Press.

  17. Novel Biochip Platform for Nucleic Acid Analysis

    Directory of Open Access Journals (Sweden)

    Juan J. Diaz-Mochon

    2012-06-01

    Full Text Available This manuscript describes the use of a novel biochip platform for the rapid analysis/identification of nucleic acids, including DNA and microRNAs, with very high specificity. This approach combines a unique dynamic chemistry approach for nucleic acid testing and analysis developed by DestiNA Genomics with the STMicroelectronics In-Check platform, which comprises two microfluidic optimized and independent PCR reaction chambers, and a sequential microarray area for nucleic acid capture and identification by fluorescence. With its compact bench-top “footprint” requiring only a single technician to operate, the biochip system promises to transform and expand routine clinical diagnostic testing and screening for genetic diseases, cancers, drug toxicology and heart disease, as well as employment in the emerging companion diagnostics market.

  18. Heterodimeric l-amino acid oxidase enzymes from Egyptian Cerastes cerastes venom: Purification, biochemical characterization and partial amino acid sequencing

    Directory of Open Access Journals (Sweden)

    A.E. El Hakim

    2015-12-01

    Full Text Available Two l-amino acid oxidase enzyme isoforms, Cc-LAAOI and Cc-LAAOII were purified to apparent homogeneity from Cerastes cerastes venom in a sequential two-step chromatographic protocol including; gel filtration and anion exchange chromatography. The native molecular weights of the isoforms were 115 kDa as determined by gel filtration on calibrated Sephacryl S-200 column, while the monomeric molecular weights of the enzymes were, 60, 56 kDa and 60, 53 kDa for LAAOI and LAAOII, respectively. The tryptic peptides of the two isoforms share high sequence homology with other snake venom l-amino acid oxidases. The optimal pH and temperature values of Cc-LAAOI and Cc-LAAOII were 7.8, 50 °C and 7, 60 °C, respectively. The two isoenzymes were thermally stable up to 70 °C. The Km and Vmax values were 0.67 mM, 0.135 μmol/min for LAAOI and 0.82 mM, 0.087 μmol/min for LAAOII. Both isoenzymes displayed high catalytic preference to long-chain, hydrophobic and aromatic amino acids. The Mn2+ ion markedly increased the LAAO activity for both purified isoforms, while Na+, K+, Ca2+, Mg2+ and Ba2+ ions showed a non-significant increase in the enzymatic activity of both isoforms. Furthermore, Zn2+, Ni2+, Co2+, Cu2+ and AL3+ ions markedly inhibited the LAAOI and LAAOII activities. l-Cysteine and reduced glutathione completely inhibited the LAAO activity of both isoenzymes, whereas, β-mercaptoethanol, O-phenanthroline and PMSF completely inhibited the enzymatic activity of LAAOII. Furthermore, iodoacitic acid inhibited the enzymatic activity of LAAOII by 46% and had no effect on the LAAOI activity.

  19. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus

  20. The amino acid sequence of cytochrome c from Spinacea oleracea L. (spinach).

    Science.gov (United States)

    Brown, R H; Richardson, M; Scogin, R; Boulter, D

    1973-02-01

    The amino acid sequence of spinach (Spinacea oleracea L., var. Monster Viroflay) cytochrome c was determined on 1mumol of protein. The molecule consists of 111 residues and is homologous with other mitochondrial cytochromes c. Experimental details are given in a supplementary paper that has been deposited as Supplementary Publication SUP 50013, at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5.

  1. Fast computational methods for predicting protein structure from primary amino acid sequence

    Science.gov (United States)

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  2. Evolution of phosphagen kinase. VI. Isolation, characterization and cDNA-derived amino acid sequence of lombricine kinase from the earthworm Eisenia foetida, and identification of a possible candidate for the guanidine substrate recognition site.

    Science.gov (United States)

    Suzuki, T; Kawasaki, Y; Furukohri, T; Ellington, W R

    1997-12-05

    Lombricine kinase (LK) from the body wall muscle of the earthworm Eisenia foetida was purified to homogeneity. The enzyme was shown to be a dimer consisting of 40 kDa subunits. The cDNA-derived amino acid sequence of 370 residues of Eisenia LK was determined. The validity of the sequence was supported by chemical sequencing of internal tryptic peptides. This is the first reported lombricine kinase amino acid sequence. Alignment of Eisenia LK with those of creatine kinases (CKs), arginine kinases (AKs) and glycocyamine kinase (GK) suggested a region displaying remarkable amino acid deletions (referred to GS region), as a possible candidate for guanidine substrate recognition site. A phylogenetic analysis using amino acid sequences of all four phosphagen kinases indicates that CK, GK and LK probably evolved from a common immediate ancestor protein.

  3. Targeted DNA methylation analysis by next-generation sequencing.

    Science.gov (United States)

    Masser, Dustin R; Stanford, David R; Freeman, Willard M

    2015-02-24

    The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.

  4. Identification of linear HLA class II amino acid sequences recognized by rabbit antisera against native molecules.

    Science.gov (United States)

    Chersi, A; Morganti, M C; Houghten, R; Chillemi, F; Muratti, E

    1987-12-01

    A rabbit was immunized with B lymphoblastoid cells, and subsets of the antibodies produced were isolated on affinity columns made from synthetic peptides corresponding to known amino acid sequences from the human class II antigens DQ and DP. Those peptides for which specific antibodies were isolated could be assumed to contribute to the antigenic properties of the intact antigen. The antibody subsets were tested for binding to synthetic peptides, to glycoprotein fractions isolated from cells with different DR and DQ specificities, and to the cells used for immunization of the rabbit. The isolation of those antibodies directed against well-defined amino acid stretches of the histocompatibility antigens is proof of the role of those regions in determining the antigenic properties of these molecules.

  5. Comparative Analysis of Microbial Diversity in Termite Gut and Termite Nest Using Ion Sequencing.

    Science.gov (United States)

    Manjula, Arumugam; Pushpanathan, Muthuirulan; Sathyavathi, Sundararaju; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-03-01

    Termite gut and termite nest possess complex microbial communities. However, only limited information is available on the comparative investigation of termite gut- and nest-associated microbial communities. In the present study, we examined and compared the bacterial diversity of termite gut and their respective nest by high-throughput sequencing of V3 hypervariable region of 16S rDNA. A total of 14 barcoded libraries were generated from seven termite gut samples and their respective nest samples, and sequenced using Ion Torrent platform. The sequences of each group were pooled, which yielded 170,644 and 132,000 reads from termite gut and termite nest samples, respectively. Phylogenetic analysis revealed significant differences in the bacterial diversity and community structure between termite gut and termite nest samples. Phyla Verrucomicrobia and Acidobacteria were observed only in termite gut, whereas Synergistetes and Chlorobi were observed only in termite nest samples. These variations in microbial structure and composition could be attributed with the differences in physiological conditions prevailing in the termite gut (anoxic and alkaline) and termite nest (oxic, slightly acidic and rich in organic matter) environment. Overall, this study unmasked the complexity of bacterial population in the respective niche. Interestingly, majority of the sequence reads could be classified only up to the domain level indicating the presence of a huge number of uncultivable or unidentified novel bacterial species in both termite gut and nest samples. Whole metagenome sequencing and assessing the metabolic potential of these samples will be useful for biotechnological applications.

  6. Analysis of integrated human papillomavirus type 16 DNA in cervical cancers: amplification of viral sequences together with cellular flanking sequences.

    Science.gov (United States)

    Wagatsuma, M; Hashimoto, K; Matsukura, T

    1990-01-01

    We have isolated four clones of integrated human papillomavirus type 16 (HPV-16) DNA from four different primary cervical cancer specimens. All clones were found to be monomeric or dimeric forms of HPV-16 DNA with cellular flanking sequences at both ends. Analysis of the viral sequences in these clones showed that E6/E7 open reading frames and the long control region were conserved and that no region specific for the integration was detected. Analysis of the cellular flanking sequences revealed no significant homology with any known human DNA sequences, except Alu sequences, and no homology among the clones, indicating no cellular sequence specific for the integration. By probing with single-copy cellular flanking sequences from the clones, it was demonstrated that the integrated HPV-16 DNAs, with different sizes in the same specimens, shared the same cellular flanking sequences at the ends. Furthermore, it was shown that the viral sequences together with cellular flanking sequences were amplified. The possible process of viral integration into cell chromosomes in cervical cancer is discussed. Images PMID:2153245

  7. Detection and quantification of Plasmodium falciparum in blood samples using quantitative nucleic acid sequence-based amplification

    NARCIS (Netherlands)

    Schoone, G. J.; Oskam, L.; Kroon, N. C.; Schallig, H. D.; Omar, S. A.

    2000-01-01

    A quantitative nucleic acid sequence-based amplification (QT-NASBA) assay for the detection of Plasmodium parasites has been developed. Primers and probes were selected on the basis of the sequence of the small-subunit rRNA gene. Quantification was achieved by coamplification of the RNA in the

  8. Sequence-based analysis of the microbial composition of water kefir from multiple sources.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-11-01

    Water kefir is a water-sucrose-based beverage, fermented by a symbiosis of bacteria and yeast to produce a final product that is lightly carbonated, acidic and that has a low alcohol percentage. The microorganisms present in water kefir are introduced via water kefir grains, which consist of a polysaccharide matrix in which the microorganisms are embedded. We aimed to provide a comprehensive sequencing-based analysis of the bacterial population of water kefir beverages and grains, while providing an initial insight into the corresponding fungal population. To facilitate this objective, four water kefirs were sourced from the UK, Canada and the United States. Culture-independent, high-throughput, sequencing-based analyses revealed that the bacterial fraction of each water kefir and grain was dominated by Zymomonas, an ethanol-producing bacterium, which has not previously been detected at such a scale. The other genera detected were representatives of the lactic acid bacteria and acetic acid bacteria. Our analysis of the fungal component established that it was comprised of the genera Dekkera, Hanseniaspora, Saccharomyces, Zygosaccharomyces, Torulaspora and Lachancea. This information will assist in the ultimate identification of the microorganisms responsible for the potentially health-promoting attributes of these beverages. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  9. Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

    Directory of Open Access Journals (Sweden)

    James B Howard

    Full Text Available Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf , and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification

  10. Citrate synthase gene sequence: a new tool for phylogenetic analysis and identification of Ehrlichia.

    Science.gov (United States)

    Inokuma, H; Brouqui, P; Drancourt, M; Raoult, D

    2001-09-01

    The sequence of the citrate synthase gene (gltA) of 13 ehrlichial species (Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia muris, an Ehrlichia species recently detected from Ixodes ovatus, Cowdria ruminantium, Ehrlichia phagocytophila, Ehrlichia equi, the human granulocytic ehrlichiosis [HGE] agent, Anaplasma marginale, Anaplasma centrale, Ehrlichia sennetsu, Ehrlichia risticii, and Neorickettsia helminthoeca) have been determined by degenerate PCR and the Genome Walker method. The ehrlichial gltA genes are 1,197 bp (E. sennetsu and E. risticii) to 1,254 bp (A. marginale and A. centrale) long, and GC contents of the gene vary from 30.5% (Ehrlichia sp. detected from I. ovatus) to 51.0% (A. centrale). The percent identities of the gltA nucleotide sequences among ehrlichial species were 49.7% (E. risticii versus A. centrale) to 99.8% (HGE agent versus E. equi). The percent identities of deduced amino acid sequences were 44.4% (E. sennetsu versus E. muris) to 99.5% (HGE agent versus E. equi), whereas the homology range of 16S rRNA genes was 83.5% (E. risticii versus the Ehrlichia sp. detected from I. ovatus) to 99.9% (HGE agent, E. equi, and E. phagocytophila). The architecture of the phylogenetic trees constructed by gltA nucleotide sequences or amino acid sequences was similar to that derived from the 16S rRNA gene sequences but showed more-significant bootstrap values. Based upon the alignment analysis of the ehrlichial gltA sequences, two sets of primers were designed to amplify tick-borne Ehrlichia and Neorickettsia genogroup Ehrlichia (N. helminthoeca, E. sennetsu, and E. risticii), respectively. Tick-borne Ehrlichia species were specifically identified by restriction fragment length polymorphism (RFLP) patterns of AcsI and XhoI with the exception of E. muris and the very closely related ehrlichia derived from I. ovatus for which sequence analysis of the PCR product is needed. Similarly, Neorickettsia genogroup Ehrlichia species were specifically identified by

  11. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  12. Differentiation of sheep pox and goat poxviruses by sequence analysis and PCR-RFLP of P32 gene.

    Science.gov (United States)

    Hosamani, Madhusudan; Mondal, Bimalendu; Tembhurne, Prabhakar A; Bandyopadhyay, Santanu Kumar; Singh, Raj Kumar; Rasool, Thaha Jamal

    2004-08-01

    Sheep pox and Goat pox are highly contagious viral diseases of small ruminants. These diseases were earlier thought to be caused by a single species of virus, as they are serologically indistinguishable. P32, one of the major immunogenic genes of Capripoxvirus, was isolated and Sequenced from two Indian isolates of goat poxvirus (GPV) and a vaccine strain of sheep poxvirus (SPV). The sequences were compared with other P32 sequences of capripoxviruses available in the database. Sequence analysis revealed that sheep pox and goat poxviruses share 97.5 and 94.7% homology at nucleotide and amino acid level, respectively. A major difference between them is the presence of an additional aspartic acid at 55th position of P32 of sheep poxvirus that is absent in both goat poxvirus and lumpy skin disease virus. Further, six unique neutral nucleotide substitutions were observed at positions 77, 275, 403, 552, 867 and 964 in the sequence of goat poxvirus, which can be taken as GPV signature residues. Similar unique nucleotide signatures could be identified in SPV and LSDV sequences also. Phylogenetic analysis showed that members of the Capripoxvirus could be delineated into three distinct clusters of GPV, SPV and LSDV based on the P32 genomic sequence. Using this information, a PCR-RFLP method has been developed for unequivocal genomic differentiation of SPV and GPV.

  13. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... bp open reading DNA sequence encoding 290 amino acids [8-10]. ... Two DNA frag- ments of NAT2 gene that represent the complete sequence of exon 2 of NAT2 gene were amplified using specific DNA primers (Table 1). The PCR condi ..... sequence from different species: human, chimpanzee,. Sumatran ...

  14. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ 70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  15. Molecular cloning, sequence characteristics, and tissue expression analysis of ECE1 gene in Tibetan pig.

    Science.gov (United States)

    Wang, Yan-Dong; Zhang, Jian; Li, Chuan-Hao; Xu, Hai-Peng; Chen, Wei; Zeng, Yong-Qing; Wang, Hui

    2015-10-25

    Low air pressure and low oxygen partial pressure at high altitude seriously affect the survival and development of human beings and animals. ECE1 is a recently discovered gene that is involved in anti-hypoxia, but the full-length cDNA sequence has not been obtained. For a better understanding of the structure and function of the ECE1 gene and to study its effect in Tibetan pig, the cDNA of the ECE1 gene from the muscle of Tibetan pig was cloned, sequenced and characterized. The ECE1 full-length cDNA sequence consists of 2262 bp coding sequence (CDS) that encodes 753 amino acids with a molecular mass of 85,449 kD, 2 bp 5'UTR and 1507 bp 3'UTR. In addition, the phylogenetic tree analysis revealed that the Tibetan pig ECE1 has a closer genetic relationship and evolution distance with the land mammals ECE1. Furthermore, analysis by qPCR showed that the ECE1 transcript is constitutively expressed in the 10 tissues tested: the liver, subcutaneous fat, kidney, muscle, stomach, heart, brain, spleen, pancreas, and lung. These results serve as a foundation for further insight into the Tibetan pig ECE1 gene. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    Directory of Open Access Journals (Sweden)

    Yan Koon-Kiu

    2007-11-01

    Full Text Available Abstract Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p-γ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw" duplication and deletion rates rdup∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOCai3aa0baaSqaaiabbsgaKjabbwha1jabbchaWbqaaiabgEHiQaaaaaa@3283@, rdel∗ MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemOCai3aa0baaSqaaiabbsga

  17. Pan-Cancer Analysis of Genomic Sequencing Among the Elderly.

    Science.gov (United States)

    Wahl, Daniel R; Nguyen, Paul L; Santiago, Maria; Yousefi, Kasra; Davicioni, Elai; Shumway, Dean A; Speers, Corey; Mehra, Rohit; Feng, Felix Y; Osborne, Joseph R; Spratt, Daniel E

    2017-07-15

    We hypothesized that elderly patients might have age-specific genetic abnormalities yet be underrepresented in currently available sequencing repositories, which could limit the effect of sequencing efforts for this population. Leveraging The Cancer Genome Atlas (TCGA) data portal, 9 tumor types were analyzed. The frequency distribution of cancer by age was determined and compared with Surveillance, Epidemiology, and End Results data. Using the estimated median somatic mutational frequency of each tumor type, the samples needed beyond TCGA to detect a 10% mutational frequency were calculated. Microarray data from a separate prospective cohort were obtained from primary prostatectomy samples to determine whether elderly-specific transcriptomic alterations could be identified. Of the 5236 TCGA samples, 73% were from patients aged elderly patients with cancer were likely to harbor age-specific molecular abnormalities, we accessed transcriptomic data from a separate, larger database of >2000 prostate cancer samples. That analysis revealed significant differences in the expression of 10 genes in patients aged ≥70 years compared with those Elderly patients have been underrepresented in genomic sequencing studies. Our data suggest the presence of elderly-specific molecular alterations. Further dedicated efforts to understand the biology of cancer among the elderly will be important moving forward. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  19. Integrated visual analysis of protein structures, sequences, and feature data.

    Science.gov (United States)

    Stolte, Christian; Sabir, Kenneth S; Heinrich, Julian; Hammang, Christopher J; Schafferhans, Andrea; O'Donoghue, Seán I

    2015-01-01

    To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occuring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted easier and faster using Aquaria.

  20. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    Energy Technology Data Exchange (ETDEWEB)

    Yu, Jinghua (Veterans Administration Medical Center, Bronx, NY (United States)); Eng, J.; Yalow, R.S. (Veterans Administration Medical Center, Bronx, NY (United States) City Univ. of New York, NY (United States))

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled park insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  1. Determining physical constraints in transcriptional initiationcomplexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen,Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control ofcooperatively acting transcription factors whose binding is limited bystructural constraints. By determining these structural constraints, wecan understand the "rules" that define functional cooperativity.Conversely, by understanding the rules of binding, we can inferstructural characteristics. We have developed an information theory basedmethod for approximating the physical limitations of cooperativeinteractions by comparing sequence analysis to microarray expressiondata. When applied to the coordinated binding of the sulfur amino acidregulatory protein Met4 by Cbf1 and Met31, we were able to create acombinatorial model that can correctly identify Met4 regulatedgenes.

  2. Evolution of amino acid metabolism inferred through cladistic analysis.

    Science.gov (United States)

    Cunchillos, Chomin; Lecointre, Guillaume

    2003-11-28

    Because free amino acids were most probably available in primitive abiotic environments, their metabolism is likely to have provided some of the very first metabolic pathways of life. What were the first enzymatic reactions to emerge? A cladistic analysis of metabolic pathways of the 16 aliphatic amino acids and 2 portions of the Krebs cycle was performed using four criteria of homology. The analysis is not based on sequence comparisons but, rather, on coding similarities in enzyme properties. The properties used are shared specific enzymatic activity, shared enzymatic function without substrate specificity, shared coenzymes, and shared functional family. The tree shows that the earliest pathways to emerge are not portions of the Krebs cycle but metabolisms of aspartate, asparagine, glutamate, and glutamine. The views of Horowitz (Horowitz, N. H. (1945) Proc. Natl. Acad. Sci. U. S. A. 31, 153-157) and Cordón (Cordón, F. (1990) Tratado Evolucionista de Biologia, Aguilar, Madrid, Spain), according to which the upstream reactions in the catabolic pathways and the downstream reactions in the anabolic pathways are the earliest in evolution, are globally corroborated; however, with some exceptions. These are due to later opportunistic connections of pathways (actually already suggested by these authors). Earliest enzymatic functions are mostly catabolic; they were deaminations, transaminations, and decarboxylations. From the consensus tree we extracted four time spans for amino acid metabolism development. For some amino acids catabolism and biosynthesis occurred at the same time (Asp, Glu, Lys, Leu, Ala, Val, Ile, Pro, Arg). For others ultimate reactions that use amino acids as a substrate or as a product are distinct in time, with catabolism preceding anabolism for Asn, Gln, and Cys and anabolism preceding catabolism for Ser, Met, and Thr. Cladistic analysis of the structure of biochemical pathways makes hypotheses in biochemical evolution explicit and parsimonious.

  3. Influence of the Amino Acid Sequence on Protein-Mineral Interactions in Soil

    Science.gov (United States)

    Chacon, S. S.; Reardon, P. N.; Purvine, S.; Lipton, M. S.; Washton, N.; Kleber, M.

    2017-12-01

    The intimate associations between protein and mineral surfaces have profound impacts on nutrient cycling in soil. Proteins are an important source of organic C and N, and a subset of proteins, extracellular enzymes (EE), can catalyze the depolymerization of soil organic matter (SOM). Our goal was to determine how variation in the amino acid sequence could influence a protein's susceptibility to become chemically altered by mineral surfaces to infer the fate of adsorbed EE function in soil. We hypothesized that (1) addition of charged amino acids would enhance the adsorption onto oppositely charged mineral surfaces (2) addition of aromatic amino acids would increase adsorption onto zero charged surfaces (3) Increase adsorption of modified proteins would enhance their susceptibility to alterations by redox active minerals. To test these hypotheses, we generated three engineered proxies of a model protein Gb1 (IEP 4.0, 6.2 kDA) by inserting either negatively charged, positively charged or aromatic amino acids in the second loop. These modified proteins were allowed to interact with functionally different mineral surfaces (goethite, montmorillonite, kaolinite and birnessite) at pH 5 and 7. We used LC-MS/MS and solution-state Heteronuclear Single Quantum Coherence Spectroscopy NMR to observe modifications on engineered proteins as a consequence to mineral interactions. Preliminary results indicate that addition of any amino acids to a protein increase its susceptibility to fragmentation and oxidation by redox active mineral surfaces, and alter adsorption to the other mineral surfaces. This suggest that not all mineral surfaces in soil may act as sorbents for EEs and chemical modification of their structure should also be considered as an explanation for decrease in EE activity. Fragmentation of proteins by minerals can bypass the need to produce proteases, but microbial acquisition of other nutrients that require enzymes such as cellulases, ligninases or phosphatases

  4. [Cloning and sequence analysis of thioredoxin peroxidase gene from Taenia multiceps].

    Science.gov (United States)

    Li, Yong-guang; Li, Wen-hui; Gai, Wen-yan; Yao, Ju-xia; Qu, Zi-gang; Jia, Wan-zhong; Radu, Blaga; Fu, Bao-quan

    2011-02-28

    Protoscoleces of Taenia multiceps were collected from the naturally infected sheep and total RNA was extracted. Specific primers were designed according to TaHe2-D11 mRNA sequence and T. multiceps thioredoxin peroxidase gene (TmTPx) was amplified by RT-PCR. PCR products were ligated into pMD18-T vector and transformed to E. coli DH5alpha. The recombinant plasmids were identified by restriction digestion and sequencing. A 614 bp cDNA was amplified. The TmTPx open reading frame (591 bp) encoded a 196-amino acid protein with Mr 21,690, pI 7.61. Bioinformatics analysis indicated that TmTPx had a typical 2-Cys Prx conserved domain. Phylogenetic tree revealed that T. multiceps had the closest relationship to T. asiatica, followed by T. solium and T. crassiceps, E. granulosus and E. multilocularis.

  5. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development

    OpenAIRE

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-01-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence re...

  6. Complete Genome Sequence of Pseudomonas Parafulva PRS09-11288, a Biocontrol Strain Produces the Antibiotic Phenazine-1-carboxylic Acid.

    Science.gov (United States)

    Zhang, Yu; Chen, Ping; Ye, Guoyou; Lin, Haiyan; Ren, Deyong; Guo, Longbiao; Zhu, Bo; Wang, Zhongwei

    2018-01-22

    Rhizoctonia solani is a plant pathogenic fungus, which can infect a wide range of economic crops including rice. In this case, biological control of this pathogen is one of the fundmental way to effectively control this pathogen. The Pseudomonas parafulva strain PRS09-11288 was isolated from rice rhizosphere and shows biocontrol ability against R. solani. Here, we analyzed the P. parafulva genome, which is ~ 4.7 Mb, with 4310 coding sequences, 76 tRNAs, and 7 rRNAs. Genome analysis identified a phenazine biosynthetic pathway, which can produce antibiotic phenazine-1-carboxylic acid (PCA). This compound is responsible for biocontrol ability against R. solani Kühn, which is one of the most serious fungus disease on rice. Analysis of the phenazine biosynthesis gene mutant, ΔphzF, which is very important in this pathway, confirmed the relationship between the pathway and PCA production using LC-MS profiles. The annotated full genome sequence of this strain sheds light on the role of P. parafulva PRS09-11288 as a biocontrol bacterium.

  7. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  8. Next-Generation Sequence Analysis of Cancer Xenograft Models

    Science.gov (United States)

    Rossello, Fernando J.; Tothill, Richard W.; Britt, Kara; Marini, Kieren D.; Falzon, Jeanette; Thomas, David M.; Peacock, Craig D.; Marchionni, Luigi; Li, Jason; Bennett, Samara; Tantoso, Erwin; Brown, Tracey; Chan, Philip; Martelotto, Luciano G.; Watkins, D. Neil

    2013-01-01

    Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assign reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations. PMID:24086345

  9. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Full Text Available Next-generation sequencing (NGS studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC, a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assign reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

  10. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    STORAGESEVER

    2010-07-19

    Jul 19, 2010 ... sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and phylogenetic ..... Tree was generated by Neighbor joining algorithm. Boot strap values are shown ... Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence ...

  11. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics

    Directory of Open Access Journals (Sweden)

    Galtier Nicolas

    2006-04-01

    Full Text Available Abstract Background A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/ouput methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. Results We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets, various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc., phylogenetic analysis (maximum parsimony, markov models, distance methods, likelihood computation and maximization, population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses and various algorithms for numerical calculus. Conclusion Implementation of methods aims at being both efficient and user-friendly. A special concern was given to the library design to enable easy extension and new methods development. We defined a general hierarchy of classes that allow the developer to implement its own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

  12. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    Full Text Available The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses.A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%, proteinase inhibitors (7.8% or calcium-binding EF-hand proteins (6.7%. Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%, oxidative phosphorylation (63% and/or proteases (60%; most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins ( = SCP/TAPS or Ancylostoma-secreted proteins were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins or amino acid metabolism (e.g., asparagine t-RNA synthetase.This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  13. A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.

    Science.gov (United States)

    Yin, Changchuan; Yin, Xuemeng E; Wang, Jiasong

    2014-12-01

    Alignment-free sequence analysis approaches provide important alternatives over multiple sequence alignment (MSA) in biological sequence analysis because alignment-free approaches have low computation complexity and are not dependent on high level of sequence identity. However, most of the existing alignment-free methods do not employ true full information content of sequences and thus can not accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on Ramanujan-Fourier transform (RFT), in which complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply RFT on the indicator sequences to convert them into frequency domain. The Euclidean distance of the complete RFT coefficients of DNA sequences are used as similarity measures. To address the different lengths of RFT coefficients in Euclidean space, we pad zeros to short DNA binary sequences so that the binary sequences equal the longest length in the comparison sequence data. Thus, the DNA sequences are compared in the same dimensional frequency space without information loss. We demonstrate the usefulness of the proposed method by presenting experimental results on hierarchical clustering of genes and genomes. The proposed method opens a new channel to biological sequence analysis, classification, and structural module identification.

  14. Sequence analysis of dolphin ferritin H and L subunits and possible iron-dependent translational control of dolphin ferritin gene

    Directory of Open Access Journals (Sweden)

    Sasaki Yukako

    2008-10-01

    Full Text Available Abstract Background Iron-storage protein, ferritin plays a central role in iron metabolism. Ferritin has dual function to store iron and segregate iron for protection of iron-catalyzed reactive oxygen species. Tissue ferritin is composed of two kinds of subunits (H: heavy chain or heart-type subunit; L: light chain or liver-type subunit. Ferritin gene expression is controlled at translational level in iron-dependent manner or at transcriptional level in iron-independent manner. However, sequencing analysis of marine mammalian ferritin subunits has not yet been performed fully. The purpose of this study is to reveal cDNA-derived amino acid sequences of cetacean ferritin H and L subunits, and demonstrate the possibility of expression of these subunits, especially H subunit, by iron. Methods Sequence analyses of cetacean ferritin H and L subunits were performed by direct sequencing of polymerase chain reaction (PCR fragments from cDNAs generated via reverse transcription-PCR of leukocyte total RNA prepared from blood samples of six different dolphin species (Pseudorca crassidens, Lagenorhynchus obliquidens, Grampus griseus, Globicephala macrorhynchus, Tursiops truncatus, and Delphinapterus leucas. The putative iron-responsive element sequence in the 5'-untranslated region of the six different dolphin species was revealed by direct sequencing of PCR fragments obtained using leukocyte genomic DNA. Results Dolphin H and L subunits consist of 182 and 174 amino acids, respectively, and amino acid sequence identities of ferritin subunits among these dolphins are highly conserved (H: 99–100%, (99→98 ; L: 98–100%. The conserved 28 bp IRE sequence was located -144 bp upstream from the initiation codon in the six different dolphin species. Conclusion These results indicate that six different dolphin species have conserved ferritin sequences, and suggest that these genes are iron-dependently expressed.

  15. Analysis of perfluorocarboxylic acids in air.

    Science.gov (United States)

    Miller, James; Flaherty, John; Wille, Roice; Buck, Warren; Morandi, Francesco; Isemura, Tsuguhide

    2007-03-01

    Perfluorocarboxylic acids have been used in limited but important applications, such as in the production of fluoroelastomers and as components of fire-fighting foams. A method for the measurement of perfluorocarboxylic acids of different chain lengths in air is presented. Chain lengths tested included those having 8, 9, 10, 11, and 13 carbon atoms. Occupational Safety and Health Administration (OSHA) Versatile Sampler airtubes with glass fiber filter, XAD-4 resin, and polyurethane foam were fortified with analyte at levels corresponding to 0.1, 0.5, 1.0, and 2.0 times the level representing a 0.01 mg/m3 analyte air concentration sampled for 8-hour period with an airflow rate of 1 L/min. Airtube sections were extracted with acetone. Analysis of analyte in airtube fraction extracts was by HPLC/MS using ammonium acetate in water and methanol mobile phase and single ion monitoring for quantification. The lower limit of quantification for the HPLC/MS was set at 0.0001 mg/m3 for each analyte. Total recoveries of analytes within National Institute for Occupational Safety and Health (NIOSH) guidelines of +/- 25% of the true value were obtained at three different atmospheric conditions: ambient, cool/dry, and warm/humid conditions. The maximum loading with acceptable recovery of each analyte was 9600 ng (48000 ng total) for the ambient and warm/humid conditions. The maximum loading with acceptable recovery was 4800 ng (24000 ng total) for the cool/dry condition. The ammonium salt of the eight-carbon perfluorocarboxylic acid, ammonium perfluorooctanoate, was similar to the parent acid in behavior at the ambient condition. Hydrohexadecafluorononanoic acid was tested as a possible surrogate standard for the method, and it was found unsuitable for use because of variable recoveries. Stability studies were completed, and the perfluorocarboxylic acid analytes were stable in airtube samplers for up to 14 days at either ambient or refrigerated condition.

  16. Statistically significant dependence of the Xaa-Pro peptide bond conformation on secondary structure and amino acid sequence

    Directory of Open Access Journals (Sweden)

    Leitner Dietmar

    2005-04-01

    Full Text Available Abstract Background A reliable prediction of the Xaa-Pro peptide bond conformation would be a useful tool for many protein structure calculation methods. We have analyzed the Protein Data Bank and show that the combined use of sequential and structural information has a predictive value for the assessment of the cis versus trans peptide bond conformation of Xaa-Pro within proteins. For the analysis of the data sets different statistical methods such as the calculation of the Chou-Fasman parameters and occurrence matrices were used. Furthermore we analyzed the relationship between the relative solvent accessibility and the relative occurrence of prolines in the cis and in the trans conformation. Results One of the main results of the statistical investigations is the ranking of the secondary structure and sequence information with respect to the prediction of the Xaa-Pro peptide bond conformation. We observed a significant impact of secondary structure information on the occurrence of the Xaa-Pro peptide bond conformation, while the sequence information of amino acids neighboring proline is of little predictive value for the conformation of this bond. Conclusion In this work, we present an extensive analysis of the occurrence of the cis and trans proline conformation in proteins. Based on the data set, we derived patterns and rules for a possible prediction of the proline conformation. Upon adoption of the Chou-Fasman parameters, we are able to derive statistically relevant correlations between the secondary structure of amino acid fragments and the Xaa-Pro peptide bond conformation.

  17. Sequence analysis of the ORF 7 region of transmissible gastroenteritis viruses isolated in Korea.

    Science.gov (United States)

    Park, Jeong Ho; Han, Jeong Hee; Kwon, Hyuk Moo

    2008-02-01

    Three (KT2, 133, and DAE) transmissible gastroenteritis viruses (TGEVs) were isolated from pigs suspected of having TGE in Korea. One, KT2 (KT2-L), was passaged 128 times (KT2-H) in swine testicular cells. The open reading frame 7 (ORF 7) gene from each of the four TGEVs (KT2-L, KT2-H, 133, and DAE), which is located at the 3' end of the TGEV genome, was amplified by reverse transcriptase-polymerase chain reaction (RT-PCR). Amplified PCR products were cloned, sequenced, and compared with published sequences of non-Korean TGEV strains. Differences in replication and cytopathic effect (CPE) between the KT2-L and KT2-H strains in swine testicular cells were investigated. Korean TGEV field strains had 94.8-99.6% nucleotide and 92.1-98.7% amino acid sequence similarity with each other, and 87.8-100.0% nucleotide and 84.2-100.0% amino acid sequence similarity with non-Korean TGEV strains. Compared to the original KT2-L strain, the KT2-H strain differed by 2.2 and 3.9% in nucleotide and amino acid sequences, respectively. Specifically, the KT2-H had six nucleotide and two amino acid deletions compared to the original KT2-L strain. In phylogenetic analysis of the ORF 7 gene, Korean TGEV strains were clustered into two groups. One group (KT2-L, KT2-H, 133) was related to TGEV strains isolated in Japan. Another Korean TGEV isolate (DAE) was related to a strain from China and one from the USA. The Korean TGEV isolates appear to have evolved from a separate lineage of TGEV strain. Differences in growth rate and CPE between the KT2-L and KT2-H strains were discovered in swine testicular cells (STCs). The KT2-H strain exhibited a higher replication rate than KT2-L and produced a CPE distinctly different from that of the KT2-L strain.

  18. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    Energy Technology Data Exchange (ETDEWEB)

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A. [Univ. of Britisch Columbia, Vancouver (Canada)] [and others

    1994-09-01

    A cDNA fragment spanning the 3{prime}-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. Coli lac promoter and a portion of the coding sequence for {beta}-galactosidase. The truncated {beta}-galactosidase gene was cleaved with BamHl and fused in frame to the BamHl fragment of the Huntington disease gene 3{prime}-end. Expression analysis of proteins made in E. Coli revealed that 20-30% of the total cellular proteins was represented by the {beta}-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands of apparent gel migrations of 200 and 150 kd which together, correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and implicate that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship and to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine cellular and subcellular distribution of the gene product.

  19. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  20. An Analysis of Loss of Offsite Power Sequence for the Severe Accident Analysis Database (II)

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Kim, Dong Ha

    2006-12-15

    This report contains analysis methodologies and calculation results of loss of offsite sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as a reference plant. Based on the probabilistic safety analysis of the corresponding plant, Twelve accident scenarios, which was predicted to have more than 10-10 /ry occurrence frequency, have been analyzed as base cases for the loss of offsite sequence database. The functions of the severe accident analysis database system will be to make a diagnosis of the accident by some input information from the plant symptoms, to search a corresponding scenario, and finally to provide the user phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results of loss of offsite sequence in this report will be utilized as input data of the severe accident analysis database system. This report updates and complements a previously published Technical Report.

  1. Succession sequence of lactic acid bacteria driven by environmental factors and substrates throughout the brewing process of Shanxi aged vinegar.

    Science.gov (United States)

    Zheng, Yu; Mou, Jun; Niu, Jiwei; Yang, Shuai; Chen, Lin; Xia, Menglei; Wang, Min

    2018-03-01

    Lactic acid bacteria (LAB) are essential microbiota for the fermentation and flavor formation of Shanxi aged vinegar, a famous Chinese traditional cereal vinegar that is manufactured using open solid-state fermentation (SSF) technology. However, the dynamics of LAB in this SSF process and the underlying mechanism remain poorly understood. Here, the diversity of LAB and the potential driving factors of the entire process were analyzed by combining culture-independent and culture-dependent methods. Canonical correlation analysis indicated that ethanol, acetic acid, and temperature that result from the metabolism of microorganisms serve as potential driving factors for LAB succession. LAB strains were periodically isolated, and the characteristics of 57 isolates on environmental factor tolerance and substrate utilization were analyzed to understand the succession sequence. The environmental tolerance of LAB from different stages was in accordance with their fermentation conditions. Remarkable correlations were identified between LAB growth and environmental factors with 0.866 of ethanol (70 g/L), 0.756 of acetic acid (10 g/L), and 0.803 of temperature (47 °C). More gentle or harsh environments (less or more than 60 or 80 g/L of ethanol, 5 or 20 g/L of acetic acid, and 30 or 55 °C temperature) did not affect the LAB succession. The utilization capability evaluation of the 57 isolates for 95 compounds proved that strains from different fermentation stages exhibited different predilections on substrates to contribute to the fermentation at different stages. Results demonstrated that LAB succession in the SSF process was driven by the capabilities of environmental tolerance and substrate utilization.

  2. Multilocus sequence analysis for Leishmania braziliensis outbreak investigation.

    Directory of Open Access Journals (Sweden)

    Mariel A Marlow

    2014-02-01

    Full Text Available With the emergence of leishmaniasis in new regions around the world, molecular epidemiological methods with adequate discriminatory power, reproducibility, high throughput and inter-laboratory comparability are needed for outbreak investigation of this complex parasitic disease. As multilocus sequence analysis (MLSA has been projected as the future gold standard technique for Leishmania species characterization, we propose a MLSA panel of six housekeeping gene loci (6pgd, mpi, icd, hsp70, mdhmt, mdhnc for investigating intraspecific genetic variation of L. (Viannia braziliensis strains and compare the resulting genetic clusters with several epidemiological factors relevant to outbreak investigation. The recent outbreak of cutaneous leishmaniasis caused by L. (V. braziliensis in the southern Brazilian state of Santa Catarina is used to demonstrate the applicability of this technique. Sequenced fragments from six genetic markers from 86 L. (V. braziliensis strains from twelve Brazilian states, including 33 strains from Santa Catarina, were used to determine clonal complexes, genetic structure, and phylogenic networks. Associations between genetic clusters and networks with epidemiological characteristics of patients were investigated. MLSA revealed epidemiological patterns among L. (V. braziliensis strains, even identifying strains from imported cases among the Santa Catarina strains that presented extensive homogeneity. Evidence presented here has demonstrated MLSA possesses adequate discriminatory power for outbreak investigation, as well as other potential uses in the molecular epidemiology of leishmaniasis.

  3. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    Full Text Available The adaptation tipping point and adaptation pathway approach developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. Both approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criterions. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.

  4. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    Directory of Open Access Journals (Sweden)

    Faming Jiang

    Full Text Available Smear-negative pulmonary tuberculosis (PTB is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB staining of needle biopsy lung tissues for patients with suspected smear-negative PTB.Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR. For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used for identification of nontuberculous mycobacteria (NTM. The sensitivity, specificity, positive predictive value (PPV, negative predictive value (NPV, and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination.Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17-80 years, confirmed TB: 9, probable TB: 124. Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB for the diagnosis of smear-negative were 61.7% (82/133, 100% (48/48, 100% (82/82, 48.5% (48/181, and 71.8% (130/181, respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133, 95.8% (46/48, 98.3% (119/121, and 76.7% (46/60, respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181] than histological acid-fast staining (71.8% [130/181], P < 0.001. Parallel testing of histological AFB staining and PCR showed the

  5. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    Science.gov (United States)

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.

  6. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    Directory of Open Access Journals (Sweden)

    W.S.I. de Silva

    2017-07-01

    Full Text Available The promoter region of a drought and abscisic acid (ABA inducible gene, osr40c1, was isolated from a salt-tolerant indica rice variety Pokkali, which is 670 bp upstream of the putative translation start codon. In silico promoter analysis of resulted sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acing regulatory elements which may be involved in regulation of osr40c1 expression under different conditions were found in the 5′-upstream region of osr40c1. These are ABA-responsive element, light-responsive elements (ATCT-motif, Box I, G-box, GT1-motif, Gap-box and Sp1, myeloblastosis oncogene response element (CCAAT-box, auxin responsive element (TGA-element, gibberellin-responsive element (GARE-motif and fungal-elicitor responsive elements (Box E and Box-W1. A putative regulatory element, required for endosperm-specific pattern of gene expression designated as Skn-1 motif, was also detected in the Pokkali osr40c1 promoter region. In conclusion, the bioinformatic analysis of osr40c1 promoter region isolated from indica rice variety Pokkali led to the identification of several important stress-responsive cis-acting regulatory elements, and therefore, the isolated promoter sequence could be employed in rice genetic transformation to mediate expression of abiotic stress induced genes.

  7. Next generation sequencing improves the accuracy of KRAS mutation analysis in endoscopic ultrasound fine needle aspiration pancreatic lesions.

    Directory of Open Access Journals (Sweden)

    Dario de Biase

    Full Text Available The use of endoscopic ultrasonography has allowed for improved detection and pathologic analysis of fine needle aspirate material for pancreatic lesion diagnosis. The molecular analysis of KRAS has further improved the clinical sensitivity of preoperative analysis. For this reason, the use of highly analytical sensitive and specific molecular tests in the analysis of material from fine needle aspirate specimens has become of great importance. In the present study, 60 specimens from endoscopic ultrasonography fine needle aspirate were analyzed for KRAS exon 2 and exon 3 mutations, using three different techniques: Sanger sequencing, allele specific locked nucleic acid PCR and Next Generation sequencing (454 GS-Junior, Roche. Moreover, KRAS was also tested in wild-type samples, starting from DNA obtained from cytological smears after pathological evaluation. Sanger sequencing showed a clinical sensitivity for the detection of the KRAS mutation of 42.1%, allele specific locked nucleic acid of 52.8% and Next Generation of 73.7%. In two wild-type cases the re-sequencing starting from selected material allowed to detect a KRAS mutation, increasing the clinical sensitivity of next generation sequencing to 78.95%. The present study demonstrated that the performance of molecular analysis could be improved by using highly analytical sensitive techniques. The Next Generation Sequencing allowed to increase the clinical sensitivity of the test without decreasing the specificity of the analysis. Moreover we observed that it could be useful to repeat the analysis starting from selectable material, such as cytological smears to avoid false negative results.

  8. Complete genome sequence of probiotic Bacillus coagulans HM-08: A potential lactic acid producer.

    Science.gov (United States)

    Yao, Guoqiang; Gao, Pengfei; Zhang, Wenyi

    2016-06-20

    Bacillus coagulans HM-08 is a commercialized probiotic strain in China. Its genome contains a 3.62Mb circular chromosome with an average GC content of 46.3%. In silico analysis revealed the presence of one xyl operon as well as several other genes that are correlated to xylose utilization. The genetic information provided here may help to expand its future biotechnology potential in lactic acid production. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Scanning mutagenesis of the amino acid sequences flanking phosphorylation site 1 of the mitochondrial pyruvate dehydrogenase complex

    Directory of Open Access Journals (Sweden)

    Nagib eAhsan

    2012-07-01

    Full Text Available The mitochondrial pyruvate dehydrogenase complex is regulated by reversible seryl-phosphorylation of the E1α subunit by a dedicated, intrinsic kinase. The phospho-complex is reactivated when dephosphorylated by an intrinsic PP2C-type protein phosphatase. Both the position of the phosphorylated Ser-residue and the sequences of the flanking amino acids are highly conserved. We have used the synthetic peptide-based kinase client assay plus recombinant pyruvate dehydrogenase E1α and E1α-kinase to perform scanning mutagenesis of the residues flanking the site of phosphorylation. Consistent with the results from phylogenetic analysis of the flanking sequences, the direct peptide-based kinase assays tolerated very few changes. Even conservative changes such as Leu, Ile, or Val for Met, or Glu for Asp, gave very marked reductions in phosphorylation. Overall the results indicate that regulation of the mitochondrial pyruvate dehydrogenase complex by reversible phosphorylation is an extreme example of multiple, interdependent instances of co-evolution.

  10. Secondary structure-based analysis of mouse brain small RNA sequences obtained by using next-generation sequencing.

    Science.gov (United States)

    Kiyosawa, Hidenori; Okumura, Akio; Okui, Saya; Ushida, Chisato; Kawai, Gota

    2015-08-01

    In order to find novel structured small RNAs, next-generation sequencing was applied to small RNA fractions with lengths ranging from 40 to 140 nt and secondary structure-based clustering was performed. Sequences of structured RNAs were effectively clustered and analyzed by secondary structure. Although more than 99% of the obtained sequences were known RNAs, 16 candidate mouse structured small non-coding RNAs (MsncRs) were isolated. Based on these results, the merits of secondary structure-based analysis are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitates a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  12. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    KAUST Repository

    Chen, Peng

    2013-07-23

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.

  13. IntFOLD: an integrated server for modelling protein structures and functions from amino acid sequences.

    Science.gov (United States)

    McGuffin, Liam J; Atkins, Jennifer D; Salehe, Bajuna R; Shuid, Ahmad N; Roche, Daniel B

    2015-07-01

    IntFOLD is an independent web server that integrates our leading methods for structure and function prediction. The server provides a simple unified interface that aims to make complex protein modelling data more accessible to life scientists. The server web interface is designed to be intuitive and integrates a complex set of quantitative data, so that 3D modelling results can be viewed on a single page and interpreted by non-expert modellers at a glance. The only required input to the server is an amino acid sequence for the target protein. Here we describe major performance and user interface updates to the server, which comprises an integrated pipeline of methods for: tertiary structure prediction, global and local 3D model quality assessment, disorder prediction, structural domain prediction, function prediction and modelling of protein-ligand interactions. The server has been independently validated during numerous CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments, as well as being continuously evaluated by the CAMEO (Continuous Automated Model Evaluation) project. The IntFOLD server is available at: http://www.reading.ac.uk/bioinf/IntFOLD/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Nucleic Acid Aptamers for Living Cell Analysis

    Science.gov (United States)

    Xiong, Xiangling; Lv, Yifan; Chen, Tao; Zhang, Xiaobing; Wang, Kemin; Tan, Weihong

    2014-06-01

    Cells as the building blocks of life determine the basic functions and properties of a living organism. Understanding the structure and components of a cell aids in the elucidation of its biological functions. Moreover, knowledge of the similarities and differences between diseased and healthy cells is essential to understanding pathological mechanisms, identifying diagnostic markers, and designing therapeutic molecules. However, monitoring the structures and activities of a living cell remains a challenging task in bioanalytical and life science research. To meet the requirements of this task, aptamers, as “chemical antibodies,” have become increasingly powerful tools for cellular analysis. This article reviews recent advances in the development of nucleic acid aptamers in the areas of cell membrane analysis, cell detection and isolation, real-time monitoring of cell secretion, and intracellular delivery and analysis with living cell models. Limitations of aptamers and possible solutions are also discussed.

  15. Arguments for a Cluster Analysis of Nasal Consonant Sequences of ...

    African Journals Online (AJOL)

    Bantu language scholars, have among other things, debated over the issue of whether nasal and consonant sequences (NC sequences) in various Bantu languages should be considered as clusters or single segments (prenasalised stops). This paper examines these sequences as they occur in Sukwa nouns. Sukwa is a ...

  16. Design and Analysis of Single-Cell Sequencing Experiments

    NARCIS (Netherlands)

    Grün, Dominic; van Oudenaarden, Alexander

    2015-01-01

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate

  17. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    SERVER

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sor- ghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  18. Analysis of expressed sequence tags derived from inflorescence ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    Nov 2, 2009 ... genetically distant organisms. Clones could be isolated and partially sequenced by the thousands with the improvements in DNA sequencing technology. High- throughput DNA sequencing has greatly reduced both the cost and time involved in obtaining large ESTs data sets. There are several applications ...

  19. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs

  20. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis With emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the ''ASEP HRA Procedure,'' is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  1. Purification and sequence analysis of the mRNA coding for an immunoglobulin heavy chain

    International Nuclear Information System (INIS)

    Cowan, N.J.; Secher, D.S.; Milstein, C.

    1976-01-01

    A mutant cell line (IF2) derived from the mouse myeloma MOPC 21 has been used for the isolation and sequence analysis of H-chain mRNA. The IF2 cells synthesise an H-chain of reduced size in which the Csub(H)1 homology region is missing. Sizing of the IF2 H-chain mRNA and wild-type H-chain mRNA revealed that the deletion is expressed at the mRNA level. The mutant H-chain mRNA sedimented at 16-S, enabling effective resolution from 18-S ribosomal RNA. In experiments using IF2 cells labelled with [ 32 P]phosphate, the 16-S mRNA was purified by oligo(T)-cellulose chromatography. Polyacrylamide gel analysis of the poly(A)-containing fraction showed the presence of a single radioactive band. Comparison of the mobility of this band relative to markers of known molecular weight revealed that the molecule contained about 1,600 nucleotides. Digestion of the 32 P-labelled mRNA with T 1 ribonuclease and two-dimensional fractionation of the resulting oligonucleotides yielded a fingerprint' suitable for a preliminary sequence analysis. By using the established amino acid sequence of the IF2 H-chain and a knowledge of the genetic code, 14 oligonucleotides were assigned within the constant region and four within the variable region of the IF2 H-chain. This sequence data accounts for 19.5% of the coding region. Several other oligonucleotides, which could not be assigned within the coding region but which occurred in approximately molar yield, have also been partially characterised. These oligonucleotides are presumably derived from the untranslated regions of the mRNA. (orig.) [de

  2. Complete amino acid sequence of the human alpha 5 (IV) collagen chain and identification of a single-base mutation in exon 23 converting glycine 521 in the collagenous domain to cysteine in an Alport syndrome patient

    DEFF Research Database (Denmark)

    Zhou, J; Hertz, Jens Michael; Leinonen, A

    1992-01-01

    We have generated and characterized cDNA clones providing the complete amino acid sequence of the human type IV collagen chain whose gene has been shown to be mutated in X chromosome-linked Alport syndrome. The entire translation product has 1,685 amino acid residues. There is a 26-residue signal...... peptide, a 1,430-residue collagenous domain starting with a 14-residue noncollagenous sequence, and a Gly-Xaa-Yaa-repeat sequence interrupted at 22 locations, and a 229-residue carboxyl-terminal noncollagenous domain. The calculated molecular weight of the mature alpha 5(IV) chain is 158,303. Analysis...

  3. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  4. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    Science.gov (United States)

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443

  5. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    Science.gov (United States)

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  6. Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications

    OpenAIRE

    Ebhardt, H. Alexander; Tsang, Herbert H.; Dai, Denny C.; Liu, Yifeng; Bostan, Babak; Fahlman, Richard P.

    2009-01-01

    Recent advances in DNA-sequencing technology have made it possible to obtain large datasets of small RNA sequences. Here we demonstrate that not all non-perfectly matched small RNA sequences are simple technological sequencing errors, but many hold valuable biological information. Analysis of three small RNA datasets originating from Oryza sativa and Arabidopsis thaliana small RNA-sequencing projects demonstrates that many single nucleotide substitution errors overlap when aligning homologous...

  7. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    International Nuclear Information System (INIS)

    Chang, Soo-Ik; Hammes, G.G.

    1989-01-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homologies exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The DADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the DADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution

  8. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    Nucleotide sequence was determined for a 1.6-kilobase human cDNA putative for the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases

  9. Full-length genome sequencing analysis of avian infectious bronchitis virus isolate associated with nephropathogenic infection.

    Science.gov (United States)

    Leghari, R A; Fan, B; Wang, H; Bai, J; Zhang, L; Abro, S H; Jiang, P

    2016-12-01

    Infectious bronchitis virus (IBV) produces infectious bronchitis (IB) disease in poultry worldwide. In spite of proper vaccinations against the IBV, new IBV strains are continually emerging worldwide. In this study, a new highly virulent nephropathogenic IBV strain named CK/CH/XDC-2/2013 was identified from a vaccinated flock with clinical signs of IB in the Jiangsu province of China. The full-length genome sequence of the isolate was 27,714 nucleotides long, and the genome was organized similarly to classical IBV strains. Minimum divergence, phylogenetic analysis, and distance matrix of the genome showed that the CK/CH/XDC-2/2013 isolate had the highest similarity to the IBV BJ strain. The spike glycoprotein (S) gene had the greatest similarity to the nephropathogenic BJ strain and showed an 8 amino acid insertion (YSNGNSDV) at 73 to 80 sites and 3 amino acid deletion at sites 126 to 128 compared to the IBV vaccine strains. A recombination analysis of the S gene showed that the new isolate evolved from the IBV BJ strain and the KM91 vaccine strain. An animal challenge experiment showed a mortality of 60 to 80% in early-age chickens by different inoculation routes. Pathological examinations of the kidneys revealed inflammation, distention with uric acid deposits, and tubular degeneration. It indicated that the CK/CH/XDC-2/2013 isolate has robust kidney tissue tropism, and new nephropathogenic IBV strains are continuously evolving in China. © 2016 Poultry Science Association Inc.

  10. Isolation and sequence analysis of a chalcone synthase cDNA of Matthiola incana R. Br. (Brassicaceae).

    Science.gov (United States)

    Epping, B; Kittel, M; Ruhnau, B; Hemleben, V

    1990-06-01

    A cDNA clone (pcM12) of the chalcone synthase (CHS) of Matthiola incana R. Br. (Brassicaceae) was isolated from a cDNA library, sequenced and analysed. It comprises the complete coding sequence for the CHS and 5' and 3' untranslated regions. The deduced amino acid sequence shows that the Matthiola incana CHS consists of 394 amino acid residues. Comparison with CHS amino acid sequences of other plants indicates more than 82% homology.

  11. Defining Sequence Space and Reaction Products within the Cyanuric Acid Hydrolase (AtzD)/Barbiturase Protein Family

    OpenAIRE

    Seffernick, Jennifer L.; Erickson, Jasmine S.; Cameron, Stephan M.; Cho, Seunghee; Dodge, Anthony G.; Richman, Jack E.; Sadowsky, Michael J.; Wackett, Lawrence P.

    2012-01-01

    Cyanuric acid hydrolases (AtzD) and barbiturases are homologous, found almost exclusively in bacteria, and comprise a rare protein family with no discernible linkage to other protein families or an X-ray structural class. There has been confusion in the literature and in genome projects regarding the reaction products, the assignment of individual sequences as either cyanuric acid hydrolases or barbiturases, and spurious connection of this family to another protein family. The present study h...

  12. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development.

    Science.gov (United States)

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-12-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence repeat markers for identification of Zanthoxylum. In total, we predicted 45,057 unigenes and 22,212 protein coding sequences, approximately 90% of which showed significant similarities to known proteins in databases. Phylogenetic analysis indicated that Zanthoxylum is relatively recent and estimated to have diverged from Citrus ca. 36.5-37.7 million years ago. We also detected a whole-genome duplication event in Zanthoxylum that occurred 14 million years ago. We found no protein coding sequences that were significantly under positive selection by Ka/Ks. Simple sequence repeat analysis divided 31 Zanthoxylum cultivars and landraces into three major groups. This Zanthoxylum reference transcriptome provides crucial information for the evolutionary study of the Zanthoxylum genus and the Rutaceae family, and facilitates the establishment of more effective Zanthoxylum breeding programs.

  13. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  14. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  15. Diverse Bacterial PKS Sequences Derived From Okadaic Acid-Producing Dinoflagellates

    Directory of Open Access Journals (Sweden)

    Kathleen S. Rein

    2008-05-01

    Full Text Available Okadaic acid (OA and the related dinophysistoxins are isolated from dinoflagellates of the genus Prorocentrum and Dinophysis. Bacteria of the Roseobacter group have been associated with okadaic acid producing dinoflagellates and have been previously implicated in OA production. Analysis of 16S rRNA libraries reveals that Roseobacter are the most abundant bacteria associated with OA producing dinoflagellates of the genus Prorocentrum and are not found in association with non-toxic dinoflagellates. While some polyketide synthase (PKS genes form a highly supported Prorocentrum clade, most appear to be bacterial, but unrelated to Roseobacter or Alpha-Proteobacterial PKSs or those derived from other Alveolates Karenia brevis or Crytosporidium parvum.

  16. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects...

  17. Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors

    Directory of Open Access Journals (Sweden)

    Akiyoshi Takahashi

    2016-06-01

    Full Text Available This article contains structure and pharmacological characteristics of melanocortin receptors (MCRs related to research published in “Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish” (Takahashi et al., 2016 [1]. The amino acid sequences of the stingray, D. akajei, MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii, the dogfish, Squalus acanthias, the goldfish, Carassius auratus, and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells, and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose response curves reveal the order of ligand selectivity for each stingray MCR.

  18. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Full Text Available Abstract Background With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  19. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    International Nuclear Information System (INIS)

    Feild, M.J.; Armstrong, F.B.

    1987-01-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilv GEDAY gene cluster of S. typhimurium. TmB, gene product of ilv E, was purified, crystallized, and subjected to Edman degradation using a gas phase sequencer. The intact protein yielded an amino terminal 31 residue sequence. Both carboxymethylated apoenzyme and [ 3 H]-NaBH-reduced holoenzyme were then subjected to digestion by trypsin. The digests were fractionated using reversed phase HPLC, and the peptides isolated were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. The peptide is 27 residues long and a comparison with known sequences of other aminotransferases revealed limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino terminal 59 residues; only two differences between the sequences are noted in this region

  20. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications....... In another aspect the invention relates to a database of reference sequences which can be used in the method of the invention....

  1. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12 gene that is responsible for ricinoleate biosynthesis. The role(s of the lipases in developing castor seeds are not clear, and co-expressing of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2 gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes maybe one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, which whole sequence is being generated by shotgun sequencing at

  3. Feature Extraction From DNA Sequences by Multifractal Analysis

    National Research Council Canada - National Science Library

    Zhang, H

    2001-01-01

    This paper presents feature extraction and estimation of multifractal measures of DNA sequences using a multifractal methodology and demonstrates a new scheme for identifying biological functionality...

  4. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    /or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing......The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and...

  5. Purification and partial amino-acid sequence of gibberellin 20-oxidase from Cucurbita maxima L. endosperm.

    Science.gov (United States)

    Lange, T

    1994-01-01

    Gibberellin (GA) 20-oxidase was purified to apparent homogeneity from Cucurbita maxima endosperm by fractionated ammonium-sulphate precipitation, gel-filtration chromatography and anion-exchange and hydrophobic-interaction high-performance liquid chromatography (HPLC). Average purification after the last step was 55-fold with 3.9% of the activity recovered. The purest single fraction was enriched 101-fold with 0.2% overall recovery. Apparent relative molecular mass of the enzyme was 45 kDa, as determined by gel-filtration HPLC and sodium dodecyl sulphate-polyacrylamide gel electrophoresis, indicating that GA 20-oxidase is probably a monomeric enzyme. The purified enzyme degraded on two-dimensional gel electrophoresis, giving two protein spots: a major one corresponding to a molecular mass of 30 kDa and a minor one at 45 kDa. The isoelectric point for both was 5.4. The amino-acid sequences of the amino-terminus of the purified enzyme and of two peptides from a tryptic digest were determined. The purified enzyme catalysed the sequential conversion of [14C]GA12 to [14C]GA15, [14C]GA24 and [14C]GA25, showing that carbon atom 20 was oxidised to the corresponding alcohol, aldehyde and carboxylic acid in three consecutive reactions. [14C]Gibberellin A53 was similarly converted to [14C]GA44, [14C]GA19, [14C]GA17 and small amounts of a fourth product, which was preliminarily identified as [14C]GA20, a C19-gibberellin. All GAs except [14C]GA20 were identified by combined gas chromatography-mass spectrometry. The cofactor requirements in the absence of dithiothreitol were essentially as in its presence (Lange et al., Planta 195, 98-107, 1994), except that ascorbate was essential for enzyme activity and the optimal concentration of catalase was lower.

  6. Using Willie's Acid-Base Box for Blood Gas Analysis

    Science.gov (United States)

    Dietz, John R.

    2011-01-01

    In this article, the author describes a method developed by Dr. William T. Lipscomb for teaching blood gas analysis of acid-base status and provides three examples using Willie's acid-base box. Willie's acid-base box is constructed using three of the parameters of standard arterial blood gas analysis: (1) pH; (2) bicarbonate; and (3) CO[subscript…

  7. Isolation of three allergenic fractions of the major allergen from Olea europea pollen and N-terminal amino acid sequence.

    Science.gov (United States)

    Villalba, M; López-Otín, C; Martín-Orozco, E; Monsalve, R I; Palomino, P; Lahoz, C; Rodríguez, R

    1990-10-30

    A method to isolate the major allergen from olive pollen (Ole e I) in high yield is described. The allergenic fraction has been separated into 3 subfractions by reverse-phase HPLC. All these fractions were reactive to allergic sera from olive-sensitized patients, giving similar responses. No significant differences were observed between the amino acid compositions of these three proteins. The amino acid sequence of the first 27 amino acid residues from the N-terminal end is given. No homologies have been detected between Ole e I and other known allergens obtained from pollen.

  8. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  9. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima

    DEFF Research Database (Denmark)

    Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.

    2000-01-01

    The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters......, which brings independent evidence for the lateral gene transfer in the genome of T.maritima, The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms...

  10. An Analysis of Medium Loss of Coolant Sequence for the Severe Accident Analysis DB

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of medium loss of Coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as a reference plant. Based on the probabilistic safety analysis of the corresponding plant, 10 accident scenarios, which was predicted to have more than 10{sup -10} /ry occurrence frequency, have been analyzed as base cases for the medium loss of Coolant sequence database. The functions of the severe accident analysis database system will be to make a diagnosis of the accident by some input information from the plant symptoms, to search a corresponding scenario, and finally to provide the user phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data of the severe accident analysis database system.

  11. An Analysis of Large Loss of Coolant Sequence for the Severe Accident Analysis DB

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of Large loss of Coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as a reference plant. Based on the probabilistic safety analysis of the corresponding plant, 14 accident scenarios, which was predicted to have more than 10{sup -10} /ry occurrence frequency, have been analyzed as base cases for the Large loss of Coolant sequence database. The functions of the severe accident analysis database system will be to make a diagnosis of the accident by some input information from the plant symptoms, to search a corresponding scenario, and finally to provide the user phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data of the severe accident analysis database system.

  12. An Analysis of Small Loss of Coolant Sequence for the Severe Accident Analysis Database

    Energy Technology Data Exchange (ETDEWEB)

    Park, Soo Yong; Song, Yong Mann

    2007-12-15

    This report contains analysis methodologies and calculation results of small loss of Coolant sequences for the severe accident analysis database system. The Korean standard nuclear power plant has been selected as a reference plant. Based on the probabilistic safety analysis of the corresponding plant, 10 accident scenarios, which was predicted to have more than 10{sup -9} /ry occurrence frequency, have been analyzed as base cases for the small loss of Coolant sequence database. The functions of the severe accident analysis database system will be to make a diagnosis of the accident by some input information from the plant symptoms, to search a corresponding scenario, and finally to provide the user phenomenological information based on the pre-analyzed results. The MAAP 4.06 calculation results in this report will be utilized as input data of the severe accident analysis database system.

  13. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88

    DEFF Research Database (Denmark)

    Pel, Herman J.; de Winde, Johannes H.; Archer, David B.

    2007-01-01

    The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level...

  14. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk

    OpenAIRE

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Fran?oise; Loux, Valentin; Vidal, Marie; Passot, St?phanie; B?al, Catherine; Layec, S?verine; Fonseca, Fernanda

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes.

  15. Acid mine drainage neutralization in a pilot sequencing batch reactor using limestone from a paper and pulp industry

    CSIR Research Space (South Africa)

    Vadapalli, VRK

    2015-10-01

    Full Text Available Technology Acid mine drainage neutralization in a pilot sequencing batch reactor using limestone from a paper and pulp industry V.R.K. Vadapallia∗, J.N. Zvimbab, M. Mathyeb, H. Fischerc and L. Bologoc aSustainable Resources and Environment...

  16. N-terminal amino acid sequence of Bacillus licheniformis alpha-amylase: comparison with Bacillus amyloliquefaciens and Bacillus subtilis Enzymes.

    OpenAIRE

    Kuhn, H; Fietzek, P P; Lampen, J O

    1982-01-01

    The thermostable, liquefying alpha-amylase from Bacillus licheniformis was immunologically cross-reactive with the thermolabile, liquefying alpha-amylase from Bacillus amyloliquefaciens. Their N-terminal amino acid sequences showed extensive homology with each other, but not with the saccharifying alpha-amylases of Bacillus subtilis.

  17. Synthesis of peptide-immunogens corresponding to amino acid sequences from human histocompatibility class II membrane glycoproteins.

    Science.gov (United States)

    Chillemi, F; Cappelletti, S; Francescato, P; Chersi, A

    1990-03-01

    Six peptides with amino acid sequences of human histocompatibility Class II membrane glycoproteins were synthesized by conventional solution methods. Five peptides were prepared by stepwise procedures from the carboxyterminus. The sixth was synthesized by fragment condensation (5 + 10 coupling). Antibodies to synthetic peptides were then used to locate exposed and buried regions in the membrane glycoproteins.

  18. Draft Genome Sequence of Burkholderia stabilis LA20W, a Trehalose Producer That Uses Levulinic Acid as a Substrate

    Science.gov (United States)

    Sato, Yuya; Koike, Hideaki; Kondo, Susumu; Hori, Tomoyuki; Kanno, Manabu; Kimura, Nobutada; Morita, Tomotake; Kirimura, Kohtaro

    2016-01-01

    Burkholderia stabilis LA20W produces trehalose using levulinic acid (LA) as a substrate. Here, we report the 7.97-Mb draft genome sequence of B. stabilis LA20W, which will be useful in investigations of the enzymes involved in LA metabolism and the mechanism of LA-induced trehalose production. PMID:27491978

  19. N-terminal amino acid sequence of Bacillus licheniformis alpha-amylase: comparison with Bacillus amyloliquefaciens and Bacillus subtilis Enzymes.

    Science.gov (United States)

    Kuhn, H; Fietzek, P P; Lampen, J O

    1982-01-01

    The thermostable, liquefying alpha-amylase from Bacillus licheniformis was immunologically cross-reactive with the thermolabile, liquefying alpha-amylase from Bacillus amyloliquefaciens. Their N-terminal amino acid sequences showed extensive homology with each other, but not with the saccharifying alpha-amylases of Bacillus subtilis. PMID:6172418

  20. Acid rain compliance planning using decision analysis

    International Nuclear Information System (INIS)

    Norris, C.; Sweet, T.; Borison, A.

    1991-01-01

    Illinois Power Company (IP) is an investor-owned electric and natural gas utility serving portions of downstate Illinois. In addition to one nuclear unit and several small gas and/or oil-fired units, IP has ten coal-fired units. It is easy to understand the impact the Clean Air Act Amendments of 1990 (CAAA) could have on IP. Prior to passage of the CAAA, IP formed several teams to evaluate the specific compliance options at each of the high sulfur coal units. Following that effort, numerous economic analyses of compliance strategies were conducted. The CAAA have introduced a new dimension to planning under uncertainty. Not only are many of the familiar variables uncertain, but the specific form of regulation, and indeed, the compliance goal itself is hard to define. For IP, this led them to use techniques not widely used within their corporation. This paper summarizes the analytical methods used in these analyses and the preliminary results as of July, 1991. The analysis used three approaches to examine the acid rain compliance decision. These approaches were: (1) the 'most-likely,' or single-path scenario approach; (2) a multi-path strategy analysis using the strategies defined in the single-scenario analysis; and (3) a less constrained multi-path option analysis which selects the least cost compliance option for each unit

  1. Analysis of simple sequence repeats in rice bean (Vigna umbellata using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Full Text Available Rice bean (Vigna umbellata Thunb., a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and developing SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%. Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%, and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs were designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker

  2. Amino acid substitution: its use in detection and analysis of genetic variants

    International Nuclear Information System (INIS)

    Popp, R.A.; Hirsch, G.P.; Bradshaw, B.S.

    1979-01-01

    Techniques of chemical analysis, amino acid sequencing and autoradiography are being used to study the frequency of incorporation of normally noncoded amino acids into hemoglobins and seminal fluid proteins. We are studying, by the sequencing of radiolabeled proteins followed by the recovery of [ 3 H] isoleucine phenylthiohydantoin by high-performance liquid chromatography, the frequency at which normally noncoded isoleucine is incorporated into hemoglobin because of base-substitution mutations versus translational errors. Irradiation increases the isoleucine content of human hemoglobin and the frequency of substitution of isoleucine for specific amino acids in rabbit hemoglobin. Studies to date indicate that these techniques have been developed sufficiently for initial analysis of the potential of drugs and environmental pollutants to induce base-substitution mutations in mammalian somatic cells

  3. Sequence and comparative analysis of Leuconostoc dairy bacteriophages.

    Science.gov (United States)

    Kot, Witold; Hansen, Lars H; Neve, Horst; Hammer, Karin; Jacobsen, Susanne; Pedersen, Per D; Sørensen, Søren J; Heller, Knut J; Vogensen, Finn K

    2014-04-17

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc mesenteroides or Leuconostoc pseudomesenteroides strains. The phages have dsDNA genomes with sizes ranging from 25.7 to 28.4 kb. Comparative genomics analysis helped classify the 9 phages into two classes, which correlates with the host species. High percentage of similarity within the classes on both nucleotide and protein levels was observed. Genome comparison also revealed very high conservation of the overall genomic organization between the classes. The genes were organized in functional modules responsible for replication, packaging, head and tail morphogenesis, cell lysis and regulation and modification, respectively. No lysogeny modules were detected. To our knowledge this report provides the first comparative genomic work done on Leuconostoc dairy phages. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera ( Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related with chloroplast and ribosomal protein, GO functional classification showed 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidences displayed that U. prolifera had substance and energy foundation for the intense photosynthesis and the rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  5. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  6. The sequence and analysis of Trypanosoma brucei chromosome II

    Science.gov (United States)

    El-Sayed, Najib M. A.; Ghedin, Elodie; Song, Jinming; MacLeod, Annette; Bringaud, Frederic; Larkin, Christopher; Wanless, David; Peterson, Jeremy; Hou, Lihua; Taylor, Sonya; Tweedie, Alison; Biteau, Nicolas; Khalak, Hanif G.; Lin, Xiaoying; Mason, Tanya; Hannick, Linda; Caler, Elisabet; Blandin, Gaëlle; Bartholomeu, Daniella; Simpson, Anjana J.; Kaul, Samir; Zhao, Hong; Pai, Grace; Aken, Susan Van; Utterback, Teresa; Haas, Brian; Koo, Hean L.; Umayam, Lowell; Suh, Bernard; Gerrard, Caroline; Leech, Vanessa; Qi, Rong; Zhou, Shiguo; Schwartz, David; Feldblyum, Tamara; Salzberg, Steven; Tait, Andrew; Turner, C. Michael R.; Ullu, Elisabetta; White, Owen; Melville, Sara; Adams, Mark D.; Fraser, Claire M.; Donelson, John E.

    2003-01-01

    We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational ‘hot’ and ‘cold’ regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events. PMID:12907728

  7. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... 3056 Afr. J. Biotechnol. Table 1. DNA sequence of the primers tested. Name. Sequence. Accession no. ... The lack of contaminating genomic DNA was checked out by monitoring negative polymerase chain ..... variation between human and chimpanzee. Genome Res. 15: 1344-. 1356. Pomies P, Louis HA, ...

  8. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  9. Illumina-based de novotranscriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    ZHONGXIAN XU

    2017-12-18

    Dec 18, 2017 ... Next-generation sequencing technique is an efficient method for generating an enormous amount of sequence data that can represent a large number of genes and their expression levels. In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland.

  10. Sequence stratigraphy and structural analysis of the Emi field ...

    African Journals Online (AJOL)

    Highstand system tract and transgressive system tract were identified within the depositional sequences. Marker shales, characterized by index fossils Haplophramoides-24 and Bolivina-46, were used to date the key bounding surfaces with the aid of the Niger Delta chronostratigraphic chart. Ages assigned to the sequence ...

  11. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  12. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of interested genes which has not been used in viral integration site analysis of gene therapy applications. Here, we combined targeted sequence capture array and next generation sequencing to address the whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with a HIV-1 derived vector. A custom made DNA probe sets targeted pLVTHM vector used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Recursive organizer (ROR): an analytic framework for sequence-based association analysis.

    Science.gov (United States)

    Zhao, Lue Ping; Huang, Xin

    2013-07-01

    The advent of next-generation sequencing technologies affords the ability to sequence thousands of subjects cost-effectively, and is revolutionizing the landscape of genetic research. With the evolving genotyping/sequencing technologies, it is not unrealistic to expect that we will soon obtain a pair of diploidic fully phased genome sequences from each subject in the near future. Here, in light of this potential, we propose an analytic framework called, recursive organizer (ROR), which recursively groups sequence variants based upon sequence similarities and their empirical disease associations, into fewer and potentially more interpretable super sequence variants (SSV). As an illustration, we applied ROR to assess an association between HLA-DRB1 and type 1 diabetes (T1D), discovering SSVs of HLA-DRB1 with sequence data from the Wellcome Trust Case Control Consortium. Specifically, ROR reduces 36 observed unique HLA-DRB1 sequences into 8 SSVs that empirically associate with T1D, a fourfold reduction of sequence complexity. Using HLA-DRB1 data from Type 1 Diabetes Genetics Consortium as cases and data from Fred Hutchinson Cancer Research Center as controls, we are able to validate associations of these SSVs with T1D. Further, SSVs consist of nine nucleotides, and each associates with its corresponding amino acids. Detailed examination of these selected amino acids reveals their potential functional roles in protein structures and possible implication to the mechanism of T1D.

  14. Analysis of proteins responsive to acetic acid in Acetobacter: molecular mechanisms conferring acetic acid resistance in acetic acid bacteria.

    Science.gov (United States)

    Nakano, Shigeru; Fukaya, Masahiro

    2008-06-30

    Acetic acid bacteria are used for industrial vinegar production because of their remarkable ability to oxidize ethanol and high resistance to acetic acid. Although several molecular machineries responsible for acetic acid resistance in acetic acid bacteria have been reported, the entire mechanism that confers acetic acid resistance has not been completely understood. One of the promising methods to elucidate the entire mechanism is global analysis of proteins responsive to acetic acid by two-dimensional gel electrophoresis. Recently, two proteins whose production was greatly enhanced by acetic acid in Acetobacter aceti were identified to be aconitase and a putative ABC-transporter, respectively; furthermore, overexpression or disruption of the genes encoding these proteins affected acetic acid resistance in A. aceti, indicating that these proteins are involved in acetic acid resistance. Overexpression of each gene increased acetic acid resistance in Acetobacter, which resulted in an improvement in the productivity of acetic acid fermentation. Taken together, the results of the proteomic analysis and those of previous studies indicate that acetic acid resistance in acetic acid bacteria is conferred by several mechanisms. These findings also provide a clue to breed a strain having high resistance to acetic acid for vinegar fermentation.

  15. An auditory display tool for DNA sequence analysis.

    Science.gov (United States)

    Temple, Mark D

    2017-04-24

    DNA Sonification refers to the use of an auditory display to convey the information content of DNA sequence data. Six sonification algorithms are presented that each produce an auditory display. These algorithms are logically designed from the simple through to the more complex. Three of these parse individual nucleotides, nucleotide pairs or codons into musical notes to give rise to 4, 16 or 64 notes, respectively. Codons may also be parsed degenerately into 20 notes with respect to the genetic code. Lastly nucleotide pairs can be parsed as two separate frames or codons can be parsed as three reading frames giving rise to multiple streams of audio. The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous since start and stop codons in either frame have a direct affect to start or stop the audio in that frame, leaving the other frames unaffected. Using these methods, DNA sequences such as open reading frames or repetitive DNA sequences can be distinguished from one another. These sonification tools are available through a webpage interface in which an input DNA sequence can be processed in real time to produce an auditory display playable directly within the browser. The potential of this approach as an analytical tool is discussed with reference to auditory displays derived from test sequences including simple nucleotide sequences, repetitive DNA sequences and coding or non-coding genes. This study presents a proof-of-concept that some properties of a DNA sequence can be identified through sonification alone and argues for their inclusion within the toolkit of DNA sequence browsers as an adjunct to existing visual and analytical tools.

  16. Sequence analysis and overexpression of a pectin lyase gene (pel1) from Aspergillus oryzae KBN616.

    Science.gov (United States)

    Kitamoto, N; Yoshino-Yasuda, S; Ohmiya, K; Tsukagoshi, N

    2001-01-01

    A gene (pel1) encoding pectin lyase (Pel1) was isolated from a shoyu koji mold, Aspergillus oryzae KBN616, and characterized. The structural gene comprised 1,196 bp with a single intron. The ORF encoded 381 amino acids with a signal peptide of 20 amino acids. The deduced amino acid sequence showed high similarity to those of Aspergillus niger pectin lyases and Glomerella cingulata PnlA. The pel1 gene was successfully overexpressed under the promoter of the A. oryzae TEF1 gene. The molecular mass of the recombinant pectin lyase substantially coincided with that calculated based on nucleotide sequence.

  17. A Δ-9 Fatty Acid Desaturase Gene in the Microalga Myrmecia incisa Reisigl: Cloning and Functional Analysis

    Directory of Open Access Journals (Sweden)

    Wen-Bin Xue

    2016-07-01

    Full Text Available The green alga Myrmecia incisa is one of the richest natural sources of arachidonic acid (ArA. To better understand the regulation of ArA biosynthesis in M. incisa, a novel gene putatively encoding the Δ9 fatty acid desaturase (FAD was cloned and characterized for the first time. Rapid-amplification of cDNA ends (RACE was employed to yield a full length cDNA designated as MiΔ9FAD, which is 2442 bp long in sequence. Comparing cDNA open reading frame (ORF sequence to genomic sequence indicated that there are 8 introns interrupting the coding region. The deduced MiΔ9FAD protein is composed of 432 amino acids. It is soluble and localized in the chloroplast, as evidenced by the absence of transmembrane domains as well as the presence of a 61-amino acid chloroplast transit peptide. Multiple sequence alignment of amino acids revealed two conserved histidine-rich motifs, typical for Δ9 acyl-acyl carrier protein (ACP desaturases. To determine the function of MiΔ9FAD, the gene was heterologously expressed in a Saccharomyces cerevisiae mutant strain with impaired desaturase activity. Results of GC-MS analysis indicated that MiΔ9FAD was able to restore the synthesis of monounsaturated fatty acids, generating palmitoleic acid and oleic acid through the addition of a double bond in the Δ9 position of palmitic acid and stearic acid, respectively.

  18. Amino acids analysis during lactic acid fermentation by single strain ...

    African Journals Online (AJOL)

    L. salivarius alone showed relatively good assimilation of various amino acids that existed at only a little amounts in MRS media (Asn, Asp, Cit, Cys, Glu, His, Lys, Orn, Phe, Pro, Tyr, Arg, Ile, Leu, Met, Ser, Thr, Trp and Val), whereas Ala and Gly accumulated in L. salivarius cultures. P. acidilactici, in contrast, hydrolyzed the ...

  19. Identification and sequence analysis of Tapasin gene in guinea fowl

    Directory of Open Access Journals (Sweden)

    Varuna P. Panicker

    2014-12-01

    Full Text Available Aim: An attempt has been made to identify and study the nucleotide sequence variability in exon 5 - exon 6 regions of guinea fowl Tapasin gene. Materials and Methods: Blood samples were collected from randomly selected birds (12 guinea fowl birds and Tapasin gene amplified using chicken specific primers designed from GenBank submitted sequences. Polymerase chain reaction conditions were standardized so as get only single amplicons. Obtained products were then cloned and sequenced; sequences were then analyzed using suitable software. Results: Amplicon size of the Tapasin gene in guinea fowl was same as reported in chicken with areas of transitions and transversions. The sequence variations reported in these coding sequences might have influence in the protein structure, which may be correlated with the increased immune status of the bird when compared with chicken breeds. Conclusion: Since Tapasin gene is an immunologically important gene, which plays an important role in the immune status of the bird. Sequence variations in the gene can be correlated with the altered immune status of the bird.

  20. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    was found, but no single stranded intermediates, characteristic of rolling circle replication, were found on Southern blots. The larger plasmid, pBAL, was found to be a 8294 bp plasmid with a GC content of 39%. It revealed 17 ORFs, of which three showed similarity at the amino acid (aa) level to known...

  1. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  2. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross- species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  3. USE OF NEXT-GENERATION SEQUENCING FOR GENOMIC ANALYSIS IN COMPLEX DISEASES

    OpenAIRE

    Sana, Maria Elena

    2013-01-01

    Since the early 1990s, Sanger method has been the gold standard methodology for sequencing analysis of DNA. Next-generation sequencing (NGS) approaches revolutionized the field of genomics over the last 5 years. These new sequencing technologies make feasible the direct and cost-effective sequencing of genomes at unprecedented scale and speed. Furthermore, the applications of these technologies are wide-spread and have been developed to explore the complex biological systems, among which RNA ...

  4. Alternative splicing of human elastin mRNA indicated by sequence analysis of cloned genomic and complementary DNA

    International Nuclear Information System (INIS)

    Indik, Z.; Yeh, H.; Ornstein-goldstein, N.; Sheppard, P.; Anderson, N.; Rosenbloom, J.C.; Peltonen, L.; Rosenbloom, J.

    1987-01-01

    Poly(A) + RNA, isolated from a single 7-mo fetal human aorta, was used to synthesize cDNA by the RNase H method, and the cDNA was inserted into λgt10. Recombinant phage containing elastin sequences were identified by hybridization with cloned, exon-containing fragments of the human elastin gene. Three clones containing inserts of 3.3, 2.7, and 2.3 kilobases were selected for further analysis. Three overlapping clones containing 17.8 kilobases of the human elastin gene were also isolated from genomic libraries. Complete sequence analysis of the six clones demonstrated that: (i) the cDNA encompassed the entire translated portion of the mRNA encoding 786 amino acids, including several unusual hydrophilic amino acid sequences not previously identified in porcine tropoelastin, (ii) exons encoding either hydrophobic or crosslinking domains in the protein alternated in the gene, and (iii) a great abundance of Alu repetitive sequences occurred throughout the introns. The data also indicated substantial alternative splicing of the mRNA. These results suggest the potential for significant variation in the precise molecular structure of the elastic fiber in the human population

  5. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...... contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop....

  6. Comparison of complete genome sequences of dog rabies viruses isolated from China and Mexico reveals key amino acid changes that may be associated with virus replication and virulence.

    Science.gov (United States)

    Yu, Fulai; Zhang, Guoqing; Zhong, Xiangfu; Han, Na; Song, Yunfeng; Zhao, Ling; Cui, Min; Rayner, Simon; Fu, Zhen F

    2014-07-01

    Rabies is a global problem, but its impact and prevalence vary across different regions. In some areas, such as parts of Africa and Asia, the virus is prevalent in the domestic dog population, leading to epidemic waves and large numbers of human fatalities. In other regions, such as the Americas, the virus predominates in wildlife and bat populations, with sporadic spillover into domestic animals. In this work, we attempted to investigate whether these distinct environments led to selective pressures that result in measurable changes within the genome at the amino acid level. To this end, we collected and sequenced the full genome of two isolates from divergent environments. The first isolate (DRV-AH08) was from China, where the virus is present in the dog population and the country is experiencing a serious epidemic. The second isolate (DRV-Mexico) was taken from Mexico, where the virus is present in both wildlife and domestic dog populations, but at low levels as a consequence of an effective vaccination program. We then combined and compared these with other full genome sequences to identify distinct amino acid changes that might be associated with environment. Phylogenetic analysis identified strain DRV-AH08 as belonging to the China-I lineage, which has emerged to become the dominant lineage in the current epidemic. The Mexico strain was placed in the D11 Mexico lineage, associated with the West USA-Mexico border clade. Amino acid sequence analysis identified only 17 amino acid differences in the N, G and L proteins. These differences may be associated with virus replication and virulence-for example, the short incubation period observed in the current epidemic in China.

  7. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1, after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  8. Detection and sequence analysis of accessory gene regulator genes of Staphylococcus pseudintermedius isolates

    Directory of Open Access Journals (Sweden)

    M. Ananda Chitra

    2015-07-01

    Full Text Available Background: Staphylococcus pseudintermedius (SP is the major pathogenic species of dogs involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr locus of Staphylococcus aureus has been extensively studied, and it influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objective of this study was to detect and sequence analyzing the AgrA, B, and D of SP isolated from canine skin infections. Materials and Methods: In this study, we have isolated and identified SP from canine pyoderma and otitis cases by polymerase chain reaction (PCR and confirmed by PCR-restriction fragment length polymorphism. Primers for SP agrA and agrBD genes were designed using online primer designing software and BLAST searched for its specificity. Amplification of the agr genes was carried out for 53 isolates of SP by PCR and sequencing of agrA, B, and D were carried out for five isolates and analyzed using DNAstar and Mega5.2 software. Results: A total of 53 (59% SP isolates were obtained from 90 samples. 15 isolates (28% were confirmed to be methicillinresistant SP (MRSP with the detection of the mecA gene. Accessory gene regulator A, B, and D genes were detected in all the SP isolates. Complete nucleotide sequences of the above three genes for five isolates were submitted to GenBank, and their accession numbers are from KJ133557 to KJ133571. AgrA amino acid sequence analysis showed that it is mainly made of alpha-helices and is hydrophilic in nature. AgrB is a transmembrane protein, and AgrD encodes the precursor of the autoinducing peptide (AIP. Sequencing of the agrD gene revealed that the 5 canine SP strains tested could be divided into three Agr specificity groups (RIPTSTGFF, KIPTSTGFF, and RIPISTGFF based on the putative AIP produced by each strain

  9. GENETIC ANALYSIS OF ABSCISIC ACID BIOSYNTHESIS

    Energy Technology Data Exchange (ETDEWEB)

    MCCARTY D R

    2012-01-10

    The carotenoid cleavage dioxygenases (CCD) catalyze synthesis of a variety of apo-carotenoid secondary metabolites in plants, animals and bacteria. In plants, the reaction catalyzed by the 11, 12, 9-cis-epoxy carotenoid dioxygenase (NCED) is the first committed and key regulated step in synthesis of the plant hormone, abscisic acid (ABA). ABA is a key regulator of plant stress responses and has critical functions in normal root and seed development. The molecular mechanisms responsible for developmental control of ABA synthesis in plant tissues are poorly understood. Five of the nine CCD genes present in the Arabidopsis genome encode NCED's involved in control of ABA synthesis in the plant. This project is focused on functional analysis of these five AtNCED genes as a key to understanding developmental regulation of ABA synthesis and dissecting the role of ABA in plant development. For this purpose, the project developed a comprehensive set of gene knockouts in the AtNCED genes that facilitate genetic dissection of ABA synthesis. These mutants were used in combination with key molecular tools to address the following specific objectives: (1) the role of ABA synthesis in root development; (2) developmental control of ABA synthesis in seeds; (3) analysis of ATNCED over-expressers; (4) preliminary crystallography of the maize VP14 protein.

  10. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  11. Complete Genome Sequence of Enterococcus mundtii QU 25, an Efficient l-(+)-Lactic Acid-Producing Bacterium

    Science.gov (United States)

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-01-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified—one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci. PMID:24568933

  12. Prevalence of Plasmodium spp. in malaria asymptomatic African migrants assessed by nucleic acid sequence based amplification

    Directory of Open Access Journals (Sweden)

    Schallig Henk DFH

    2009-01-01

    Full Text Available Abstract Background Malaria is one of the most important infectious diseases in the world. Although most cases are found distributed in the tropical regions of Africa, Asia, Central and South Americas, there is in Europe a significant increase in the number of imported cases in non-endemic countries, in particular due to the higher mobility in today's society. Methods The prevalence of a possible asymptomatic infection with Plasmodium species was assessed using Nucleic Acid Sequence Based Amplification (NASBA assays on clinical samples collected from 195 study cases with no clinical signs related to malaria and coming from sub-Saharan African regions to Southern Italy. In addition, base-line demographic, clinical and socio-economic information was collected from study participants who also underwent a full clinical examination. Results Sixty-two study subjects (31.8% were found positive for Plasmodium using a pan Plasmodium specific NASBA which can detect all four Plasmodium species causing human disease, based on the small subunit 18S rRNA gene (18S NASBA. Twenty-four samples (38% of the 62 18S NASBA positive study cases were found positive with a Pfs25 mRNA NASBA, which is specific for the detection of gametocytes of Plasmodium falciparum. A statistically significant association was observed between 18S NASBA positivity and splenomegaly, hepatomegaly and leukopaenia and country of origin. Conclusion This study showed that a substantial proportion of people originating from malaria endemic countries harbor malaria parasites in their blood. If transmission conditions are available, they could potentially be a reservoir. Thefore, health authorities should pay special attention to the health of this potential risk group and aim to improve their health conditions.

  13. Genomic sequencing of uric acid metabolizing and clearing genes in relationship to xanthine oxidase inhibitor dose.

    Science.gov (United States)

    Carroll, Matthew B; Smith, Derek M; Shaak, Thomas L

    2017-03-01

    It remains unclear why the dose of xanthine oxidase inhibitors (XOI) allopurinol or febuxostat varies among patients though they reach similar serum uric acid (SUA) goal. We pursued genomic sequencing of XOI metabolism and clearance genes to identify single-nucleotide polymorphisms (SNPs) relate to differences in XOI dose. Subjects with a diagnosis of Gout based on the 1977 American College of Rheumatology Classification Criteria for the disorder, who were on stable doses of a XOI, and who were at their goal SUA level, were enrolled. The primary outcome was relationship between SNPs in any of these genes to XOI dose. The secondary outcome was relationship between SNPs and change in pre- and post-treatment SUA. We enrolled 100 subjects. The average patient age was 68.6 ± 10.6 years old. Over 80% were men and 77% were Caucasian. One SNP was associated with a higher XOI dose: rs75995567 (p = 0.031). Two SNPs were associated with 300 mg daily of allopurinol: rs11678615 (p = 0.022) and rs3731722 on Aldehyde Oxidase (AO) (His1297Arg) (p = 0.001). Two SNPs were associated with a lower dose of allopurinol: rs1884725 (p = 0.033) and rs34650714 (p = 0.006). For the secondary outcome, rs13415401 was the only SNP related to a smaller mean SUA change. Ten SNPs were identified with a larger change in SUA. Though multiple SNPs were identified in the primary and secondary outcomes of this study, rs3731722 is known to alter catalytic function for some aldehyde oxidase substrates.

  14. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  15. Amino-acid sequences of trypsin inhibitors from watermelon (Citrullus vulgaris) and red bryony (Bryonia dioica) seeds.

    Science.gov (United States)

    Otlewski, J; Whatley, H; Polanowski, A; Wilusz, T

    1987-11-01

    The amino-acid sequences of two trypsin inhibitors isolated from red bryony (Bryonia dioica) and watermelon (Citrullus vulgaris) seeds are reported. Both species represent different genera of the Cucurbitaceae family, which have not been previously investigated as a source of proteinase inhibitors. The sequences are unique but are very similar to those of other proteinase inhibitors which have been isolated from squash seeds. Based on structural homology we assume that the Arg5-Ile6 peptide bond represents the reactive site bond of both inhibitors.

  16. ROC analysis of acid demineralized artificial caries

    Energy Technology Data Exchange (ETDEWEB)

    Kang, Byung Cheol [Dept. of Oral and Maxillofacial Radiology, College of Dentistry, Chonnam National University, Kwangju (Korea, Republic of)

    1997-08-15

    This study is designed to determine the artificial incipient proximal caries lesion detectability by dentists on Ektaspeed Plus film using ROC analysis. Sixteen premolars and 30 molars, which have 52 proximal caries-like demineralized lesions using acid-gel technique were added to 20 sound premolars and 30 sound molars to make 24 plaster blocks. Each block with 4 teeth and 6 contacting proximal surfaces was placed in an optical bench to take 12 bitewing radiographs with Ektaspeed Plus film. Thirty-six dentists acted as observers to evaluated the proximal lesions using five rating scales for ROC analysis. They were also asked to determine the presence or absence of the proximal caries. The true status of the proximal caries was established by the consensus of three oral and maxillofacila radiologists. For evaluation of intra-observer agreement, 9 dentist reread the radiographs at an interval of 1 month. The Pearson correlation coefficient for the intra-observer agreement was 0.746 (good agreement). Ten observer's data set were degenerated. The mean area under ROC curve from 26 observers was 0.806 and standard deviation was 0.061. The sensitivity and the specificity of the binary response were 0.17 (SD=0.11) and 0.78 (SD=0.17) respectively. The binary response only reveal a single values of sensitivity and the specificity. The ROC analysis to assess the diagnostic accuracy in caries detection, which producing estimates of sensitivities for all specifities, yield more comprehensive measures of diagnostic performance than single values for sensitivity and specificity.

  17. [Cloning and sequence analysis of Eg95 cDNA from different stages of Echinococcus granulosus in Xinjiang].

    Science.gov (United States)

    Lin, Ren-yong; Ding, Jian-bing; Wen, Hao; Zhang, Wen-bao; Li, Jun; Lu, Xiao-mei

    2003-01-01

    To study expression and sequence differences of Echinococcus granulosus 95(Eg95) antigen cDNA from different stages of protoscolex, oncosphere and adult worm of E. granulosus from Xinjiang Uighur Aut. Reg. In accordance with the sequence of Eg95 antigen cDNA, the primers of Eg95 were designed. Eg95 antigen cDNAs were amplified by PCR from protoscolex, oncosphere and adult worm cDNA libraries of E. granulosus, respectively and were cloned into pUCm-T plasmid, and sequenced. The sequences were analyzed by DNAman and GenBank/BLAST biosoftware. PCR results showed that Eg95 antigen cDNA was amplified from three stages of E. granulosus cDNA libraries. Sequencing analysis indicated that the Eg95 cDNA length was 402 bp, same as the reported data in GenBank. The Eg95 antigen cDNA was expressed in the different life-cycle stages of E. granulosus in Xinjiang and there was no nucleic acid sequence difference of Eg95 antigen among the protoscolex, oncosphere and adult worm of E. granulosus.

  18. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...... implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.......Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger...... sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results: In 14 high myopia families, we...

  19. Comparative analysis of complete mitochondrial genome sequences confirms independent origins of plant-parasitic nematodes

    Directory of Open Access Journals (Sweden)

    Sultana Tahera

    2013-01-01

    Full Text Available Abstract Background The nematode infraorder Tylenchomorpha (Class Chromadorea includes plant parasites that are of agricultural and economic importance, as well as insect-associates and fungal feeding species. Among tylenchomorph plant parasites, members of the superfamily Tylenchoidea, such as root-knot nematodes, have great impact on agriculture. Of the five superfamilies within Tylenchomorpha, one (Aphelenchoidea includes mainly fungal-feeding species, but also some damaging plant pathogens, including certain Bursaphelenchus spp. The evolutionary relationships of tylenchoid and aphelenchoid nematodes have been disputed based on classical morphological features and molecular data. For example, similarities in the structure of the stomatostylet suggested a common evolutionary origin. In contrast, phylogenetic hypotheses based on nuclear SSU ribosomal DNA sequences have revealed paraphyly of Aphelenchoidea, with, for example, fungal-feeding Aphelenchus spp. within Tylenchomorpha, but Bursaphelenchus and Aphelenchoides spp. more closely related to infraorder Panagrolaimomorpha. We investigated phylogenetic relationships of plant-parasitic tylenchoid and aphelenchoid species in the context of other chromadorean nematodes based on comparative analysis of complete mitochondrial genome data, including two newly sequenced genomes from Bursaphelenchus xylophilus (Aphelenchoidea and Pratylenchus vulnus (Tylenchoidea. Results The complete mitochondrial genomes of B. xylophilus and P. vulnus are 14,778 bp and 21,656 bp, respectively, and identical to all other chromadorean nematode mtDNAs in that they contain 36 genes (lacking atp8 encoded in the same direction. Their mitochondrial protein-coding genes are biased toward use of amino acids encoded by T-rich codons, resulting in high A+T richness. Phylogenetic analyses of both nucleotide and amino acid sequence datasets using maximum likelihood and Bayesian methods did not support B. xylophilus as most

  20. Combining text mining and sequence analysis to discover protein functional regions.

    Science.gov (United States)

    Eskin, E; Agichtein, E

    2004-01-01

    Recently presented protein sequence classification models can identify relevant regions of the sequence. This observation has many potential applications to detecting functional regions of proteins. However, identifying such sequence regions automatically is difficult in practice, as relatively few types of information have enough annotated sequences to perform this analysis. Our approach addresses this data scarcity problem by combining text and sequence analysis. First, we train a text classifier over the explicit textual annotations available for some of the sequences in the dataset, and use the trained classifier to predict the class for the rest of the unlabeled sequences. We then train a joint sequence text classifier over the text contained in the functional annotations of the sequences, and the actual sequences in this larger, automatically extended dataset. Finally, we project the classifier onto the original sequences to determine the relevant regions of the sequences. We demonstrate the effectiveness of our approach by predicting protein sub-cellular localization and determining localization specific functional regions of these proteins.

  1. Laser desorption mass spectrometry for DNA analysis and sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, C.H.; Taranenko, N.I.; Tang, K.; Allman, S.L.

    1995-03-01

    Laser desorption mass spectrometry has been considered as a potential new method for fast DNA sequencing. Our approach is to use matrix-assisted laser desorption to produce parent ions of DNA segments and a time-of-flight mass spectrometer to identify the sizes of DNA segments. Thus, the approach is similar to gel electrophoresis sequencing using Sanger`s enzymatic method. However, gel, radioactive tagging, and dye labeling are not required. In addition, the sequencing process can possibly be finished within a few hundred microseconds instead of hours and days. In order to use mass spectrometry for fast DNA sequencing, the following three criteria need to be satisfied. They are (1) detection of large DNA segments, (2) sensitivity reaching the femtomole region, and (3) mass resolution good enough to separate DNA segments of a single nucleotide difference. It has been very difficult to detect large DNA segments by mass spectrometry before due to the fragile chemical properties of DNA and low detection sensitivity of DNA ions. We discovered several new matrices to increase the production of DNA ions. By innovative design of a mass spectrometer, we can increase the ion energy up to 45 KeV to enhance the detection sensitivity. Recently, we succeeded in detecting a DNA segment with 500 nucleotides. The sensitivity was 100 femtomole. Thus, we have fulfilled two key criteria for using mass spectrometry for fast DNA sequencing. The major effort in the near future is to improve the resolution. Different approaches are being pursued. When high resolution of mass spectrometry can be achieved and automation of sample preparation is developed, the sequencing speed to reach 500 megabases per year can be feasible.

  2. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well...... as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  3. Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods.

    Directory of Open Access Journals (Sweden)

    Mayumi Kamada

    Full Text Available Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and its relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA, we sequenced the whole genome of eight B. subtilis stains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168 that is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which were related to fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequencing typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand traces a different evolutionary process from other strains.

  4. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteionopathies etc. Early realisation of proteins in the whole human genome that retain tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.

  5. Confirmation and Sequence analysis of N gene of PPRV in South Xinjiang, China

    Directory of Open Access Journals (Sweden)

    YongHong Liu

    Full Text Available ABSTRACT In China, Peste des petits ruminants (PPR was officially first reported in 2007. From 2010 until the outbreak of 2013, PPRV infection was not reported. In November 2013, PPRV re-emerged in Xinjiang and rapidly spread to 22 P/A/M (provinces, autonomous regions and municipalities of China. In the study, suspected PPRV-infected sheep in a breeding farm of South Xinjiang in 2014 were diagnosed and the characteristics of complete sequence of N protein gene of PPRV was analyzed. The sheep showed PPRV-infected signs, such as fever, orinasal secretions increase, dyspnea and diarrhea, with 60% of morbidity and 21.1% of fatality rate. The macroscopic lesions after autopsy and histopathological changes were observed under light microscope including stomatitis, broncho-interstitial pneumonia, catarrhal hemorrhagic enteritis and intracytoplasmic eosinophilic inclusions in multinucleated giantcell in lung. The formalin-fixed mixed tissues samples were positive by nucleic acid extraction and RT-PCR detection. The nucleotide of N protein gene of China/XJNJ/2014 strain was extremely high homology with the China/XJYL/2013 strain, and the highest with PRADESH_95 strain from India in exotic strains. Phylogenetic analysis based on complete sequence of N protein gene of PPRV showed that the China/XJNJ/2014 strain, other strain of 2013-2014 in this study and Tibetan strains all belonged to lineage Ⅳ, but the PPRV strains of 2013-2014 in this study and Tibetan strains were in different sub-branches.

  6. [Primary structure of the elongation factor G from Escherichia coli. V. Amino acid sequence of the C-terminal domain].

    Science.gov (United States)

    Alakhov, Iu B; Vinokurov, L M; Dovgas, N V; Motuz, L P

    1983-03-01

    The amino acid sequence of the C-terminal domain of the elongation factor G (EF-G) has been studied. The polypeptide chain of the domain consists of 228 amino acid residues, and contains no tryptophan or cysteine residues. To determine its structure, the peptides obtained as a result of the fragment digestion by staphylococcal glutamic protease, cyanogen bromide cleavage, and tryptic hydrolysis of the fragment modified by maleic anhydride have been analyzed, as well as peptides obtained after hydrolyses of cyanogen bromide fragments with chymotrypsin, thermolysin and trypsin.

  7. Characterization of the HLA-DRβ1 third hypervariable region amino acid sequence according to charge and parental inheritance in systemic sclerosis.

    Science.gov (United States)

    Gentil, Coline A; Gammill, Hilary S; Luu, Christine T; Mayes, Maureen D; Furst, Dan E; Nelson, J Lee

    2017-03-07

    Specific HLA class II alleles are associated with systemic sclerosis (SSc) risk, clinical characteristics, and autoantibodies. HLA nomenclature initially developed with antibodies as typing reagents defining DRB1 allele groups. However, alleles from different DRB1 allele groups encode the same third hypervariable region (3rd HVR) sequence, the primary T-cell recognition site, and 3rd HVR charge differences can affect interactions with T cells. We considered 3rd HVR sequences (amino acids 67-74) irrespective of the allele group and analyzed parental inheritance considered according to the 3rd HVR charge, comparing SSc patients with controls. In total, 306 families (121 SSc and 185 controls) were HLA genotyped and parental HLA-haplotype origin was determined. Analysis was conducted according to DRβ1 3rd HVR sequence, charge, and parental inheritance. The distribution of 3rd HVR sequences differed in SSc patients versus controls (p = 0.007), primarily due to an increase of specific DRB1*11 alleles, in accord with previous observations. The 3rd HVR sequences were next analyzed according to charge and parental inheritance. Paternal transmission of DRB1 alleles encoding a +2 charge 3rd HVR was significantly reduced in SSc patients compared with maternal transmission (p = 0.0003, corrected for analysis of four charge categories p = 0.001). To a lesser extent, paternal transmission was increased when charge was 0 (p = 0.021, corrected for multiple comparisons p = 0.084). In contrast, paternal versus maternal inheritance was similar in controls. SSc patients differed from controls when DRB1 alleles were categorized according to 3rd HVR sequences. Skewed parental inheritance was observed in SSc patients but not in controls when the DRβ1 3rd HVR was considered according to charge. These observations suggest that epigenetic modulation of HLA merits investigation in SSc.

  8. Sequencing and analysis of the gene-rich space of cowpea

    Directory of Open Access Journals (Sweden)

    Cheung Foo

    2008-02-01

    Full Text Available Abstract Background Cowpea, Vigna unguiculata (L. Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF technology. Over 250,000 gene-space sequence reads (GSRs with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa, and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A

  9. SmashCell: A software framework for the analysis of single-cell amplified genome sequences

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Arumugam, Manimozhiyan; Raes, Jeroen

    2010-01-01

    SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes however is far more complicated than the analysis of those...

  10. Characterization and functional analysis of fatty acid binding protein from the carcinogenic liver fluke, Opisthorchis viverrini.

    Science.gov (United States)

    Sripa, Jittiyawadee; Laha, Thewarach; Sripa, Banchob

    2017-08-01

    In the present study, the cDNA encoding FABP (Ov-FABP) was isolated from the adult stage of Opisthorchis viverrini and characterized. The Ov-FABP protein sequence (107 amino acids) was predicted to have a molecular mass of 12.26kDa and an isoelectric point (PI) of 6.82. This sequence had a high identity and similarity to Cs-FABP of the related opisthorchid Clonorchis sinensis. Multiple sequence alignment with FABPs from other parasitic flatworms and mammals showed a number of conserved amino acids including Phe 34 , Gly 37 , Glu 38 , Glu 39 ,Val 50 , Iso 62 , Gly 81 , Ile 84 , Ser 87 and Arg 101 . In addition, the structure of Ov-FABP was predicted to have eleven β-sheets and one α-helix based on the known structures for FABPs from human (hL-FABP), rat and a schistosome. Phylogenetic analysis of amino acid sequence data revealed a close relationship of Ov-FABP with Cs-FABP and hL-FABP. Reverse transcription-PCR revealed that Ov-FABP was transcribed in the egg, metacercaria, juvenile and adult stages. The soluble form of recombinant Ov-FABP (rOv-FABP) was shown to specifically bind fatty acids, including oleic acid, palmitic acid and linoleic acid, as shown for other animals. Anti-serum against rOv-FABP (produced in mice) located the protein to parenchyma, egg, sucker musculature, testes and tegument of adult O. viverrini. Taken together, the findings suggest key functional roles for Ov-FABP in development, reproduction and/or host-parasite interactions. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  11. Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Neumann, Pavel; Novák, Petr; Jiang, J.

    2010-01-01

    Roč. 26, č. 1797 (2010), s. 2101-2108 ISSN 1367-4803 R&D Projects: GA AV ČR KJB500960802; GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords : next-generation sequencing * satellite repeats * K-mer analysis Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.877, year: 2010

  12. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed Sequence Tag (EST has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons and that have been deposited in NCBI (accession IDs: GW868575 - GW873047, among which 1,294 unique ESTs were with known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65% and low in the peach (46%, and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.

  13. Enhancement of extracellular lipid production by oleaginous yeast through preculture and sequencing batch culture strategy with acetic acid.

    Science.gov (United States)

    Huang, Xiang-Feng; Shen, Yi; Luo, Hui-Juan; Liu, Jia-Nan; Liu, Jia

    2018-01-01

    Oleaginous yeast Cryptococcus curvatus MUCL 29819, an acid-tolerant lipid producer, was tested to spill lipids extracellularly using different concentrations of acetic acid as carbon source. Extracellular lipids were released when the yeast was cultured with acetic acid exceeding 20g/L. The highest production of lipid (5.01g/L) was obtained when the yeast was cultured with 40g/L acetic acid. When the yeast was cultivated with moderate concentration (20g/L) of acetic acid, lipid production was further increased by 49.6% through preculture with 40g/L acetic acid as stimulant. When applying high concentration (40g/L) of acetic acid as carbon source in sequencing batch cultivation, extracellular lipids accounted up to 50.5% in the last cycle and the extracellular lipids reached 5.43g/L through the whole process. This study provides an effective strategy to enhance extracellular lipid production and facilitate the recovery of microbial lipids. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. The genome sequence of the highly acetic acid-tolerant Zygosaccharomyces bailii-derived interspecies hybrid strain ISA1307, isolated from a sparkling wine plant.

    Science.gov (United States)

    Mira, Nuno P; Münsterkötter, Martin; Dias-Valada, Filipa; Santos, Júlia; Palma, Margarida; Roque, Filipa C; Guerreiro, Joana F; Rodrigues, Fernando; Sousa, Maria João; Leão, Cecília; Güldener, Ulrich; Sá-Correia, Isabel

    2014-06-01

    In this work, it is described the sequencing and annotation of the genome of the yeast strain ISA1307, isolated from a sparkling wine continuous production plant. This strain, formerly considered of the Zygosaccharomyces bailii species, has been used to study Z. bailii physiology, in particular, its extreme tolerance to acetic acid stress at low pH. The analysis of the genome sequence described in this work indicates that strain ISA1307 is an interspecies hybrid between Z. bailii and a closely related species. The genome sequence of ISA1307 is distributed through 154 scaffolds and has a size of around 21.2 Mb, corresponding to 96% of the genome size estimated by flow cytometry. Annotation of ISA1307 genome includes 4385 duplicated genes (∼ 90% of the total number of predicted genes) and 1155 predicted single-copy genes. The functional categories including a higher number of genes are 'Metabolism and generation of energy', 'Protein folding, modification and targeting' and 'Biogenesis of cellular components'. The knowledge of the genome sequence of the ISA1307 strain is expected to contribute to accelerate systems-level understanding of stress resistance mechanisms in Z. bailii and to inspire and guide novel biotechnological applications of this yeast species/strain in fermentation processes, given its high resilience to acidic stress. The availability of the ISA1307 genome sequence also paves the way to a better understanding of the genetic mechanisms underlying the generation and selection of more robust hybrid yeast strains in the stressful environment of wine fermentations. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  15. The Genome Sequence of the Highly Acetic Acid-Tolerant Zygosaccharomyces bailii-Derived Interspecies Hybrid Strain ISA1307, Isolated From a Sparkling Wine Plant

    Science.gov (United States)

    Mira, Nuno P.; Münsterkötter, Martin; Dias-Valada, Filipa; Santos, Júlia; Palma, Margarida; Roque, Filipa C.; Guerreiro, Joana F.; Rodrigues, Fernando; Sousa, Maria João; Leão, Cecília; Güldener, Ulrich; Sá-Correia, Isabel

    2014-01-01

    In this work, it is described the sequencing and annotation of the genome of the yeast strain ISA1307, isolated from a sparkling wine continuous production plant. This strain, formerly considered of the Zygosaccharomyces bailii species, has been used to study Z. bailii physiology, in particular, its extreme tolerance to acetic acid stress at low pH. The analysis of the genome sequence described in this work indicates that strain ISA1307 is an interspecies hybrid between Z. bailii and a closely related species. The genome sequence of ISA1307 is distributed through 154 scaffolds and has a size of around 21.2 Mb, corresponding to 96% of the genome size estimated by flow cytometry. Annotation of ISA1307 genome includes 4385 duplicated genes (∼90% of the total number of predicted genes) and 1155 predicted single-copy genes. The functional categories including a higher number of genes are ‘Metabolism and generation of energy’, ‘Protein folding, modification and targeting’ and ‘Biogenesis of cellular components’. The knowledge of the genome sequence of the ISA1307 strain is expected to contribute to accelerate systems-level understanding of stress resistance mechanisms in Z. bailii and to inspire and guide novel biotechnological applications of this yeast species/strain in fermentation processes, given its high resilience to acidic stress. The availability of the ISA1307 genome sequence also paves the way to a better understanding of the genetic mechanisms underlying the generation and selection of more robust hybrid yeast strains in the stressful environment of wine fermentations. PMID:24453040

  16. Analysis of B-genome derived simple sequence repeat (SSR ...

    African Journals Online (AJOL)

    A study was conducted to investigate the genetic variability between 40 Musa genotypes maintained at the Musa germplasm collection of the International Institute for Tropical Agriculture, Ibadan using nine B-genome derived simple sequence repeat (SSR) markers. The nine primers produced reproducible and discrete ...

  17. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    The genome size of AnpeNPV is estimated at 128 kb. ... A genomic library was generated using HindIII and the positive clones were sequenced and analysed. ... Institute of Life Sciences, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, People's Republic of China; School of Agricultural Science and Technology, ...

  18. Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses

    Science.gov (United States)

    2012-09-01

    incapacitating illness, lack of adequate control measures, and the ease of production of large quantities of virus. Characterisation by sequencing is...ability to induce a fatal or seriously incapacitating illness, the lack of adequate control measures, and the ease of production of large...Location Family Neutralising antibody detected Culex annulirostris (mosquito) DPP1163 1987 Darwin, NT Rhabdoviridae Cattle, buffalo

  19. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    2017-12-18

    Dec 18, 2017 ... In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI ...

  20. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  1. POSA: perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, J.A.; Jungerius, B.J.; Groenen, M.A.M.

    2004-01-01

    Background - Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  2. POSA : Perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, JA; Jungerius, BJ; Groenen, MA

    2004-01-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  3. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  4. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  5. A deep sequencing analysis of transcriptomes and the development ...

    Indian Academy of Sciences (India)

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the ...

  6. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  7. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease and protein related researches were the leading research focuses, and comparative genomics and evolution related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  8. Phylogenetic analysis of the Bifidobacterium genus using glycolysis enzyme sequences

    Directory of Open Access Journals (Sweden)

    Katelyn eBrandt

    2016-05-01

    Full Text Available Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides, to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequences-based trees are congruent with both the classical 16S rRNA phylogeny, and core genome-based strain clustering. Furthermore, these glycolysis markers can also be used to provide insights into the adaptive evolution of this genus, especially with regards to trends towards a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode.

  9. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  10. Sequencing and Gene Expression Analysis of Leishmania tropica LACK Gene.

    Science.gov (United States)

    Hammoudeh, Nour; Kweider, Mahmoud; Abbady, Abdul-Qader; Soukkarieh, Chadi

    2014-01-01

    Leishmania Homologue of receptors for Activated C Kinase (LACK) antigen is a 36-kDa protein, which provokes a very early immune response against Leishmania infection. There are several reports on the expression of LACK through different life-cycle stages of genus Leishmania, but only a few of them have focused on L.tropica. The present study provides details of the cloning, DNA sequencing and gene expression of LACK in this parasite species. First, several local isolates of Leishmania parasites were typed in our laboratory using PCR technique to verify of Leishmania parasite species. After that, LACK gene was amplified and cloned into a vector for sequencing. Finally, the expression of this molecule in logarithmic and stationary growth phase promastigotes, as well as in amastigotes, was evaluated by Reverse Transcription-PCR (RT-PCR) technique. The typing result confirmed that all our local isolates belong to L.tropica. LACK gene sequence was determined and high similarity was observed with the sequences of other Leishmania species. Furthermore, the expression of LACK gene in both promastigotes and amastigotes forms was confirmed. Overall, the data set the stage for future studies of the properties and immune role of LACK gene products.

  11. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  12. A Bibliometric Analysis of Global Research on Genome Sequencing ...

    African Journals Online (AJOL)

    YSHo

    This study was carried out to evaluate the global scientific production of genome sequencing research to assess the characteristics of the research performances and the research tendencies. Data were obtained from Science Citation Index Expanded database during 1991-2010. Conventional methods including document ...

  13. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    For determination of the genetic relationship of HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, DNA fragment of 1100 bp corresponding to gG from six HSV-2 strains have been isolated from human infected sera samples in Iran, it was amplified in PCR system and was sequenced for determining ...

  14. Molecular cloning and sequence analysis of the cat myostatin gene ...

    African Journals Online (AJOL)

    Administrator

    2011-09-07

    Sep 7, 2011 ... TEAF. TEA/ATTS DNA binding domain factors. TEAD.01. TEA domain-containing factors, transcriptional enhancer factors 1, 3,. 4, 5. -1129/-1117(-) transcription factor binding sites located in the cat myostatin gene upstream sequence. According to previous works, we focused on analyzing and discussing.

  15. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  16. Cloning and sequence analysis of the Antheraea pernyi ...

    Indian Academy of Sciences (India)

    Unknown

    Goldbach R W and Vlak J M 1999 Sequence and organiza- tion of the Spodoptera exigua multicapsid nucleopolyhedro- virus genome; J. Gen. Virol. 80 3289–3304. Jakubowska A, Oers M M, Cory J S, Ziemnick J and Vlak J M. 2005 European Leucoma salicis NPV is closely related to. North American Orgyia pseudotsugata ...

  17. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    Energy Technology Data Exchange (ETDEWEB)

    Dash, Paban Kumar, E-mail: pabandash@rediffmail.com; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-07-05

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of the South East Asia including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is not available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960), from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2 respectively. Presence of unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins was observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection in few amino acid sites of the genes encoding for structural and non-structural proteins. The molecular phylogenetic analysis based on comparison of both complete coding region and envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in 1970s characterized the evolution of DENV-2 in India. Present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  18. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    International Nuclear Information System (INIS)

    Dash, Paban Kumar; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-01-01

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of the South East Asia including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is not available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960), from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2 respectively. Presence of unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins was observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection in few amino acid sites of the genes encoding for structural and non-structural proteins. The molecular phylogenetic analysis based on comparison of both complete coding region and envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in 1970s characterized the evolution of DENV-2 in India. Present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  19. Whole-Genome Sequencing Analysis Accurately Predicts Antimicrobial Resistance Phenotypes in Campylobacter spp.

    Science.gov (United States)

    Zhao, S; Tyson, G H; Chen, Y; Li, C; Mukherjee, S; Young, S; Lam, C; Folster, J P; Whichard, J M; McDermott, P F

    2015-10-30

    The objectives of this study were to identify antimicrobial resistance genotypes for Campylobacter and to evaluate the correlation between resistance phenotypes and genotypes using in vitro antimicrobial susceptibility testing and whole-genome sequencing (WGS). A total of 114 Campylobacter species isolates (82 C. coli and 32 C. jejuni) obtained from 2000 to 2013 from humans, retail meats, and cecal samples from food production animals in the United States as part of the National Antimicrobial Resistance Monitoring System were selected for study. Resistance phenotypes were determined using broth microdilution of nine antimicrobials. Genomic DNA was sequenced using the Illumina MiSeq platform, and resistance genotypes were identified using assembled WGS sequences through blastx analysis. Eighteen resistance genes, including tet(O), blaOXA-61, catA, lnu(C), aph(2″)-Ib, aph(2″)-Ic, aph(2')-If, aph(2″)-Ig, aph(2″)-Ih, aac(6')-Ie-aph(2″)-Ia, aac(6')-Ie-aph(2″)-If, aac(6')-Im, aadE, sat4, ant(6'), aad9, aph(3')-Ic, and aph(3')-IIIa, and mutations in two housekeeping genes (gyrA and 23S rRNA) were identified. There was a high degree of correlation between phenotypic resistance to a given drug and the presence of one or more corresponding resistance genes. Phenotypic and genotypic correlation was 100% for tetracycline, ciprofloxacin/nalidixic acid, and erythromycin, and correlations ranged from 95.4% to 98.7% for gentamicin, azithromycin, clindamycin, and telithromycin. All isolates were susceptible to florfenicol, and no genes associated with florfenicol resistance were detected. There was a strong correlation (99.2%) between resistance genotypes and phenotypes, suggesting that WGS is a reliable indicator of resistance to the nine antimicrobial agents assayed in this study. WGS has the potential to be a powerful tool for antimicrobial resistance surveillance programs. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  20. Sequencing and expression analysis of hepcidin mRNA in donkey (Equus asinus liver

    Directory of Open Access Journals (Sweden)

    José P. Oliveira-Filho

    2012-10-01

    Full Text Available The hypoferremia that is observed during systemic inflammatory processes is mediated by hepcidin, which is a peptide that is mainly synthesized in the livers of several mammalian species. Hepcidin plays a key role in iron metabolism and in the innate immune system. It's up-regulation is particularly useful during acute inflammation, and it restricts the iron availability that is necessary for the growth of pathogenic microorganisms. In this study, the hepcidin mRNA of Equus asinus has been characterized, and the expression of donkey hepcidin in the liver has been determined. The donkey hepcidin sequence has an open reading frame (ORF of 261 nucleotides, and the deduced corresponding protein sequence has 86 amino acids. The amino acid sequence of donkey hepcidin was most homologous to Equus caballus (98%. The mature donkey hepcidin sequence (25 amino acids was 100% homologous to the equine mature hepcidin and has eight conserved cysteine residues that are found in all of the investigated hepcidin sequences. The expression profile of donkey hepcidin in the liver was high and was similar to the reference gene expression. The donkey hepcidin sequence was deposited in GenBankTM (HQ902884 and may be useful for additional studies on iron metabolism and the inflammatory process in this species.

  1. A Combined Linkage and Exome Sequencing Analysis for Electrocardiogram Parameters in the Erasmus Rucphen Family Study.

    Science.gov (United States)

    Silva, Claudia T; Zorkoltseva, Irina V; Amin, Najaf; Demirkan, Ayşe; van Leeuwen, Elisabeth M; Kors, Jan A; van den Berg, Marten; Stricker, Bruno H; Uitterlinden, André G; Kirichenko, Anatoly V; Witteman, Jacqueline C M; Willemsen, Rob; Oostra, Ben A; Axenovich, Tatiana I; van Duijn, Cornelia M; Isaacs, Aaron

    2016-01-01

    Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C > G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10 -4 , minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait's genetic variability in ERF ( P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca 2+ levels ( P = 3.3 × 10 -3 ) and AMPK stimulated fatty acid oxidation in muscle ( P = 4.1 × 10 -3 ). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.

  2. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    Science.gov (United States)

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden.

  3. Cloning, sequence analysis, expression of Cyathus bulleri laccase in Pichia pastoris and characterization of recombinant laccase

    Directory of Open Access Journals (Sweden)

    Garg Neha

    2012-10-01

    Full Text Available Abstract Background Laccases are blue multi-copper oxidases and catalyze the oxidation of phenolic and non-phenolic compounds. There is considerable interest in using these enzymes for dye degradation as well as for synthesis of aromatic compounds. Laccases are produced at relatively low levels and, sometimes, as isozymes in the native fungi. The investigation of properties of individual enzymes therefore becomes difficult. The goal of this study was to over-produce a previously reported laccase from Cyathus bulleri using the well-established expression system of Pichia pastoris and examine and compare the properties of the recombinant enzyme with that of the native laccase. Results In this study, complete cDNA encoding laccase (Lac from white rot fungus Cyathus bulleri was amplified by RACE-PCR, cloned and expressed in the culture supernatant of Pichia pastoris under the control of the alcohol oxidase (AOX1 promoter. The coding region consisted of 1,542 bp and encodes a protein of 513 amino acids with a signal peptide of 16 amino acids. The deduced amino acid sequence of the matured protein displayed high homology with laccases from Trametes versicolor and Coprinus cinereus. The sequence analysis indicated the presence of Glu 460 and Ser 113 and LEL tripeptide at the position known to influence redox potential of laccases placing this enzyme as a high redox enzyme. Addition of copper sulfate to the production medium enhanced the level of laccase by about 12-fold to a final activity of 7200 U L-1. The recombinant laccase (rLac was purified by ~4-fold to a specific activity of ~85 U mg-1 protein. A detailed study of thermostability, chloride and solvent tolerance of the rLac indicated improvement in the first two properties when compared to the native laccase (nLac. Altered glycosylation pattern, identified by peptide mass finger printing, was proposed to contribute to altered properties of the rLac. Conclusion Laccase of C. bulleri was

  4. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Science.gov (United States)

    2010-07-01

    ... Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the National Archives and Records... number of SEQ ID NOs, whether followed by a sequence or by the code “000.” (d) Where the description or... identifier, preceded by “SEQ ID NO:” in the text of the description or claims, even if the sequence is also...

  5. Recognition of 5'-YpG-3' sequences by coupled stacking/hydrogen bonding interactions with amino acid residues.

    Science.gov (United States)

    Lamoureux, Jason S; Maynes, Jason T; Glover, J N Mark

    2004-01-09

    The combined biochemical and structural study of hundreds of protein-DNA complexes has indicated that sequence-specific interactions are mediated by two mechanisms termed direct and indirect readout. Direct readout involves direct interactions between the protein and base-specific atoms exposed in the major and minor grooves of DNA. For indirect readout, the protein recognizes DNA by sensing conformational variations in the structure dependent on nucleotide sequence, typically through interactions with the phosphodiester backbone. Based on our recent structure of Ndt80 bound to DNA in conjunction with a search of the existing PDB database, we propose a new method of sequence-specific recognition that utilizes both direct and indirect readout. In this mode, a single amino acid side-chain recognizes two consecutive base-pairs. The 3'-base is recognized by canonical direct readout, while the 5'-base is recognized through a variation of indirect readout, whereby the conformational flexibility of the particular dinucleotide step, namely a 5'-pyrimidine-purine-3' step, facilitates its recognition by the amino acid via cation-pi interactions. In most cases, this mode of DNA recognition helps explain the sequence specificity of the protein for its target DNA.

  6. Comparative studies on tree pollen allergens. X. Further purification and N-terminal amino acid sequence analyses of the major allergen of birch pollen (Betula verrucosa).

    Science.gov (United States)

    Vik, H; Elsayed, S

    1986-01-01

    The previously isolated major allergen of birch pollen (fraction BV45), Int. Archs Allergy appl. Immun. 68: 70-78 (1982), was further purified by recycling chromatography. The purified preparation was run on a high-performance liquid chromatography (HPLC) TSK-G-2000 gel filtration chromatography column and, finally, on paper high-volt electrophoresis. The protein recovered met the homogeneity criteria required for performing the N-terminal sequence analysis. The allergenic and antigenic reactivities of the HPLC-purified protein, designated BV45B, was examined. A single homogeneous precipitation line in crossed immunoelectrophoresis (CIE) was shown. Specific IgE-inhibition tests and immuno-autoradiographic prints indicated that this allergen could bind reaginic IgE specificially and with good affinity. The homogeneity of BV45B was examined by isoelectric focusing (IEF). Several minor bands of pI differences of less than 0.1 units were visible, demonstrating the existence of some molecular variants of this protein. The N-terminal sequence analysis of the molecule was performed, and the following four amino acids were tentatively shown by sequential cleavage: NH2-Ala-Gly-Ile-Val-. The demonstration of one dominant N-terminal 1-dimethyl-amino-5-naphthalene sulphonyl (DNS)-amino acid by polyamide thin-layer chromatography at each sequence step confirmed that the N-terminal residue of the protein was not blocked; the heterogeneity shown by the IEF system was merely due to the presence of several homologous polymorphic proteins with identical N-terminal amino acid, the adequacy of the purification repertoire used.

  7. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    Science.gov (United States)

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  8. [Comparative analysis of sequence alignment of SH3GL1 gene as a disease candidate gene of adolescent idiopathic scoliosis].

    Science.gov (United States)

    Yang, Tao; Xu, Jian-zhong; Jia, Quan-zhang; Guo, Hong; Luo, Fei; Ye, Qing; Bai, Yun

    2010-03-15

    To identify whether SH3GL1 gene serves as a disease associated gene of adolescent idiopathic scoliosis (AIS). Positioning candidate cloning: "case-sibling or case-family control design" research scheme based on family constellation was designed. Fifty-six AIS patients (15 male and 41 female, mean age 15 years old, ranged from 8 to 22 years old, Cobb angle from 25 degrees to 110 degrees , average Cobb angle of 67.5 degrees ) from November 2007 to December 2008 were recruited. In all patients, blood preparation was collected, and genome DNA was extracted. According to nucleotide sequence of gene SH3GL1, primer pair for PCR amplification, cloning, and sequencing with 10 exons as emphasis was designed. Sequence comparative analysis for exon sequencing result between sib pairs or family pairs, and that between sib pair or family pairs and NCBI (National Center for Biotechnology Information) were conducted through Vector NTI Advance 10.3 software to judge whether basic group mutation occurred or not. Amino acid sequence comparative analysis for prediction was made. Ten exons of the candidate gene SH3GL1 were successfully amplified and cloned in genome DNA of an AIS sib pair and family pairs, and the sequencing obtained positive results. Twelve basic group mutations were found in 10 exons of the candidate gene SH3GL1 of patients with AIS. These mutations were located in the second exon (3 mutations), the fourth exon (1 mutations), the fifth exon (4 mutations), the sixth exon (1 mutations), the eighth exon (1 mutations), and the tenth exon (2 mutations, noncoding region). If basic group in 515 of mRNA was mutated to T, termination codon(TAG) came into being and open reading frame was altered. The sequence of protein showed brachytmema protein was encoded, which could cause changes of primary structure. SH3GL1 is possibly one of the disease associated genes of AIS.

  9. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  11. XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    Directory of Open Access Journals (Sweden)

    Frank Daniel N

    2008-10-01

    Full Text Available Abstract Background Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI. Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly, perform BLAST (Basic Local Alignment and Search Tool; 123 searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  12. Addressing challenges in the production and analysis of illumina sequencing data.

    Science.gov (United States)

    Kircher, Martin; Heyn, Patricia; Kelso, Janet

    2011-07-29

    Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

  13. Addressing challenges in the production and analysis of illumina sequencing data

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2011-07-01

    Full Text Available Abstract Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

  14. Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data.

    Science.gov (United States)

    Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R

    2017-11-02

    Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, J; Han, C; Gordon, L A; Terry, A; Prabhakar, S; She, X; Xie, G; Hellsten, U; Chan, Y M; Altherr, M; Couronne, O; Aerts, A; Bajorek, E; Black, S; Blumer, H; Branscomb, E; Brown, N; Bruno, W J; Buckingham, J; Callen, D F; Campbell, C S; Campbell, M L; Campbell, E W; Caoile, C; Challacombe, J F; Chasteen, L A; Chertkov, O; Chi, H C; Christensen, M; Clark, L M; Cohn, J D; Denys, M; Detter, J C; Dickson, M; Dimitrijevic-Bussod, M; Escobar, J; Fawcett, J J; Flowers, D; Fotopulos, D; Glavina, T; Gomez, M; Gonzales, E; Goodstein, D; Goodwin, L A; Grady, D L; Grigoriev, I; Groza, M; Hammon, N; Hawkins, T; Haydu, L; Hildebrand, C E; Huang, W; Israni, S; Jett, J; Jewett, P B; Kadner, K; Kimball, H; Kobayashi, A; Krawczyk, M; Leyba, T; Longmire, J L; Lopez, F; Lou, Y; Lowry, S; Ludeman, T; Manohar, C F; Mark, G A; McMurray, K L; Meincke, L J; Morgan, J; Moyzis, R K; Mundt, M O; Munk, A C; Nandkeshwar, R D; Pitluck, S; Pollard, M; Predki, P; Parson-Quintana, B; Ramirez, L; Rash, S; Retterer, J; Ricke, D O; Robinson, D; Rodriguez, A; Salamov, A; Saunders, E H; Scott, D; Shough, T; Stallings, R L; Stalvey, M; Sutherland, R D; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Torney, D C; Tran-Gyamfi, M; Tsai, M; Ulanovsky, L E; Ustaszewska, A; Vo, N; White, P S; Williams, A L; Wills, P L; Wu, J; Wu, K; Yang, J; DeJong, P; Bruce, D; Doggett, N A; Deaven, L; Schmutz, J; Grimwood, J; Richardson, P; Rokhsar, D S; Eichler, E E; Gilna, P; Lucas, S M; Myers, R M; Rubin, E M; Pennacchio, L A

    2005-04-06

    Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. While the segmental duplications of chromosome 16 are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events likely to have had an impact on the evolution of primates and human disease susceptibility.

  16. mitoSAVE: mitochondrial sequence analysis of variants in Excel.

    Science.gov (United States)

    King, Jonathan L; Sajantila, Antti; Budowle, Bruce

    2014-09-01

    The mitochondrial genome (mtGenome) contains genetic information amenable to numerous applications such as medical research, population and evolutionary studies, and human identity testing. However, inconsistent nomenclature assignment makes haplotype comparison difficult and can lead to false exclusion of potentially useful profiles. Massively Parallel Sequencing (MPS) is a platform for sequencing large datasets and potentially whole populations with relative ease. However, the data generated are not easily parsed and interpreted. With this in mind, mitoSAVE has been developed to enable fast conversion of Variant Call Format (VCF) files. mitoSAVE is an Excel-based workbook that converts data within the VCF into mtDNA haplotypes using phylogenetically-established nomenclature as well as rule-based alignments consistent with current forensic standards. mitoSAVE is formatted for human mitochondrial genome; however, it can easily be adapted to support other reasonably small genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  17. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  18. Cloning and sequence analysis of benzo-a-pyreneinducible ...

    African Journals Online (AJOL)

    Abstract. Polycyclic aromatic hydrocarbons (PAHs), dioxins, dibenzofurans and polychlorinated biphenyls (PCBs) present in polluted environment induce cytochrome P4501A ... The full-length cDNA was 2530 bp long and contained an open reading frame of 1566 bp encoding a protein of 521 amino acids and a stop codon.

  19. Analysis of plastid DNA-like sequences within the nuclear genomes of higher plants.

    Science.gov (United States)

    Ayliffe, M A; Scott, N S; Timmis, J N

    1998-06-01

    A wide-ranging examination of plastid (pt)DNA sequence homologies within higher plant nuclear genomes (promiscuous DNA) was undertaken. Digestion with methylation-sensitive restriction enzymes and Southern analysis was used to distinguish plastid and nuclear DNA in order to assess the extent of variability of promiscuous sequences within and between plant species. Some species, such as Gossypium hirsutum (cotton), Nicotiana tabacum (tobacco), and Chenopodium quinoa, showed homogenity of these sequences, while intraspecific sequence variation was observed among different cultivars of Pisum sativum (pea), Hordeum vulgare (barley), and Triticum aestivum (wheat). Hypervariability of plastid sequence homologies was identified in the nuclear genomes of Spinacea oleracea (spinach) and Beta vulgaris (beet), in which individual plants were shown to possess a unique spectrum of nuclear sequences with ptDNA homology. This hypervariability apparently extended to somatic variation in B. vulgaris. No sequences with ptDNA homology were identified by this method in the nuclear genome of Arabidopsis thaliana.

  20. Templated Synthesis of Peptide Nucleic Acids via Sequence-Selective Base-Filling Reactions

    OpenAIRE

    Heemstra, Jennifer M.; Liu, David R.

    2009-01-01

    The templated synthesis of nucleic acids has previously been achieved through the backbone ligation of preformed nucleotide monomers or oligomers. In contrast, here we demonstrate templated nucleic acid synthesis using a base-filling approach in which individual bases are added to abasic sites of a peptide nucleic acid (PNA). Because nucleobase substrates in this approach are not self-reactive, a base-filling approach may reduce the formation of nontemplated reaction products. Using either re...

  1. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a big concern as the amount of data generated is exponential in nature at several locations. Therefore, there is a need to develop techniques to store data using compression algorithm. Here we describe optimal storage algorithm (OPTSDNA) for storing large amount of DNA sequences of varying length. This paper provides performance analysis of optimal storage algorithm (OPTSDNA) of a distributed bioinformatics computing system for analysis of DNA sequences. OPTSDNA algorithm is used for storing various sizes of DNA sequences into database. DNA sequences of different lengths were stored by using this algorithm. These input DNA sequences are varied in size from very small to very large. Storage size is calculated by this algorithm. Response time is also calculated in this work. The efficiency and performance of the algorithm is high (in size calculation with percentage) when compared with other known with sequential approach.

  2. The pseudoazurin gene from Thiosphaera pantotropha : analysis of upstream putative regulatory sequences and overexpression in Escherichia coli

    NARCIS (Netherlands)

    Leung, Y C; Chan, C.; Reader, J S; Willis, A C; van Spanning, R J; Ferguson, Stuart J; Radford, S E

    1997-01-01

    The pseudoazurin gene from Thiosphaera pantotropha has been cloned and sequenced. The deduced amino acid sequence showed that the protein contains an unusually alanine-rich signal peptide, 22 amino acid residues in length, which targets the protein to the periplasm. This pseudoazurin was expressed

  3. Purification and amino acid sequence of a bacteriocins produced by Lactobacillus salivarius K7 isolated from chicken intestine

    Directory of Open Access Journals (Sweden)

    Kenji Sonomoto

    2006-03-01

    Full Text Available A bacteriocin-producing strain, Lactobacillus K7, was isolated from a chicken intestine. The inhibitory activity was determined by spot-on-lawn technique. Identification of the strain was performed by morphological, biochemical (API 50 CH kit and molecular genetic (16S rDNA basis. Bacteriocin purification processes were carried out by amberlite adsorption, cation exchange and reverse-phase high perform- ance liquid chromatography. N-terminal amino acid sequences were performed by Edman degradation. Molecular mass was determined by electrospray-ionization (ESI mass spectrometry (MS. Lactobacillus K7 showed inhibitory activity against Lactobacillus sakei subsp. sakei JCM 1157T, Leuconostoc mesenteroides subsp. mesenteroides JCM 6124T and Bacillus coagulans JCM 2257T. This strain was identified as Lb. salivarius. The antimicrobial substance was destroyed by proteolytic enzymes, indicating its proteinaceous structure designated as a bacteriocin type. The purification of bacteriocin by amberlite adsorption, cation exchange, and reverse-phase chromatography resulted in only one single active peak, which was designated FK22. Molecular weight of this fraction was 4331.70 Da. By amino acid sequence, this peptide was homology to Abp 118 beta produced by Lb. salivarius UCC118. In addition, Lb. salivarius UCC118 produced 2-peptide bacteriocin, which was Abp 118 alpha and beta. Based on the partial amino acid sequences of Abp 118 beta, specific primers were designed from nucleotide sequences according to data from GenBank. The result showed that the deduced peptide was high homology to 2-peptide bacteriocin, Abp 118 alpha and beta.

  4. A model of the statistical power of comparative genome sequence analysis.

    OpenAIRE

    Sean R Eddy

    2005-01-01

    Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identi...

  5. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.

  6. Analysis of Free Amino Acids in Mammalian Brain Extracts.

    Science.gov (United States)

    Ksenofontov, A L; Boyko, A I; Mkrtchyan, G V; Tashlitsky, V N; Timofeeva, A V; Graf, A V; Bunik, V I; Baratova, L A

    2017-10-01

    An optimized method for analysis of free amino acids using a modified lithium-citrate buffer system with a Hitachi L-8800 amino acid analyzer is described. It demonstrates clear advantages over the sodium-citrate buffer system commonly used for the analysis of protein hydrolysates. A sample pretreatment technique for amino acid analysis of brain extracts is also discussed. The focus has been placed on the possibility of quantitative determination of the reduced form of glutathione (GSH) with simultaneous analysis of all other amino acids in brain extracts. The method was validated and calibration coefficient (K GSH ) was determined. Examples of chromatographic separation of free amino acids in extracts derived from different parts of the brain are presented.

  7. Protein chemotaxonomy of genus Datura. IV. Amino acid sequence of Datura ferredoxins depends not on the species but the section of Datura plants from which it comes.

    Science.gov (United States)

    Mino, Y

    1995-07-01

    The complete amino acid sequences of [2Fe-2S] ferredoxins from Datura quercifolia (section Stramonium) and D. fastuosa (section Dutra) have been determined by automated Edman degradation of the entire Cm-protein and of the peptides obtained by tryptic digestion and CNBr treatment. The D. quercifolia and D. fastuosa ferredoxins exhibited identical amino acid sequences to D. stramonium (section Stramonium) and D. metal (section Dutra) ferredoxins, respectively. This result suggests that the amino acid sequence of Datura ferredoxins depends on the section but not the species of Datura plants.

  8. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome.

    Directory of Open Access Journals (Sweden)

    Michalis Kotsyfakis

    2015-05-01

    Full Text Available Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution.De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS; these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly over expressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor.This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so far deposited by our previous transcriptome studies

  9. A Markovian analysis of bacterial genome sequence constraints

    Directory of Open Access Journals (Sweden)

    Aaron D. Skewes

    2013-08-01

    Full Text Available The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order, and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely

  10. Analysis of next-generation sequencing data using Galaxy.

    Science.gov (United States)

    Blankenberg, Daniel; Hillman-Jackson, Jennifer

    2014-01-01

    The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is explored through several different types of example analyses. Instructions for running one's own Galaxy server on local hardware or on cloud computing resources are provided. Installing new tools into a personal Galaxy instance is also demonstrated.

  11. cDNA, genomic sequence cloning and analysis of the ribosomal ...

    African Journals Online (AJOL)

    Ribosomal protein L37A (RPL37A) is a component of 60S large ribosomal subunit encoded by the RPL37A gene, which belongs to the family of ribosomal L37AE proteins, located in the cytoplasm. The complementary deoxyribonucleic acid (cDNA) and the genomic sequence of RPL37A were cloned successfully from giant ...

  12. In Vivo Enhancer Analysis Chromosome 16 Conserved NoncodingSequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega,Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel,Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificitiesin vertebrate genomes remains a significant challenge that is hampered bya lack of experimentally validated training sets. In this study, weleveraged extreme evolutionary sequence conservation as a filter toidentify putative gene regulatory elements and characterized the in vivoenhancer activity of human-fish conserved and ultraconserved1 noncodingelements on human chromosome 16 as well as such elements from elsewherein the genome. We initially tested 165 of these extremely conservedsequences in a transgenic mouse enhancer assay and observed that 48percent (79/165) functioned reproducibly as tissue-specific enhancers ofgene expression at embryonic day 11.5. While driving expression in abroad range of anatomical structures in the embryo, the majority of the79 enhancers drove expression in various regions of the developingnervous system. Studying a set of DNA elements that specifically droveforebrain expression, we identified DNA signatures specifically enrichedin these elements and used these parameters to rank all ~;3,400human-fugu conserved noncoding elements in the human genome. The testingof the top predictions in transgenic mice resulted in a three-foldenrichment for sequences with forebrain enhancer activity. These datadramatically expand the catalogue of in vivo-characterized human geneenhancers and illustrate the future utility of such training sets for avariety of iological applications including decoding the regulatoryvocabulary of the human genome.

  13. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Full Text Available Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes (genus, Begomovirus; family, Geminiviridae were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA. Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS. CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions.

  14. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions. 2014 by the authors; licensee MDPI, Basel, Switzerland.

  15. Molecular cloning, sequence analysis and phylogeny of first caudata g-type lysozyme in axolotl (Ambystoma mexicanum).

    Science.gov (United States)

    Yu, Haining; Gao, Jiuxiang; Lu, Yiling; Guang, Huijuan; Cai, Shasha; Zhang, Songyan; Wang, Yipeng

    2013-11-01

    Lysozymes are key proteins that play important roles in innate immune defense in many animal phyla by breaking down the bacterial cell-walls. In this study, we report the molecular cloning, sequence analysis and phylogeny of the first caudate amphibian g-lysozyme: a full-length spleen cDNA library from axolotl (Ambystoma mexicanum). A goose-type (g-lysozyme) EST was identified and the full-length cDNA was obtained using RACE-PCR. The axolotl g-lysozyme sequence represents an open reading frame for a putative signal peptide and the mature protein composed of 184 amino acids. The calculated molecular mass and the theoretical isoelectric point (pl) of this mature protein are 21523.0 Da and 4.37, respectively. Expression of g-lysozyme mRNA is predominantly found in skin, with lower levels in spleen, liver, muscle, and lung. Phylogenetic analysis revealed that caudate amphibian g-lysozyme had distinct evolution pattern for being juxtaposed with not only anura amphibian, but also with the fish, bird and mammal. Although the first complete cDNA sequence for caudate amphibian g-lysozyme is reported in the present study, clones encoding axolotl's other functional immune molecules in the full-length cDNA library will have to be further sequenced to gain insight into the fundamental aspects of antibacterial mechanisms in caudate.

  16. Sequence-based analysis of the bacterial and fungal compositions of multiple kombucha (tea fungus) samples.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2014-04-01

    Kombucha is a sweetened tea beverage that, as a consequence of fermentation, contains ethanol, carbon dioxide, a high concentration of acid (gluconic, acetic and lactic) as well as a number of other metabolites and is thought to contain a number of health-promoting components. The sucrose-tea solution is fermented by a symbiosis of bacteria and yeast embedded within a cellulosic pellicle, which forms a floating mat in the tea, and generates a new layer with each successful fermentation. The specific identity of the microbial populations present has been the focus of attention but, to date, the majority of studies have relied on culture-based analyses. To gain a more comprehensive insight into the kombucha microbiota we have carried out the first culture-independent, high-throughput sequencing analysis of the bacterial and fungal populations of 5 distinct pellicles as well as the resultant fermented kombucha at two time points. Following the analysis it was established that the major bacterial genus present was Gluconacetobacter, present at >85% in most samples, with only trace populations of Acetobacter detected (kombucha, also being revealed. The yeast populations were found to be dominated by Zygosaccharomyces at >95% in the fermented beverage, with a greater fungal diversity present in the cellulosic pellicle, including numerous species not identified in kombucha previously. Ultimately, this study represents the most accurate description of the microbiology of kombucha to date. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    NARCIS (Netherlands)

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F.; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We

  18. Sequence analysis of the mitochondrial genomes from Dutch pedigrees with Leber hereditary optic neuropathy

    NARCIS (Netherlands)

    Howell, Neil; Oostra, Roelof-Jan; Bolhuis, Piet A.; Spruijt, Liesbeth; Clarke, Lorne A.; Mackey, David A.; Preston, Gwen; Herrnstadt, Corinna

    2003-01-01

    The complete mitochondrial DNA (mtDNA) sequences for 63 Dutch pedigrees with Leber hereditary optic neuropathy (LHON) were determined, 56 of which carried one of the classic LHON mutations at nucleotide (nt) 3460, 11778, or 14484. Analysis of these sequences indicated that there were several

  19. High-Throughput Analysis of DNA Break-Induced Chromosome Rearrangements by Amplicon Sequencing.

    Science.gov (United States)

    Brown, Alexander J; Al-Soodani, Aneesa T; Saul, Miles; Her, Stephanie; Garcia, Juan C; Ramsden, Dale A; Her, Chengtao; Roberts, Steven A

    2018-01-01

    The mechanistic understanding of how DNA double-strand breaks (DSB) are repaired is rapidly advancing in part due to the advent of inducible site-specific break model systems as well as the employment of next-generation sequencing (NGS) technologies to sequence repair junctions at high depth. Unfortunately, the sheer volume of data produced by these methods makes it difficult to analyze the structure of repair junctions manually or with other general-purpose software. Here, we describe methods to produce amplicon libraries of DSB repair junctions for sequencing, to map the sequencing reads, and then to use a robust, custom python script, Hi-FiBR, to analyze the sequence structure of mapped reads. The Hi-FiBR analysis processes large data sets quickly and provides information such as number and type of repair events, size of deletion, size of insertion and inserted sequence, microhomology usage, and whether mismatches are due to sequencing error or biological effect. The analysis also corrects for common alignment errors generated by sequencing read mapping tools, allowing high-throughput analysis of DSB break repair fidelity to be accurately conducted regardless of which suite of NGS analysis software is available. © 2018 Elsevier Inc. All rights reserved.

  20. Cloning and sequence analysis of H. contortus HC58cDNA gene ...

    African Journals Online (AJOL)

    Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the cathepsin B like proteases, suggesting that HC58cDNA was a member of the papain family. Keywords:Haemonchus contortus, HC58cDNA, cathepsin B like protease, papain family. Kenya Veterinarian Vol.