WorldWideScience

Sample records for acid sequence analysis

  1. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...
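
    The probabilistically derived score matrices mentioned in the snippet above can be illustrated with a minimal sketch. It assumes a hypothetical two-letter alphabet and made-up probabilities, not real substitution data. Each pair of aligned residues is scored by how much more likely it is under a model of related sequences than under independent background frequencies:

```python
import math

# Hypothetical toy parameters over a two-letter alphabet (illustration only).
# q[a]: background frequency of residue a.
# p[(a, b)]: probability that a and b appear aligned in related sequences.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}


def log_odds_matrix(p, q):
    """Score s(a, b) = log2(p_ab / (q_a * q_b)), measured in bits."""
    return {(a, b): math.log2(pab / (q[a] * q[b])) for (a, b), pab in p.items()}


def ungapped_score(x, y, s):
    """Sum the log-odds scores over an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(s[(a, b)] for a, b in zip(x, y))


s = log_odds_matrix(p, q)
print(s[("A", "A")])                    # log2(0.4 / 0.25) = log2(1.6), about 0.678
print(ungapped_score("AAB", "ABB", s))  # two matches and one mismatch
```

    Positive scores mark pairs seen more often in related sequences than chance predicts. Widely used substitution matrices such as BLOSUM follow the same log-odds principle, with probabilities estimated from real alignments.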

  2. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    Energy Technology Data Exchange (ETDEWEB)

    Myers, G.; Foley, B.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States). Theoretical Div.]; Mellors, J.W. [ed.] [Univ. of Pittsburgh, PA (United States)]; Jeang, K.T. [ed.] [National Institutes of Health, Bethesda, MD (United States). Molecular Virology Section]; Wain-Hobson, S. [Pasteur Inst., Paris (France)] [ed.]

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  3. Recent advances in nanopore-based nucleic acid analysis and sequencing

    International Nuclear Information System (INIS)

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights recent progress in nanopore-based nucleic acid analysis and sequencing. The studies are classified into three categories according to their nanopore material: biological, solid-state, and hybrid nanopores. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples of nanopore-based nucleic acid analysis and sequencing are given, with an emphasis on strategies that can improve the resolution of nanopores. The review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)
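
    The translocation-based detection mechanism rests on transient drops (blockades) in the ionic current through the pore as a molecule passes. A minimal sketch, assuming a hypothetical evenly sampled current trace and a fixed blockade threshold (practical pipelines use adaptive baselines and filtering), finds blockade events and their dwell times; the function name find_blockades is illustrative only:

```python
def find_blockades(trace, threshold, dt):
    """Return (start_index, dwell_time) for each run of samples below threshold.

    trace: current samples (e.g. in pA), assumed evenly spaced by dt seconds.
    """
    events = []
    start = None
    for i, value in enumerate(trace):
        if value < threshold and start is None:
            start = i                      # blockade begins
        elif value >= threshold and start is not None:
            events.append((start, (i - start) * dt))
            start = None                   # current back at open-pore level
    if start is not None:                  # trace ended during a blockade
        events.append((start, (len(trace) - start) * dt))
    return events


# Hypothetical trace: open-pore current near 100 pA with two blockades.
trace = [100, 101, 99, 40, 42, 41, 100, 98, 55, 100]
print(find_blockades(trace, threshold=70, dt=1e-5))
```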

  4. Cloning and sequence analysis of putative type II fatty acid synthase ...

    Indian Academy of Sciences (India)

    Prakash

    Cloning and sequence analysis of putative type II fatty acid synthase genes from Arachis hypogaea L. ... acyl carrier protein (ACP), malonyl-CoA:ACP transacylase, β-ketoacyl-ACP .... Helix II plays a dominant role in the interaction ... main distinguishing features of plant ACPs in plastids and ..... synthase component; J. Biol.

  5. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    Science.gov (United States)

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely perform the same cellular function. Recently, the COG database was extended by assigning to every protein both its amino acid sequence and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC content in a species- or function-specific context. With respect to amino acids, we at least partly confirm the cognate bias hypothesis, using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining of sequence bias. To our knowledge, it is thus the only tool for analyzing sequence composition in the light of physiological roles and phylogenetic context that does not require substantial programming skills.
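
    ANCAC itself is a web tool, and the snippet below is not its code. It is a minimal sketch of the kind of per-sequence statistics the abstract mentions (GC content and codon usage), plus GC content at third codon positions (gc3 here), a common codon-bias measure. It assumes an in-frame coding sequence given as a plain string; all function names are hypothetical:

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def codon_usage(cds):
    """Count how often each codon occurs."""
    return Counter(codons(cds))


def gc3(cds):
    """GC content at third codon positions."""
    thirds = "".join(c[2] for c in codons(cds))
    return gc_content(thirds)


cds = "ATGGCCGCAAAGTAA"   # hypothetical toy CDS: ATG GCC GCA AAG TAA
print(gc_content(cds), codon_usage(cds), gc3(cds))
```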

  6. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Directory of Open Access Journals (Sweden)

    Meiler Arno

    2012-09-01

  7. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Science.gov (United States)

    2012-01-01

    PMID:22958836

  8. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    Science.gov (United States)

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows researchers both to evaluate the reproducibility of biological or technical replicates and to compare different datasets to identify potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA-binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/. Contact: pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    Directory of Open Access Journals (Sweden)

    Xiaoyu Wang

    Cupriavidus species are generally heavy-metal-tolerant bacteria able to degrade a variety of aromatic hydrocarbon compounds, although their degradation pathways and substrate versatility remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit and was shown to use naphthenic acids as its sole carbon source. The genome of C. gilardii CR3 was sequenced to elucidate possible mechanisms of naphthenic acid biodegradation. It consists of two circular chromosomes, chr1 and chr2, of 3,539,530 bp and 2,039,213 bp, respectively, and encodes 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance. Pathway prediction for the degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring cleavage, after which the ring-fission products can be degraded via several plausible pathways, including a mechanism similar to fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that are not toxic to CR3. Strain CR3 was also shown to tolerate at least 10 heavy metals, mainly through self-detoxification by ion efflux, metal complexation and metal reduction, and through a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment of natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  10. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    Science.gov (United States)

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria responsible for extreme acid sensitivity, especially in Escherichia coli and Shigella flexneri. Here we found molecular evidence confirming the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC-encoded acid-sensitive antiporter showed many conserved residue patterns at regular intervals in the N-terminal region. The number of conserved residues decreases toward the C-terminus, suggesting that the N-terminal region of this protein plays a more active role than the C-terminal region. The motif FHLVFFLLLGG is well conserved at the amino terminus across all gadC-encoded proteins. The motif is also partially conserved among other antiporters that are not encoded by gadC but are involved in acid sensitivity/resistance mechanisms. Phylogenetic cluster analysis confirms the relationship between Escherichia coli and Shigella flexneri. The gadC-encoded proteins cluster as a single clade, distinct from the other antiporters belonging to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792
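
    The study locates the motif through multiple sequence alignment. As a simpler, hypothetical illustration of what partial conservation of a fixed motif means, the sketch below slides FHLVFFLLLGG along a protein without gaps and reports the window with the most identical positions. The example sequences are made up, not real gadC or antiporter data:

```python
MOTIF = "FHLVFFLLLGG"  # motif reported as conserved at the gadC N-terminus


def best_motif_match(protein, motif=MOTIF):
    """Return (start, identical_positions) for the ungapped window that best matches motif.

    start is 0-based; returns None if the protein is shorter than the motif.
    """
    k = len(motif)
    best = None
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(window, motif))
        if best is None or matches > best[1]:
            best = (start, matches)
    return best


# Hypothetical sequences for illustration only.
exact = "MSQK" + MOTIF + "AAT"
partial = "MNT" + "FHLVYFLLIGG" + "KK"   # two substitutions relative to the motif
print(best_motif_match(exact), best_motif_match(partial))
```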

  11. Amino acid sequence analysis of the annexin super-gene family of proteins.

    Science.gov (United States)

    Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

    1991-06-15

    The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that the greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit, and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d and e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeat 4 and Glu in repeat 2, but unconserved amino acids in repeats 1 and 3. This suggests that repeats 2 and 4 may interact via a buried salt bridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present-day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of
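
    Absolute conservation at an alignment position, as reported for the Gly at position 23, can be checked column by column. A minimal sketch, assuming the repeats are already aligned to equal length with '-' for gaps and using a made-up toy alignment; it tests identical residues only, not the conservation of physicochemical properties that the paper also considers:

```python
from collections import Counter


def conserved_columns(aligned, min_fraction=1.0):
    """Return (position, residue) for columns whose top non-gap residue reaches min_fraction.

    aligned: equal-length sequences from a multiple alignment; '-' marks a gap.
    Positions are 1-based; gaps count against conservation.
    min_fraction=1.0 reports only absolutely conserved positions.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] != "-")
        if not counts:
            continue                       # all-gap column
        residue, n = counts.most_common(1)[0]
        if n / len(aligned) >= min_fraction:
            hits.append((i + 1, residue))
    return hits


repeats = ["KGLGT", "KGIGT", "RGFGS", "KGLGT"]   # hypothetical toy alignment
print(conserved_columns(repeats))
```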

  12. Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

    Science.gov (United States)

    Pietrowski, D; Förster, M

    2000-01-01

    The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein of 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs of ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges that could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity, we have termed the identified gene bovine ribonuclease k6 b (brk6b).

  13. A Single Electrochemical Probe Used for Analysis of Multiple Nucleic Acid Sequences

    Science.gov (United States)

    Mills, Dawn M.; Calvo-Marzal, Percy; Pinzon, Jeffer M.; Armas, Stephanie; Kolpashchikov, Dmitry M.; Chumbimuni-Torres, Karin Y.

    2017-01-01

    Electrochemical hybridization sensors have been explored extensively for the analysis of specific nucleic acids. However, commercialization of the platform is hindered by the need to attach to the electrode surface a separate oligonucleotide probe complementary to each RNA or DNA target. Here we demonstrate that a single probe can be used to analyze several nucleic acid targets with high selectivity and at low cost. The universal electrochemical four-way junction (4J)-forming (UE4J) sensor consists of a universal DNA stem-loop (USL) probe attached to the electrode surface and two adaptor strands (m and f), which hybridize to the USL probe and the analyte to form a 4J associate. The m adaptor strand was conjugated with a methylene blue redox marker for signal-ON sensing, monitored by square wave voltammetry. We demonstrated that a single sensor can be used to detect several different DNA/RNA sequences and can be regenerated in 30 seconds by a simple water rinse. The UE4J sensor achieves high selectivity, recognizing a single base substitution even at room temperature. The UE4J sensor opens an avenue toward a reusable universal platform that can be adopted at low cost for the analysis of DNA or RNA targets. PMID:29371782

  14. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    Science.gov (United States)

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing sequence information complementary to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that some unknown local structures might prevent an acid or base from reacting with the peptide bonds therein. © 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.
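
    In ladder sequencing, each mass difference between successive ladder peptides corresponds to one residue. A minimal sketch, assuming neutral monoisotopic residue masses, already deconvoluted peptide masses sorted so that each peptide is one residue longer than the previous one, and a hypothetical tolerance of 0.01 Da. Leu and Ile have identical masses and cannot be distinguished this way:

```python
# Monoisotopic residue masses in daltons (amino acid minus H2O); "L" stands for Leu or Ile.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(masses, tol=0.01):
    """Infer one residue per step from consecutive ladder mass differences.

    Returns '?' for a step where zero or several residues fit within tol Da.
    """
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        match = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(match[0] if len(match) == 1 else "?")
    return "".join(residues)


# Hypothetical ladder built from a known sequence, starting at an arbitrary 500 Da.
ladder = [500.0]
for aa in "GASPK":
    ladder.append(ladder[-1] + RESIDUE_MASS[aa])
print(read_ladder(ladder))
```

    Depending on which terminus the ladder grows from, the decoded string reads N-to-C or C-to-N. Gln and Lys differ by only about 0.036 Da, so a tolerance above roughly 0.018 Da would match both.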

  15. Structural analysis of complementary DNA and amino acid sequences of human and rat androgen receptors

    International Nuclear Information System (INIS)

    Chang, C.; Kokontis, J.; Liao, S.

    1988-01-01

    Structural analysis of cDNAs for human and rat androgen receptors (ARs) indicates that the amino-terminal regions of ARs are rich in oligo- and poly(amino acid) motifs, as in some homeotic genes. The human AR has a long stretch of repeated glycines, whereas the rat AR has a long stretch of glutamines. There is considerable sequence similarity among ARs and the receptors for glucocorticoids, progestins, and mineralocorticoids within the steroid-binding domains. The cysteine-rich DNA-binding domains are well conserved. Translation of mRNA transcribed from AR cDNAs yielded 94- and 76-kDa proteins and smaller forms that bind to DNA and have high affinity for androgens. These rat or human ARs were recognized by human autoantibodies to natural ARs. Molecular hybridization studies using AR cDNAs as probes indicated that the ventral prostate and other male accessory organs are rich in AR mRNA and that the production of AR mRNA in the target organs may be autoregulated by androgens.
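
    The glycine and glutamine stretches described above are homopolymeric runs, which a regular expression can locate. A minimal sketch, assuming a plain one-letter protein string and a hypothetical minimum run length of 6; published analyses may also count interrupted (impure) repeats, which this does not:

```python
import re


def homopolymer_runs(protein, min_length=6):
    """Return (start, residue, length) for each run of one amino acid of at least min_length.

    start is 0-based.
    """
    # A residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(protein)]


toy = "ME" + "Q" * 8 + "A" + "G" * 6 + "K"   # hypothetical sequence
print(homopolymer_runs(toy))
```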

  16. A New Approach to Sequence Analysis Exemplified by Identification of cis-Elements in Abscisic Acid Inducible Promoters

    DEFF Research Database (Denmark)

    Busk, Peter Kamp; Hallin, Peter Fischer; Salomon, Jesper

    ...-regulatory elements. We have developed a method for identifying short, conserved motifs in biological sequences such as proteins, DNA and RNA [5]. This method was used to analyze approximately 2,000 Arabidopsis thaliana promoters that have been shown by DNA array analysis to be induced by abscisic acid (ABA) [6]... These promoters were compared to 28,000 promoters that are not induced by abscisic acid. The analysis identified previously described ABA-inducible promoter elements such as ABRE, CE3 and CRT1, and also found new cis-elements. Furthermore, the list of DNA elements could be used to predict ABA...
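
    The abstract does not describe how the authors' motif-finding method works, so the sketch below shows only a generic presence/absence comparison between an induced and a non-induced promoter set, as a hypothetical illustration. The motif string and sequences are placeholders, both strands are searched, and a real analysis would add a statistical test (for example, a hypergeometric or Fisher's exact test):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fraction_with_motif(promoters, motif):
    """Fraction of promoters containing motif on either strand."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = sum(1 for p in promoters if motif in p.upper() or rc in p.upper())
    return hits / len(promoters)


def enrichment(induced, background, motif):
    """Ratio of motif frequency in induced versus background promoters."""
    bg = fraction_with_motif(background, motif)
    fg = fraction_with_motif(induced, motif)
    return fg / bg if bg > 0 else float("inf")


# Hypothetical toy promoter sets and placeholder motif.
induced = ["TTACGTGGAA", "CCACGTGGTT", "AAAAAA", "GGCCACGTGT"]
background = ["AAAAAAAA", "TTACGTGGAA", "CCCCCC", "GGGGGG"]
print(enrichment(induced, background, "ACGTGG"))
```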

  17. Exome sequencing and SNP analysis detect novel compound heterozygosity in fatty acid hydroxylase-associated neurodegeneration

    Science.gov (United States)

    Pierson, Tyler Mark; Simeonov, Dimitre R; Sincan, Murat; Adams, David A; Markello, Thomas; Golas, Gretchen; Fuentes-Fajardo, Karin; Hansen, Nancy F; Cherukuri, Praveen F; Cruz, Pedro; Blackstone, Craig; Tifft, Cynthia; Boerkoel, Cornelius F; Gahl, William A

    2012-01-01

    Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype. PMID:22146942

  18. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Budzanivska I. G.

    2014-03-01

    Aim. Identification of the widespread Ukrainian isolate(s) of PVY (Potato virus Y) in different potato cultivars and subsequent phylogenetic analysis of the detected PVY isolates based on nucleotide and amino acid sequences of the coat protein (CP). Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we optimized a method for total RNA extraction from potato samples and developed a sensitive and specific PCR-based test system of our own design for diagnosing the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. The CP gene of the Ukrainian isolate of Potato virus Y shows a higher percentage of homology with the recombinant isolates (strains) of this pathogen (approx. 98.8–99.8% homology for both the nucleotide and the translated amino acid sequences of the CP gene). The Ukrainian isolate of PVY is positioned in a separate cluster together with isolates found in Syria, Japan and Iran; these isolates possibly have a common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need for, and provides the means for, accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate, which has been attributed to strain group O, subclade N:O.
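
    Percentages like those quoted above are typically computed from a pairwise alignment. A minimal sketch of percent identity, assuming the two sequences are already aligned to equal length with '-' for gaps; conventions differ on how gaps enter the denominator, and this one simply skips gapped columns:

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


# Hypothetical aligned fragments, not real PVY coat protein sequences.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```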

  19. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    Science.gov (United States)

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-07

    In this contribution we introduce a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and the physicochemical properties of the 20 amino acids, we obtain a unique P-vector from the resulting 3D curve. This approach is consistent with the wobble hypothesis. We compute the distance between sequences from their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and obtain better results than previous approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.
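
    The abstract does not give enough detail to reproduce the authors' modified 3D mapping. For orientation, the sketch below implements only the classic two-dimensional chaos game representation (CGR) for DNA, in which each base moves the current point halfway toward that base's corner of the unit square. The corner assignment shown is one common convention:

```python
# Corners of the unit square for the classic DNA chaos game representation.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq, start=(0.5, 0.5)):
    """Return the CGR point after each base: move halfway toward that base's corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(chaos_game("ACGT"))
```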

  20. RNA Sequencing and Coexpression Analysis Reveal Key Genes Involved in α-Linolenic Acid Biosynthesis in Perilla frutescens Seed

    Directory of Open Access Journals (Sweden)

    Tianyuan Zhang

    2017-11-01

    Perilla frutescens is used as a traditional food and medicine in East Asia. Its seeds contain high levels of α-linolenic acid (ALA), which is important for health but scarce in our daily meals. Previous reports on RNA-seq of perilla seed identified fatty acid (FA) and triacylglycerol (TAG) synthesis genes, but the underlying mechanism of ALA biosynthesis and its regulation still need to be explored. We therefore conducted Illumina RNA sequencing at seven developmental stages of perilla seeds. Sequencing generated a total of 127 million clean reads, containing 15.88 Gb of valid data. De novo assembly of the sequence reads yielded 64,156 unigenes with an average length of 777 bp. A total of 39,760 unigenes were annotated, and 11,693 unigenes were found to be differentially expressed in all samples. According to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, 486 unigenes were annotated in the “lipid metabolism” pathway. Of these, 150 unigenes were found to be involved in FA biosynthesis and TAG assembly in perilla seeds. A coexpression analysis showed that a total of 104 genes were highly coexpressed (r > 0.95). The coexpression network could be divided into two main subnetworks showing overexpression in the middle or early phases and in the late phases, respectively. To identify putative regulatory genes, a transcription factor (TF) analysis was performed, which identified 45 gene families, mainly including the AP2-EREBP, bHLH, MYB, and NAC families. After coexpression analysis of TFs with the highly expressed FAD2 and FAD3 genes, 162 TFs were found to be significantly associated with the two FAD genes (r > 0.95). These TFs were predicted to be key regulatory factors in ALA biosynthesis in perilla seed. qRT-PCR analysis also verified the correspondence between the expression patterns of the two FAD genes and some of the candidate TFs. Although it has been reported that some TFs
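
    The coexpression analysis keeps gene pairs whose expression profiles correlate with r > 0.95. A minimal sketch, assuming Pearson correlation across the seven developmental stages (the abstract does not name the correlation measure) and hypothetical gene names and expression values:

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")   # undefined for a constant profile; never passes the threshold
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.95):
    """Return (gene1, gene2, r) for every pair whose profiles correlate with r > threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r > threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values over seven stages.
profiles = {
    "geneA": [1, 2, 3, 4, 5, 6, 7],
    "geneB": [2, 4, 6, 8, 10, 12, 14],
    "geneC": [7, 6, 5, 4, 3, 2, 1],
}
print(coexpression_edges(profiles))
```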

  1. The structural analysis of protein sequences based on the quasi-amino acids code

    International Nuclear Information System (INIS)

    Ping, Zhu; Xu-Qing, Tang; Zhen-Yuan, Xu

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the post-genome era has arrived, in which proteomics technology is emerging. This paper studies protein molecules from an algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acid code classification method is introduced, and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there are very close correlations between the properties of the quasi-amino acids and the codons. These correlations may play an important part in establishing the logical relationship between codons and quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the correspondence and the remarkable properties of the amino acid code are very difficult to observe. The present paper shows that (ZU, ⊕, ) is a field. Furthermore, the operational results show that the codon TGA has properties different from those of the other stop codons. In fact, in the human and ox mitochondrial genetic code, TGA codes for tryptophan rather than acting as a stop codon as in other genetic codes (Chen W C et al 2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable features of the 20-amino-acid code; in other words, it resolves the problem that 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3). (cross-disciplinary physics and related areas of science and technology)
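
    The paper's algebraic construction cannot be reconstructed from the abstract, but the point it cites about TGA can be checked directly. The sketch below builds the standard genetic code (NCBI translation table 1) and the vertebrate mitochondrial code (NCBI table 2), in which TGA encodes tryptophan; table 2 also differs from the standard code at ATA, AGA and AGG:

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code), codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(aa_string):
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    return {a + b + c: aa_string[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}


STANDARD_CODE = codon_table(STANDARD)
# Vertebrate mitochondrial code (NCBI table 2): TGA -> Trp, ATA -> Met, AGA/AGG -> stop.
VERT_MITO_CODE = dict(STANDARD_CODE, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(cds, code):
    """Translate an in-frame coding sequence codon by codon."""
    cds = cds.upper()
    return "".join(code[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


print(translate("ATGTGATGG", STANDARD_CODE), translate("ATGTGATGG", VERT_MITO_CODE))  # M*W MWW
```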

  2. Molecular cloning and sequence analysis of complementary DNA encoding rat mammary gland medium-chain S-acyl fatty acid synthetase thio ester hydrolase

    International Nuclear Information System (INIS)

    Safford, R.; de Silva, J.; Lucas, C.

    1987-01-01

    Poly(A)+ RNA from pregnant rat mammary glands was size-fractionated by sucrose gradient centrifugation, and fractions enriched in medium-chain S-acyl fatty acid synthetase thio ester hydrolase (MCH) were identified by in vitro translation and immunoprecipitation. A cDNA library was constructed in pBR322 from enriched poly(A)+ RNA and screened with two oligonucleotide probes deduced from rat MCH amino acid sequence data. Cross-hybridizing clones were isolated and found to contain cDNA inserts ranging from ∼1100 to 1550 base pairs (bp). A 1550-bp cDNA insert, from clone 43H09, was confirmed to encode MCH by hybrid-select translation/immunoprecipitation studies and by comparison of the amino acid sequence deduced from the DNA sequence of the clone with the amino acid sequence of the MCH peptides. Northern blot analysis revealed the size of the MCH mRNA to be 1500 nucleotides; it is therefore concluded that the 1550-bp insert (including G·C tails) of clone 43H09 represents a full- or near-full-length copy of the MCH gene. The rat MCH sequence is the first reported sequence of a thioesterase from a mammalian source, but comparison of the deduced amino acid sequences of MCH and the recently published mallard duck medium-chain S-acyl fatty acid synthetase thioesterase reveals significant homology. In particular, a seven-amino-acid sequence containing the proposed active serine of the duck thioesterase is perfectly conserved in rat MCH.

  3. Bioinformatics analysis of the oxidosqualene cyclase gene and the amino acid sequence in mangrove plants

    Science.gov (United States)

    Basyuni, M.; Wati, R.

    2017-01-01

    This study used bioinformatics methods to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants deposited in DDBJ/EMBL/GenBank, and predicted their structure, composition, similarity, subcellular localization and phylogeny. The physical and chemical properties of the seven mangrove OSCs varied among the genes. The proportions of secondary structure in the seven mangrove OSC genes followed the order α-helix > random coil > extended strand. The chloroplast and signal peptide scores were too low, indicating that the mangrove OSC genes contain no chloroplast transit peptide or secretory-pathway signal peptide. The mitochondrial target peptide scores ranged from 0.163 to 0.430, indicating that such a peptide may be present. These results underline the importance of understanding the diversity and functional properties of the different amino acids in mangrove OSC genes. To clarify the relationships among the mangrove OSC genes, a phylogenetic tree was constructed. The tree shows three clusters: Kandelia KcMS joins Bruguiera BgLUS, Rhizophora RsM1 is close to Bruguiera BgbAS, and Rhizophora RcCAS joins Kandelia KcCAS. The present study therefore supports previous results that plant OSC genes form distinct clusters in the tree.

  4. Molecular cloning of chicken metallothionein. Deduction of the complete amino acid sequence and analysis of expression using cloned cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Wei, D; Andrews, G K

    1988-01-25

    A cDNA library was constructed using RNA isolated from the livers of chickens that had been treated with zinc. This library was screened with an RNA probe complementary to mouse metallothionein-I (MT), and eight chicken MT cDNA clones were obtained. All of the cDNA clones contained nucleotide sequences homologous to regions of the longest (375 bp) cDNA clone. The latter contained an open reading frame of 189 bp, and the deduced amino acid sequence indicates a protein of 63 amino acids, 20 of which are cysteine residues. Amino acid composition and partial amino acid sequence analyses of purified chicken MT protein agreed with the amino acid composition and sequence deduced from the cloned cDNA. Amino acid sequence comparisons establish that chicken MT shares extensive homology with mammalian MTs. Southern blot analysis of chicken DNA indicates that the chicken MT gene is not part of a large family of related sequences, but is likely to be a unique gene sequence. In the chicken liver, levels of MT mRNA were rapidly induced by metals (Cd²⁺, Zn²⁺, Cu²⁺), glucocorticoids and lipopolysaccharide. MT mRNA was present at low levels in embryonic liver and increased to high levels during the first week after hatching before decreasing again to the basal levels found in adult liver. The results of this study establish that MT is highly conserved between birds and mammals and is regulated in the chicken by agents that also regulate expression of mammalian MT genes. However, in contrast to mammals, the results suggest the existence of a single isoform of MT in the chicken.
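
The abstract deduces a 63-residue protein from a 189-bp open reading frame (189 / 3 = 63 codons), with 20 of the 63 residues (about 32%) being cysteine. A minimal sketch of that deduction step is shown below, assuming the standard genetic code and a forward-strand search only; the function names are illustrative, not taken from the paper.

```python
# Standard genetic code (NCBI table 1), codons ordered by bases T, C, A, G.
BASE_INDEX = {"T": 0, "C": 1, "A": 2, "G": 3}
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate DNA in frame 0; '*' marks stops, 'X' marks codons with unknown bases."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(b in BASE_INDEX for b in codon):
            idx = 16 * BASE_INDEX[codon[0]] + 4 * BASE_INDEX[codon[1]] + BASE_INDEX[codon[2]]
            protein.append(STANDARD_CODE[idx])
        else:
            protein.append("X")
    return "".join(protein)


def longest_orf(dna: str) -> str:
    """Protein of the longest ATG-initiated, stop-terminated ORF on the forward strand.

    The stop codon is not included; ORFs that run off the end are ignored.
    """
    dna = dna.upper()
    best = ""
    for frame in range(3):
        protein = translate(dna[frame:])
        start = None
        for i, aa in enumerate(protein):
            if start is None and aa == "M":
                start = i  # first Met opens an ORF
            elif start is not None and aa == "*":
                if i - start > len(best):
                    best = protein[start:i]
                start = None  # stop closes the ORF
    return best


def cysteine_fraction(protein: str) -> float:
    """Fraction of residues that are cysteine (about 20/63 for chicken MT)."""
    return protein.count("C") / len(protein) if protein else 0.0
```

`longest_orf("CCATGTGCTGTTAAGG")` returns `"MCC"`, and `cysteine_fraction` of that peptide is 2/3. RNA input would need `U` converted to `T` first.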

  5. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    Energy Technology Data Exchange (ETDEWEB)

    Myers, G.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States)]; Wain-Hobson, S. [ed.] [Laboratory of Molecular Retrovirology, Pasteur Inst.]; Smith, R.F. [ed.] [Baylor Coll. of Medicine, Houston, TX (United States). Dept. of Pharmacology]; Pavlakis, G.N. [ed.] [National Cancer Inst., Frederick, MD (United States). Cancer Research Facility]

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice a year, which accounts for the modes of binding and pagination in the compendium.

  6. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule, followed by identification of the mononucleotides comprising each oligomer. Over the past decade, however, the field of nucleic acid sequencing has changed dramatically, and RNA molecules can now be sequenced in a variety of more streamlined ways. Most of the recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of ³²P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for rapidly determining the nucleotide sequences of RNA molecules: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on frag[…]matic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can easily be done at the same time as the sequencing reactions.
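
Both gel-based methods read the sequence from ladders of ³²P-end-labeled fragments separated by length. The sketch below shows the reading step for idealized data, assuming 5'-end labeling and one lane per base in which every band marks that base; real lanes can be less specific, and the function name and input layout are hypothetical.

```python
def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Reconstruct a 5'->3' sequence from base-specific cleavage ladders.

    `lanes` maps a base to the lengths of the 5'-end-labeled fragments seen
    in that lane. A labeled fragment of length n means cleavage after
    nucleotide n, so nucleotide n is that lane's base. Positions with no band
    are reported as 'N'.
    """
    calls: dict[int, str] = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in calls and calls[n] != base:
                raise ValueError(f"conflicting calls at position {n}")
            calls[n] = base
    if not calls:
        return ""
    return "".join(calls.get(i, "N") for i in range(1, max(calls) + 1))
```

For example, bands at length 2 in the G lane, 1 and 4 in the A lane and 3 in the C lane read as `"AGCA"`.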

  7. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  8. Mass Spectrometry Analysis Coupled with de novo Sequencing Reveals Amino Acid Substitutions in Nucleocapsid Protein from Influenza A Virus

    Directory of Open Access Journals (Sweden)

    Zijian Li

    2014-02-01

    Amino acid substitutions in influenza A virus, which result from non-synonymous mutations in the viral genome, are the main reasons for both antigenic shift and changes in virulence. Nucleocapsid protein (NP), one of the major structural proteins of influenza virus, is responsible for regulating viral RNA synthesis and replication. In this report, we used LC-MS/MS to analyze a tryptic digest of the nucleocapsid protein of influenza virus (A/Puerto Rico/8/1934 H1N1), which was isolated and purified by SDS polyacrylamide gel electrophoresis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three substituted amino acid residues, R452K, T423A and N430T, in two tryptic peptides. The results provide experimental evidence that amino acid substitutions resulting from non-synonymous gene mutations can be directly characterized by mass spectrometry in proteins of RNA viruses such as influenza A virus.
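
The peptides analysed here come from trypsin digestion, which cleaves on the C-terminal side of lysine (K) and arginine (R). A simplified in silico digest is sketched below, assuming the common rule that cleavage is blocked when the next residue is proline; real digests also show missed and non-specific cleavages, which the hypothetical `missed_cleavages` parameter only partly models.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P.

    Also returns peptides spanning up to `missed_cleavages` uncleaved sites.
    """
    if not protein:
        return []
    # Positions just after each K/R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    pieces = [protein[a:b] for a, b in zip(bounds, bounds[1:])]
    peptides = []
    for i in range(len(pieces)):
        # Join piece i with up to `missed_cleavages` following pieces.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides
```

`tryptic_digest("AKPRGKLR")` yields `["AKPR", "GK", "LR"]`. A substitution that creates or removes a K or R changes where trypsin cuts and therefore which peptides are observed, whereas a K/R exchange such as R452K keeps the cleavage site.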

  9. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  10. Barley polyamine oxidase: Characterisation and analysis of the cofactor and the N-terminal amino acid sequence

    DEFF Research Database (Denmark)

    Radova, A.; Sebela, M.; Galuszka, P.

    2001-01-01

    This paper reports the first purification method developed for the isolation of a homogeneous polyamine oxidase (PAO) from etiolated barley seedlings. The crude enzyme preparation was obtained after initial precipitation of the extract with protamine sulphate and ammonium sulphate. The enzyme...... was further confirmed by measuring the fluorescence spectra. Barley PAO is an acidic protein (pI 5.4) containing 3% neutral sugars; its molecular mass determined by SDS-PAGE was 56 kDa, whilst gel permeation chromatography revealed a higher value of 76 kDa. The N-terminal amino acid sequence of barley...... PAO shows a high degree of similarity to that of maize PAO and to several other flavoprotein oxidases. The polyamines spermine and spermidine were the only two substrates of the enzyme, with Km values of 4 × 10⁻⁵ and 3 × 10⁻⁵ M and pH optima of 5.0 and 6.0, respectively. Barley polyamine oxidase...

  11. Amino Acids Sequence Based in Silico Analysis of RuBisCO (Ribulose-1,5 Bisphosphate Carboxylase Oxygenase) Proteins in Some Carthamus L. ssp.

    Directory of Open Access Journals (Sweden)

    Emre SEVİNDİK

    2017-06-01

    RuBisCO is an important enzyme that enables plants to photosynthesize and helps balance carbon dioxide in the atmosphere. This study aimed to perform sequence, physicochemical, phylogenetic and 3D (three-dimensional) comparative analyses of RuBisCO proteins in Carthamus ssp. using various bioinformatics tools. The sequence lengths of the RuBisCO proteins ranged from 166 to 477 amino acids, with an average length of 411.8 amino acids. Their molecular weights (Mw) ranged from 18711.47 to 52843.09 Da; the most acidic and most basic protein sequences were detected in C. tinctorius (pI = 5.99) and C. tenuis (pI = 6.92), respectively. The extinction coefficients of the RuBisCO proteins at 280 nm ranged from 17,670 to 69,830 M⁻¹ cm⁻¹, the instability index (II) values ranged from 33.31 to 39.39, and the GRAVY values ranged from -0.313 to -0.250. The most abundant amino acid in the RuBisCO proteins was Gly (9.7%), while the least abundant was Trp (1.6%). The putative phosphorylation sites of the RuBisCO proteins were determined with NetPhos 2.0. Phylogenetic analysis revealed that the RuBisCO proteins formed two main clades. A RAMPAGE analysis revealed that 96.3% to 97.6% of residues were located in the favoured region of the RuBisCO proteins. PyMOL was used to predict the three-dimensional (3D) structure of the RuBisCO proteins. The results of the current study provide insights into the fundamental characteristics of RuBisCO proteins in Carthamus ssp.
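
Several of the physicochemical values reported above follow simple published formulas. The sketch below assumes ProtParam-style conventions (GRAVY as the mean Kyte-Doolittle hydropathy; the 280 nm extinction coefficient estimated as 5500 per Trp, 1490 per Tyr and 125 per cystine) and computes GRAVY, ε280 and amino acid composition. The abstract does not name the tool used, so this is an illustration rather than the authors' pipeline; pI and the instability index need larger tables and are omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def extinction_coefficient_280(protein: str, cystines: bool = True) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    If `cystines` is True, every pair of Cys is assumed to form a disulfide.
    """
    eps = 5500 * protein.count("W") + 1490 * protein.count("Y")
    if cystines:
        eps += 125 * (protein.count("C") // 2)
    return eps


def composition(protein: str) -> dict[str, float]:
    """Percentage of each amino acid present in the sequence."""
    return {aa: 100 * protein.count(aa) / len(protein) for aa in sorted(set(protein))}
```

For a sequence containing one Trp, one Tyr and two Cys, `extinction_coefficient_280` gives 7115 M⁻¹ cm⁻¹ if the cysteines form a cystine and 6990 M⁻¹ cm⁻¹ if they are reduced.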

  12. Identification of single amino acid substitutions (SAAS) in neuraminidase from influenza a virus (H1N1) via mass spectrometry analysis coupled with de novo peptide sequencing.

    Science.gov (United States)

    Peng, Qisheng; Wang, Zijian; Wu, Donglin; Li, Xiaoou; Liu, Xiaofeng; Sun, Wanchun; Liu, Ning

    2016-08-01

    Amino acid substitutions in the neuraminidase of influenza virus are the main cause of the emergence of resistance to zanamivir or oseltamivir during seasonal influenza treatment; they result from non-synonymous mutations in the viral genome, which can be successfully detected by polymerase chain reaction (PCR)-based approaches. There is, however, a pressing need to detect variation in amino acid sequences directly at the protein level. Mass spectrometry coupled with de novo sequencing has also been explored as an alternative, straightforward strategy for detecting amino acid substitutions, and this approach is the primary focus of the present study. Influenza virus (A/Puerto Rico/8/1934 H1N1) propagated in embryonated chicken eggs was purified by ultracentrifugation, followed by PNGase F treatment. The deglycosylated virion was lysed and separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The gel band corresponding to neuraminidase was excised and subjected to liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three amino acid substitutions, R346K, S349N and S370I/L, in the neuraminidase from the influenza virus (A/Puerto Rico/8/1934 H1N1), located in three mutated peptides of the neuraminidase: YGNGVWIGK, TKNHSSR and PNGWTETDI/LK, respectively. We found that amino acid substitutions in the proteins of RNA viruses (including influenza A virus) resulting from non-synonymous gene mutations can indeed be directly analyzed by mass spectrometry, and that manual interpretation of the MS/MS data may be beneficial. Copyright © 2016 John Wiley & Sons, Ltd.
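
De novo sequencing locates a substitution from the mass difference it introduces into a peptide and its fragment ions. The sketch below, using standard monoisotopic residue masses, computes peptide masses and substitution mass shifts; it is an illustration, not the authors' manual workflow, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of a proton, for protonated ions


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int = 1) -> float:
    """m/z of the [M + zH]^z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def substitution_delta(wild_type: str, mutant: str) -> float:
    """Mass shift (Da) caused by replacing one residue with another."""
    return RESIDUE_MASS[mutant] - RESIDUE_MASS[wild_type]
```

For example, `substitution_delta("R", "K")` is about -28.006 Da, while `substitution_delta("I", "L")` is exactly 0, which is why the third substitution can only be reported as S370I/L.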

  13. Differences in acid tolerance between Bifidobacterium breve BB8 and its acid-resistant derivative B. breve BB8dpH, revealed by RNA-sequencing and physiological analysis.

    Science.gov (United States)

    Yang, Xu; Hang, Xiaomin; Tan, Jing; Yang, Hong

    2015-06-01

    Bifidobacteria are common inhabitants of the human gastrointestinal tract, and their application has increased dramatically in recent years due to their health-promoting effects. The ability of bifidobacteria to tolerate acidic environments is particularly important for their function as probiotics because they encounter such environments in food products and during passage through the gastrointestinal tract. In this study, we generated a derivative, Bifidobacterium breve BB8dpH, which displayed a stable, acid-resistant phenotype. To investigate the possible reasons for the higher acid tolerance of B. breve BB8dpH compared with its parental strain B. breve BB8, a combined transcriptomic and physiological approach was used to characterize differences between the two strains. Transcriptome analysis by RNA-sequencing indicated that in B. breve BB8dpH the expression of 121 genes was increased by more than 2-fold, while the expression of 146 genes was reduced by more than 2-fold. Validation of the RNA-sequencing data by real-time quantitative PCR demonstrated that the RNA-sequencing results were highly reliable. Comparative analysis based on the differentially expressed genes suggested that the acid tolerance of B. breve BB8dpH was enhanced by regulating the expression of genes involved in carbohydrate transport and metabolism, energy production, synthesis of cell envelope components (peptidoglycan and exopolysaccharide), synthesis and transport of glutamate and glutamine, and histidine synthesis. Furthermore, analysis of physiological data showed that B. breve BB8dpH displayed higher production of exopolysaccharide and lower H⁺-ATPase activity than B. breve BB8. The results presented here will improve our understanding of acid tolerance in bifidobacteria, and they will lead to the development of new strategies to enhance the acid tolerance of bifidobacterial strains. Copyright © 2015 Elsevier Ltd. All rights reserved.
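
The threshold used above, "more than 2-fold", corresponds to an absolute log2 fold change greater than 1. The sketch below applies only that cutoff to two dictionaries of normalized expression values; the gene names, the pseudocount and the input format are assumptions, and a real RNA-seq comparison would also apply a statistical test (as tools such as DESeq2 or edgeR do) rather than a fold-change cutoff alone.

```python
import math


def classify_fold_changes(
    expr_parent: dict[str, float],
    expr_derivative: dict[str, float],
    threshold: float = 2.0,
    pseudocount: float = 1.0,
) -> tuple[list[str], list[str]]:
    """Return (up, down): genes changed by more than `threshold`-fold in the derivative.

    Inputs map gene -> normalized expression; the pseudocount avoids log(0).
    """
    cutoff = math.log2(threshold)  # 2-fold -> |log2 FC| > 1
    up, down = [], []
    for gene in expr_parent.keys() & expr_derivative.keys():
        lfc = math.log2((expr_derivative[gene] + pseudocount) / (expr_parent[gene] + pseudocount))
        if lfc > cutoff:
            up.append(gene)
        elif lfc < -cutoff:
            down.append(gene)
    return sorted(up), sorted(down)
```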

  14. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    Science.gov (United States)

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation develop AA amyloidosis. Furthermore, many diseases in cats are associated with elevated production of SAA, whereas only a small number of affected cats actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination 45R-51A in the deduced amino acid sequences of the four JDCs with amyloidosis and the five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against three different parts of the feline SAA protein (i.e., the N-terminal, central, and C-terminal regions) revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  15. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for classifying short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The feature selection method consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, the symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used to train LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting the classifiers with the best performance measures.
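
The first feature-selection phase replaces the residue at each selected position with a numeric amino acid property from AAindex. The sketch below shows that substitution step only, using the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) as an example property; the selected positions are hypothetical (in the paper they come from the CART phase), and the Gram-Schmidt ranking and LS-SVM training are not shown.

```python
# Kyte-Doolittle hydropathy index (AAindex KYTJ820101), one example property scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_positions(
    sequences: list[str],
    positions: list[int],
    scale: dict[str, float] = HYDROPATHY,
) -> list[list[float]]:
    """Turn 9-residue strings into numeric feature vectors.

    Only the (0-based) selected positions are kept; each residue there is
    replaced by its value on the given property scale.
    """
    features = []
    for seq in sequences:
        if len(seq) != 9:
            raise ValueError(f"expected a 9-residue sequence, got {seq!r}")
        features.append([scale[seq[p]] for p in positions])
    return features
```

With positions `[0, 8]`, the 9-mer `"AAAAAAAAW"` becomes `[1.8, -0.9]`. Several scales can be concatenated in the same way to give each position more than one feature.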

  16. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.
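
The abstract does not describe how unknown bases are called from probe intensities. One simple approach, sketched below under that assumption, is to compare, for each position, the intensities of the four probes that differ only at that base, call the brightest one, and report `N` when it is not clearly brighter than the runner-up. The `min_ratio` threshold and the input layout are hypothetical.

```python
def call_bases(intensities: list[dict[str, float]], min_ratio: float = 1.5) -> str:
    """Call one base per position from the four probe intensities at that position.

    Each element of `intensities` maps 'A', 'C', 'G', 'T' to a fluorescence value.
    A position is called 'N' if no probe lights up or if the brightest probe is
    not at least `min_ratio` times brighter than the second brightest.
    """
    calls = []
    for probes in intensities:
        ranked = sorted(probes.items(), key=lambda item: item[1], reverse=True)
        best_base, best = ranked[0]
        second = ranked[1][1]
        if best <= 0:
            calls.append("N")  # no signal at all
        elif second > 0 and best < min_ratio * second:
            calls.append("N")  # ambiguous: two probes nearly equal
        else:
            calls.append(best_base)
    return "".join(calls)
```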

  17. HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments.

    Directory of Open Access Journals (Sweden)

    Nicholas D Youngblut

    Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (qSIP), and ΔBD. Currently, there is no publicly available software designed specifically for analyzing HTS-SIP data. To address this shortfall, we have developed the HTSSIP R package, an open-source, cross-platform toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/buckleylab/HTSSIP.
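
Quantitative SIP (qSIP) starts from each taxon's abundance-weighted average buoyant density across gradient fractions, compares labeled with unlabeled samples, and then converts the density shift into isotope incorporation. The sketch below covers only the first two steps and is not the HTSSIP API (HTSSIP is an R package); the function names and input layout are assumptions.

```python
def weighted_average_density(densities: list[float], abundances: list[float]) -> float:
    """Abundance-weighted mean buoyant density (g/mL) of one taxon across fractions."""
    if len(densities) != len(abundances):
        raise ValueError("one abundance per fraction is required")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("taxon not detected in any fraction")
    return sum(d * a for d, a in zip(densities, abundances)) / total


def density_shift(
    labeled_densities: list[float],
    labeled_abundances: list[float],
    unlabeled_densities: list[float],
    unlabeled_abundances: list[float],
) -> float:
    """Shift in weighted average density between labeled and unlabeled gradients."""
    return weighted_average_density(labeled_densities, labeled_abundances) - \
        weighted_average_density(unlabeled_densities, unlabeled_abundances)
```

The density shift is the quantity qSIP then converts into atom fraction excess using the taxon's GC content; that conversion is omitted here.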

  18. Detection of nucleic acid sequences by invader-directed cleavage

    Science.gov (United States)

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  19. Hybridization and sequencing of nucleic acids using base pair mismatches

    Science.gov (United States)

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.
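
Sequencing by hybridization infers a target sequence from the set of probes it hybridizes to. A minimal sketch, assuming perfect hybridization data (the complete, error-free set of k-mers in the target) and no repeated (k-1)-mers, reconstructs the sequence by chaining overlapping k-mers; the patent's own procedures are not described in the abstract, and the function name is hypothetical.

```python
from typing import Optional


def reconstruct_from_kmers(kmers: list[str]) -> Optional[str]:
    """Chain overlapping k-mers into one sequence; None if ambiguous or inconsistent."""
    spectrum = set(kmers)
    if not spectrum:
        return None
    k = len(next(iter(spectrum)))
    if k < 2 or any(len(km) != k for km in spectrum):
        raise ValueError("all probes must share one length k >= 2")
    # Index k-mers by their (k-1)-prefix; a repeated prefix makes the chain ambiguous.
    by_prefix: dict[str, str] = {}
    for km in spectrum:
        if km[:-1] in by_prefix:
            return None
        by_prefix[km[:-1]] = km
    suffixes = {km[1:] for km in spectrum}
    # The first k-mer is the only one whose prefix is not another k-mer's suffix.
    starts = [km for km in spectrum if km[:-1] not in suffixes]
    if len(starts) != 1:
        return None
    current = starts[0]
    sequence = current
    used = 1
    while current[1:] in by_prefix:
        current = by_prefix[current[1:]]
        sequence += current[-1]  # each overlapping k-mer adds one new base
        used += 1
        if used > len(spectrum):  # guard against cycles
            return None
    return sequence if used == len(spectrum) else None
```

If a (k-1)-mer occurs more than once, the chain becomes ambiguous and the function returns `None`; practical reconstructions treat this as a path problem on a de Bruijn graph.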

  20. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective of developing and testing integrated methodologies for evaluating event sequences with a significant contribution from human actions. The term 'methodology' denotes not only technical tools but also methods for integrating different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. After the surveys, specific event sequences were selected for applying and testing a number of ISA methods. The event sequences discussed in the report were cold overpressure of a BWR, shutdown LOCA of a BWR, steam generator tube rupture of a PWR, and a disturbed signal view in a BWR control room after an external event. Different teams analysed these sequences using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter, using comparisons of the different case studies. These lessons range from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project produced structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  1. Amino acid sequences and structures of chicken and turkey beta 2-microglobulin

    DEFF Research Database (Denmark)

    Welinder, K G; Jespersen, H M; Walther-Rasmussen, J

    1991-01-01

    The complete amino acid sequences of chicken and turkey beta 2-microglobulins have been determined by analyses of tryptic, V8-proteolytic and cyanogen bromide fragments, and by N-terminal sequencing. Mass spectrometric analysis of chicken beta 2-microglobulin supports the sequence-derived Mr of 11...

  2. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. There has been increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding from noncoding sequences, and how to classify organisms and infer their evolutionary relationships, are key problems in bioinformatics. Although much research has taken into account the long-range correlations in DNA sequences, and other authors have used the global fractal dimension in these works, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used the fractal dimension, correlation dimension, Hurst exponent and dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.
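
As an illustration of one of these tools, the sketch below estimates a Hurst exponent by rescaled-range (R/S) analysis of a DNA walk in which purines (A, G) step +1 and pyrimidines (C, T) step -1. This purine-pyrimidine mapping and the R/S estimator are common choices but are assumptions here, not necessarily the models used by the authors; the function names are hypothetical.

```python
import math
from itertools import accumulate
from typing import Optional


def purine_pyrimidine_steps(seq: str) -> list[int]:
    """Map purines (A, G) to +1 and pyrimidines (C, T/U) to -1."""
    steps = []
    for base in seq.upper():
        if base in "AG":
            steps.append(1)
        elif base in "CTU":
            steps.append(-1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return steps


def dna_walk(seq: str) -> list[int]:
    """Cumulative DNA walk: the walker's displacement after each base."""
    return list(accumulate(purine_pyrimidine_steps(seq)))


def rescaled_range(x: list[int]) -> Optional[float]:
    """R/S of one window: range of cumulative mean-adjusted sums over the std. dev."""
    mean = sum(x) / len(x)
    cum = lo = hi = 0.0
    for v in x:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return (hi - lo) / sd if sd > 0 else None


def hurst_rs(steps: list[int], min_window: int = 8) -> float:
    """Hurst exponent: least-squares slope of log(mean R/S) against log(window size)."""
    points = []
    size = min_window
    while size <= len(steps) // 2:
        ratios = [rescaled_range(steps[i:i + size]) for i in range(0, len(steps) - size + 1, size)]
        ratios = [r for r in ratios if r is not None]
        if ratios:
            points.append((math.log(size), math.log(sum(ratios) / len(ratios))))
        size *= 2
    if len(points) < 2:
        raise ValueError("sequence too short or constant")
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    num = sum((px - mx) * (py - my) for px, py in points)
    den = sum((px - mx) ** 2 for px, _ in points)
    return num / den
```

Values near 0.5 suggest no long-range correlation, values above 0.5 persistence, and values below 0.5 anti-persistence; a strictly periodic sequence such as `"AGCT" * 1024` gives 0 because every window has the same R/S.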

  3. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective of developing and testing integrated methodologies for evaluating event sequences with a significant contribution from human actions. The term 'methodology' denotes not only technical tools but also methods for integrating different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. After the surveys, specific event sequences were selected for applying and testing a number of ISA methods. The event sequences discussed in the report were cold overpressure of a BWR, shutdown LOCA of a BWR, steam generator tube rupture of a PWR, and a disturbed signal view in a BWR control room after an external event. Different teams analysed these sequences using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter, using comparisons of the different case studies. These lessons range from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project produced structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  4. Sequence analysis of Epstein-Barr virus EBNA-2 gene coding amino acid 148-487 in nasopharyngeal and gastric carcinomas

    Directory of Open Access Journals (Sweden)

    Wang Xinying

    2012-02-01

    Abstract Background The Epstein-Barr virus (EBV) nuclear antigen 2 (EBNA-2) plays a key role in B-cell growth transformation by initiating and maintaining the proliferation of infected B cells upon EBV infection in vitro. Most studies of EBNA-2 have focused on its functions, yet little is known about its intertypic polymorphisms. Results The coding region for amino acids (aa) 148-487 of the EBNA-2 gene was sequenced in 25 EBV-associated gastric carcinomas (EBVaGCs), 56 nasopharyngeal carcinomas (NPCs) and 32 throat washings (TWs) from healthy donors in Northern China. Three variations (g48991t, c48998a, t49613a) were detected in all of the samples (113/113, 100%). EBNA-2 could be classified into four distinct subtypes, E2-A, E2-B, E2-C and E2-D, based on the deletion status of three aa (294Q, 357K and 358G). Subtypes E2-A and E2-C were detected in 56/113 (49.6%) and 38/113 (33.6%) samples, respectively. E2-A was observed more often in EBVaGC samples, and subtype E2-D was detected only in NPC samples. Variation analysis of the EBNA-2 functional domains showed that the TAD residue variation (I438L) and the NLS residue variations (E476G, P484H and I486T), all located in the carboxyl terminus of EBNA-2, were detected only in NPC samples. Conclusions The subtypes E2-A and E2-C were the dominant genotypes of the EBNA-2 gene in Northern China. The subtype E2-D may be associated with the tumorigenesis of NPC. The NPC isolates were prone to harbor more mutations in the functional domains than the other two groups.

  5. Soil amino acid composition across a boreal forest successional sequence

    Science.gov (United States)

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  6. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  7. Cloning and sequence analysis of benzo[a]pyrene-inducible ...

    African Journals Online (AJOL)

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  8. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments.

    Science.gov (United States)

    Dietrich, Susanne; Borst, Nadine; Schlee, Sandra; Schneider, Daniel; Janda, Jan-Oliver; Sterner, Reinhard; Merkl, Rainer

    2012-07-17

    The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (kcat/KM) were largely unaltered. In contrast, the apparent thermal unfolding temperature (TM(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔTM(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The kcat/KM values of four H2r mutant proteins were reduced 13- to 120-fold, and their TM(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. Our findings demonstrate that positions with high conn(k) values have an
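
    The conn(k) value described above is a count: for each alignment position k, the number of significant correlation pairs in which k takes part. The sketch below assumes the significant pairs have already been identified (the H2r correlation measure and its significance test are not reproduced here); the function name `conn_scores` and the example positions are hypothetical.

```python
from collections import Counter


def conn_scores(significant_pairs):
    """Return a Counter mapping each alignment position k to conn(k).

    significant_pairs: iterable of (i, j) alignment positions that some method
    (e.g. H2r) has already judged significantly correlated; not computed here.
    """
    # Treat (i, j) and (j, i) as the same pair and ignore self-pairs.
    unique_pairs = {tuple(sorted(p)) for p in significant_pairs if p[0] != p[1]}
    conn = Counter()
    for i, j in unique_pairs:
        conn[i] += 1
        conn[j] += 1
    return conn


# Hypothetical significant pairs; (77, 40) duplicates (40, 77).
pairs = [(12, 40), (12, 77), (40, 77), (12, 90), (77, 40)]
conn = conn_scores(pairs)
# Rank positions by conn(k), highest first, ties broken by position.
ranking = sorted(conn.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [(12, 3), (40, 2), (77, 2), (90, 1)]
```

    Positions at the top of such a ranking are the natural candidates for mutagenesis, while positions absent from the Counter have conn(k) = 0, like the control residues chosen in the study.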

  9. The amino acid sequence of snapping turtle (Chelydra serpentina) ribonuclease

    NARCIS (Netherlands)

    Beintema, Jacob; Broos, Jaap; Meulenberg, Janneke; Schüller, Cornelis

    1985-01-01

    Snapping turtle (Chelydra serpentina) ribonuclease was isolated from pancreatic tissue. Turtle ribonuclease binds much more weakly to the affinity chromatography matrix used than mammalian ribonucleases. The amino acid sequence was determined from overlapping peptides obtained from three different

  10. Amino acid substitutions in subunit 9 of the mitochondrial ATPase complex of Saccharomyces cerevisiae. Sequence analysis of a series of revertants of an oli1 mit- mutant carrying an amino acid substitution in the hydrophilic loop of subunit 9.

    Science.gov (United States)

    Willson, T A; Nagley, P

    1987-09-01

    This work concerns a biochemical genetic study of subunit 9 of the mitochondrial ATPase complex of Saccharomyces cerevisiae. Subunit 9, encoded by the mitochondrial oli1 gene, contains a hydrophilic loop connecting two transmembrane stems. In one particular oli1 mit- mutant 2422, the substitution of a positively charged amino acid in this loop (Arg39→Met) renders the ATPase complex non-functional. A series of 20 revertants, selected for their ability to grow on nonfermentable substrates, has been isolated from mutant 2422. The results of DNA sequence analysis of the oli1 gene in each revertant have led to the recognition of three groups of revertants. Class I revertants have undergone a same-site reversion event: the mutant Met39 is replaced either by arginine (as in wild-type) or lysine. Class II revertants maintain the mutant Met39 residue, but have undergone a second-site reversion event (Asn35→Lys). Two revertants showing an oligomycin-resistant phenotype carry this same second-site reversion in the loop region together with a further amino acid substitution in either of the two membrane-spanning segments of subunit 9 (either Gly23→Ser or Leu53→Phe). Class III revertants contain subunit 9 with the original mutant 2422 sequence, and additionally carry a recessive nuclear suppressor, demonstrated to represent a single gene. The results on the revertants in classes I and II indicate that there is a strict requirement for a positively charged residue in the hydrophilic loop close to the boundary of the lipid bilayer. The precise location of this positive charge is less stringent; in functional ATPase complexes it can be found at either residue 39 or 35. This charged residue is possibly required to interact with some other component of the mitochondrial ATPase complex. These findings, together with hydropathy plots of subunit 9 polypeptides from normal, mutant and revertant strains, led to the conclusion that the hydrophilic loop in normal subunit 9

  11. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    OpenAIRE

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important...

  12. Characterization, Genome Sequence, and Analysis of Escherichia Phage CICC 80001, a Bacteriophage Infecting an Efficient L-Aspartic Acid Producing Escherichia coli.

    Science.gov (United States)

    Xu, Youqiang; Ma, Yuyue; Yao, Su; Jiang, Zengyan; Pei, Jiangsen; Cheng, Chi

    2016-03-01

    Escherichia phage CICC 80001 was isolated from the bacteriophage-contaminated medium of Escherichia coli strain HY-05C (CICC 11022S), which produces L-aspartic acid. The phage had a head diameter of 45-50 nm and a tail of about 10 nm. The one-step growth curve showed a latent period of 10 min and a rise period of about 20 min. The average burst size was about 198 phage particles per infected cell. Plaques, multiplicity of infection, and host range were also tested. The 38,810-bp genome of CICC 80001 was sequenced and annotated. The key proteins leading to host-cell lysis were phylogenetically analyzed: one belonged to the class II holins, and the other two belonged to the endopeptidase family and the N-acetylmuramoyl-L-alanine amidase family, respectively. The genome showed 82.7% sequence identity with that of Enterobacteria phage T7 and carried ten unique open reading frames. A bacteriophage-resistant E. coli strain, designated CICC 11021S, was bred; its L-aspartase activity was 84.4% of that of CICC 11022S.

  13. MEANS AND METHODS FOR CLONING NUCLEIC ACID SEQUENCES

    NARCIS (Netherlands)

    Geertsma, Eric Robin; Poolman, Berend

    2008-01-01

    The invention provides means and methods for efficiently cloning nucleic acid sequences of interest in micro-organisms that are less amenable to conventional nucleic acid manipulations, as compared to, for instance, E.coli. The present invention enables high-throughput cloning (and, preferably,

  14. Representation of protein-sequence information by amino acid subalphabets

    DEFF Research Database (Denmark)

    Andersen, C.A.F.; Brunak, Søren

    2004-01-01

    -sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, it is also a secondary goal to discover how the current selection of amino acids...
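
    A subalphabet in this sense maps the 20 amino acids onto a smaller set of symbols. As a concrete illustration only, the sketch below uses a hypothetical, hand-chosen four-group physico-chemical partition; it is not one of the subalphabets discovered by the machine learning strategies in this work.

```python
# Hypothetical 4-symbol subalphabet (illustrative grouping, not from the study):
# h = hydrophobic, p = polar, c = charged, s = special backbone residues (G, P).
SUBALPHABET = {
    **dict.fromkeys("AVLIMFWC", "h"),
    **dict.fromkeys("STNQYH", "p"),
    **dict.fromkeys("DEKR", "c"),
    **dict.fromkeys("GP", "s"),
}
assert len(SUBALPHABET) == 20  # each standard amino acid is assigned exactly once


def reduce_sequence(sequence: str, unknown: str = "x") -> str:
    """Rewrite a protein sequence in the reduced alphabet; unknown letters map to `unknown`."""
    return "".join(SUBALPHABET.get(aa, unknown) for aa in sequence.upper())


print(reduce_sequence("MKTGDW"))  # hcpsch
```

    A learned subalphabet would replace the hand-written dictionary; the rewriting step itself stays the same.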

  15. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  16. Integrated analysis of 454 and Illumina transcriptomic sequencing characterizes carbon flux and energy source for fatty acid synthesis in developing Lindera glauca fruits for woody biodiesel.

    Science.gov (United States)

    Lin, Zixin; An, Jiyong; Wang, Jia; Niu, Jun; Ma, Chao; Wang, Libing; Yuan, Guanshen; Shi, Lingling; Liu, Lili; Zhang, Jinsong; Zhang, Zhixiang; Qi, Ji; Lin, Shanzhi

    2017-01-01

    Lindera glauca fruit with high quality and quantity of oil has emerged as a novel potential source of biodiesel in China, but the molecular regulatory mechanism of carbon flux and energy source for oil biosynthesis in developing fruits is still unknown. To better develop fruit oils of L. glauca as woody biodiesel, a combination of two different sequencing platforms (454 and Illumina) and qRT-PCR analysis was used to define a minimal reference transcriptome of developing L. glauca fruits, and to construct a carbon and energy metabolic model for the regulation of carbon partitioning and energy supply for FA biosynthesis and oil accumulation. We first analyzed the dynamic patterns of growth tendency, oil content, FA compositions, biodiesel properties, and the contents of ATP and pyridine nucleotide of L. glauca fruits from seven different developing stages. Comprehensive characterization of the transcriptome of the developing L. glauca fruit was performed using a combination of two different next-generation sequencing platforms, for which three representative fruit samples (50, 125, and 150 DAF) and one mixed sample from seven developing stages were selected for Illumina and 454 sequencing, respectively. The unigenes separately obtained from long and short reads (201, and 259, respectively, in total) were reconciled using TGICL software, resulting in a total of 60,031 unigenes (mean length = 1061.95 bp) to describe a transcriptome for developing L. glauca fruits. Notably, 198 genes were annotated for photosynthesis, sucrose cleavage, carbon allocation, metabolite transport, acetyl-CoA formation, oil synthesis, and energy metabolism, among which some specific transporters, transcription factors, and enzymes were identified to be implicated in carbon partitioning and energy source for oil synthesis by an integrated analysis of transcriptomic sequencing and qRT-PCR. Importantly, the carbon and energy metabolic model was well established for oil biosynthesis of developing L

  17. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th‐century pandemics

    Science.gov (United States)

    Pasricha, Gunisha; Mishra, Akhilesh C.; Chakrabarti, Alok K.

    2012-01-01

    Please cite this paper as: Pasricha et al. (2012) Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the Influenza A virus subtypes responsible for the 20th‐century pandemics. Influenza and Other Respiratory Viruses 7(4), 497–505. Background  PB1F2 is the 11th protein of influenza A virus translated from the +1 alternate reading frame of the PB1 gene. Since its discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. The selection of the PB1 gene segment in the pandemics and the variable size and pleiotropic effect of PB1F2 prompted us to analyze amino acid sequences of this protein in various influenza A viruses. Methods  Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from the Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size and the variable and conserved domains, and to perform mutational analysis. Results  Analysis showed that 96·4% of the H5N1 influenza viruses harbored full‐length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th‐century pandemic influenza viruses contained full‐length PB1F2 protein. Through the years, the PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human‐ and avian host‐specific conserved domains. Analysis of the global PB1F2 database revealed that the N66S mutation was present in only 3·8% of the H5N1 strains. We found a novel mutation, N84S, in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 viruses. Conclusions  Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host‐specific evolution of the virus

  18. SAAS: Short Amino Acid Sequence - A Promising Protein Secondary Structure Prediction Method of Single Sequence

    Directory of Open Access Journals (Sweden)

    Zhou Yuan Wu

    2013-07-01

    In statistical methods of predicting protein secondary structure, many researchers focus on single amino acid frequencies in α-helices, β-sheets, and so on, or on the influence of neighboring amino acids on whether an amino acid forms a secondary structure. This paper instead treats a short sequence of 3, 4, 5 or 6 amino acids as a unit and computes the probability that the short sequence forms each secondary structure. Also, whereas many researchers select low-homology sequences as the statistical database, this paper uses the whole PDB database. In this paper we propose a strategy to predict protein secondary structure using a simple statistical method. Numerical computation shows that treating short amino acid sequences as units makes it easy to see the tendency of a short sequence to form a secondary structure, and that selecting a large statistical database (the whole PDB database, without considering homology) works well: the Q3 accuracy of the proposed simple statistical method is ca. 74%, whereas that of other statistical methods is below 70%.
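
    The short-sequence statistic described above can be sketched as a counting procedure. This is a minimal illustration, not the paper's implementation: it assumes per-residue labels in the three-state H/E/C alphabet, credits each window with the state of its central residue, and falls back to coil (C) where no statistic is available. The toy sequence and labels are made up. Q3 is computed as the fraction of residues whose three-state label is predicted correctly.

```python
from collections import Counter, defaultdict


def kmer_structure_probs(sequence, labels, k=5):
    """Estimate P(state | k-mer) by counting over one labeled sequence.

    Each length-k window is credited with the state (e.g. H, E or C) of the
    residue at index k // 2 within the window, an assumption for illustration.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have equal length")
    counts = defaultdict(Counter)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        counts[window][labels[start + k // 2]] += 1
    return {
        window: {state: n / sum(c.values()) for state, n in c.items()}
        for window, c in counts.items()
    }


def predict(sequence, table, k=5, default="C"):
    """Assign each residue the most probable state of the window centred on it."""
    half = k // 2
    predicted = []
    for i in range(len(sequence)):
        start = i - half
        probs = None
        if start >= 0 and start + k <= len(sequence):
            probs = table.get(sequence[start:start + k])
        predicted.append(max(probs, key=probs.get) if probs else default)
    return "".join(predicted)


def q3(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


table = kmer_structure_probs("AAAAAGGGGG", "HHHHHCCCCC", k=3)
pred = predict("AAAAAGGGGG", table, k=3)
print(pred, q3(pred, "HHHHHCCCCC"))  # CHHHHCCCCC 0.9
```

    In practice the table would be built from a large labeled set such as the whole PDB, which is the database choice the abstract argues for, and evaluated on separate sequences.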

  19. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    Science.gov (United States)

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates), and its predicted amino acid sequence, by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV, revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared with other members of the aMPV/C subtype, and were 21% when compared with the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes in the G gene were greater than one when the more recent Minnesota isolates were compared with the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171
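
    The ratio of nonsynonymous to synonymous changes mentioned above is normally computed per site rather than from raw counts. A minimal sketch, assuming the numbers of nonsynonymous and synonymous sites and differences between two sequences have already been tallied (for example with the Nei-Gojobori counting method, not implemented here) and applying the Jukes-Cantor correction to each proportion. The abstract does not state which method the authors used, and the counts below are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs, n_sites, s_diffs, s_sites):
    """Ratio of nonsynonymous (dN) to synonymous (dS) distances.

    Assumes site and difference counts were tallied beforehand, e.g. with the
    Nei-Gojobori method (not implemented in this sketch).
    """
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        return math.inf if dn > 0 else math.nan
    return dn / ds


# Hypothetical counts for two aligned G-gene sequences (not data from the study).
ratio = dn_ds(n_diffs=30, n_sites=900, s_diffs=5, s_sites=300)
print(f"dN/dS = {ratio:.2f}")  # a ratio above 1 is read as a sign of positive selection
```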

  20. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by laboratory testing of DNA. This study focused on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. It seeks the most similar region shared by a pair of sequences by assigning a negative score to unequal base pairs (mismatches) and a positive score to identical base pairs (matches). The cell with the maximum score marks the end of the alignment, and tracing back to a cell with the minimum value (zero) marks its start. This study uses sequences of leukemia and three sequences of non-leukemia DNA.
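
    The scoring scheme described above can be sketched as a Smith-Waterman local alignment with a linear gap penalty. The match (+2), mismatch (-1) and gap (-2) scores are illustrative assumptions, not the parameters used in this study, and the toy sequences are not leukemia data.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Positive score for a match, negative score for a mismatch.
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # Scores are floored at zero, which is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell (alignment end) to a zero cell (alignment start).
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

    Because every cell is floored at zero, poorly matching flanks such as the leading TT and GG never pull down the score of the shared ACGT core.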

  1. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    Science.gov (United States)

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356
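
    The 47% figure above is a percent identity between two aligned sequences. Definitions vary between tools; this sketch assumes one common convention, counting identical residues over the alignment columns in which neither sequence has a gap. The function name and the toy alignment are illustrative, not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are excluded under this convention
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(f"{percent_identity('MKT-LLAV', 'MKSGLLAI'):.1f}")  # 71.4
```

    Dividing by the full alignment length or by the shorter sequence instead would give a lower value for the same pair, so reported identities are only comparable under the same convention.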

  2. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26–30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  3. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th-century pandemics.

    Science.gov (United States)

    Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K

    2013-07-01

    PB1F2 is the 11th protein of influenza A virus translated from the +1 alternate reading frame of the PB1 gene. Since its discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. The selection of the PB1 gene segment in the pandemics and the variable size and pleiotropic effect of PB1F2 prompted us to analyze amino acid sequences of this protein in various influenza A viruses. Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from the Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size and the variable and conserved domains, and to perform mutational analysis. Analysis showed that 96·4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein. Through the years, the PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. Analysis of the global PB1F2 database revealed that the N66S mutation was present in only 3·8% of the H5N1 strains. We found a novel mutation, N84S, in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 viruses. Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host-specific evolution of the virus. However, further studies are required to correlate this sequence variability with virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.
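
    Frequencies such as the share of H5N1 strains carrying N66S come from counting residues at one position across a set of sequences. The sketch below assumes gap-free protein sequences numbered so that residue n sits at string index n - 1; real PB1F2 alignments contain gaps and truncated variants that would need mapping first. Truncated sequences that do not reach the position are left out of the denominator, which is an assumption of this sketch. The sequences are toy examples, not PB1F2 data.

```python
from collections import Counter


def residue_frequencies(sequences, position):
    """Percentage of each residue at 1-based `position` among sequences that cover it."""
    covered = [s for s in sequences if len(s) >= position]
    if not covered:
        raise ValueError("no sequence covers this position")
    counts = Counter(s[position - 1] for s in covered)
    return {aa: 100.0 * n / len(covered) for aa, n in counts.items()}


# Toy sequences: residue 4 is N in three, S in one; the last is truncated.
toy = ["MEQN", "MEQN", "MEQS", "MEQN", "MEQ"]
print(residue_frequencies(toy, 4))  # {'N': 75.0, 'S': 25.0}
```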

  4. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid ...

  5. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have more than five 1-2-base insertions; more than one 3-base insertion; more than one deletion; more than six PR or 18 RT ambiguous bases; more than three consecutive PR or four consecutive RT nucleic acid mutations; any stop codon; more than three PR or six RT ambiguous amino acids; more than three consecutive PR or four consecutive RT amino acid mutations; any unique amino acid; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
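
    A few of the per-sequence checks listed above reduce to simple counts. The sketch below is not SQUAT itself (which runs in R) and covers only two criteria, ambiguous bases and stop codons, using the thresholds quoted in the abstract. It assumes an in-frame nucleotide sequence starting at the first codon; the function and constant names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes (anything other than A, C, G, T).
AMBIGUOUS = set("RYKMSWBDHVN")
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Thresholds quoted in the abstract; user-modifiable, as in SQUAT.
MAX_AMBIGUOUS_BASES = {"PR": 6, "RT": 18}
MAX_STOP_CODONS = 0


def quality_flags(seq, region="PR"):
    """Return human-readable flags for an in-frame PR or RT nucleotide sequence."""
    seq = seq.upper()
    flags = []
    n_ambiguous = sum(1 for base in seq if base in AMBIGUOUS)
    limit = MAX_AMBIGUOUS_BASES[region]
    if n_ambiguous > limit:
        flags.append(f"{n_ambiguous} ambiguous bases (> {limit})")
    # Read complete codons in frame; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    n_stops = sum(1 for codon in codons if codon in STOP_CODONS)
    if n_stops > MAX_STOP_CODONS:
        flags.append(f"{n_stops} stop codon(s)")
    return flags


print(quality_flags("ATGTAAGGG"))  # ['1 stop codon(s)']
```

    Counting codons in frame matters: the same sequence contains TGA out of frame in ATGAAACCC, which is correctly not flagged.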

  6. Secondary structure classification of amino-acid sequences using state-space modeling

    OpenAIRE

    Brunnert, Marcus; Krahnke, Tillmann; Urfer, Wolfgang

    2001-01-01

    The secondary structure classification of amino acid sequences can be carried out by a statistical analysis of sequence and structure data using state-space models. To this end, a modified filter algorithm programmed in S is applied to data from three proteins. The application leads to correct classifications of two proteins even when using relatively simple estimation methods for the parameters of the state-space models. Furthermore, it has been shown that the assumed initial...

  7. Functional analysis of sequences adjacent to dapE of Corynebacterium glutamicum reveals the presence of aroP, which encodes the aromatic amino acid transporter.

    Science.gov (United States)

    Wehrmann, A; Morakkabati, S; Krämer, R; Sahm, H; Eggeling, L

    1995-10-01

    An initially nonclonable DNA locus close to a gene of L-lysine biosynthesis in Corynebacterium glutamicum was analyzed in detail. Its stepwise cloning and its functional identification by monitoring the amino acid uptakes of defined mutants, together with mechanistic studies, identified the corresponding structure as aroP, the general aromatic amino acid uptake system.

  8. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights:
    • Uncertainties with chiller sequencing control were systematically quantified.
    • Robustness of chiller sequencing control was systematically analyzed.
    • Different sequencing control strategies were sensitive to different uncertainties.
    • A numerical method was developed for easy selection of chiller sequencing control.
    Abstract: Multiple-chiller plants are commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observations that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on control performance and may cause the control to fail to achieve the expected control and/or energy performance, and (ii) few studies in the current literature have systematically addressed this issue, this paper presents a robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies.

  9. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    2010-07-19

    ... and antisense primers, a single band of 573 base pairs ... Amino acid sequence alignment of Cluster I and Cluster II of the phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.

  10. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
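
    Two of the measures mentioned above, the n-gram distribution and the Zipf's law coefficient, can be approximated in plain Python. This sketch does not use Quantiprot's API; it estimates the Zipf coefficient as the negative slope of a least-squares fit of log frequency against log rank, which is one common convention and may differ from the package's exact definition.

```python
import math
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_coefficient(counts):
    """Negative slope of log(frequency) versus log(rank), fitted by least squares."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


counts = ngram_counts("MKVLAAGMKVLAAG", n=2)
print(counts.most_common(3))
print(round(zipf_coefficient(counts), 2))
```

    A coefficient near 1 indicates the classic Zipf pattern in which the r-th most frequent n-gram occurs roughly 1/r as often as the most frequent one.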

  11. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for the electronic sequencing of single nucleic acid molecules. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" of all nucleotides in DNA and RNA obtained using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters that further characterize all nucleobases (electron and hole transition voltages and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  12. Probabilistic accident sequence recovery analysis

    International Nuclear Information System (INIS)

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be viewed skeptically. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  13. Mechanism analysis of acid tolerance response of bifidobacterium longum subsp. longum BBMN 68 by gene expression profile using RNA-sequencing.

    Directory of Open Access Journals (Sweden)

    Junhua Jin

    To analyze the mechanism of the acid tolerance response (ATR) in Bifidobacterium longum subsp. longum BBMN68, we optimized the acid-adaptation condition to stimulate ATR effectively and analyzed the change in the gene expression profile after acid adaptation using high-throughput RNA-Seq. After acid adaptation at pH 4.5 for 2 hours, the survival rate of BBMN68 at lethal pH 3.5 for 120 min increased 70-fold, 293 genes were upregulated by more than 2-fold, and 245 genes were downregulated by more than 2-fold. Gene expression profiling of ATR in BBMN68 suggested that, when the bacteria faced acid stress, the cells strengthened the integrity of the cell wall and changed membrane permeability to keep H+ from entering. Once H+ entered the cytoplasm, the cells showed four main responses. First, the F0F1-ATPase system was initiated to discharge H+. Second, the ability to produce NH3 by the cysteine-cystathionine cycle was strengthened to neutralize excess H+. Third, the cells started the NER-UVR and NER-VSR systems to minimize damage to DNA and upregulated HtpX, IbpA, and γ-glutamylcysteine production to protect proteins against damage. Fourth, the cells initiated global response signals ((p)ppGpp, polyP, and Sec-SRP) to bring the whole cell into a state of response to the stress. The cells also secreted the quorum sensing signal AI-2 to communicate between cells of the same species through cellular signaling systems, such as two-component systems, to improve the overall survival rate. In addition, the cells changed their energy-producing pathways by shifting to BCAA metabolism and enhanced their ability to utilize sugar, supplying sufficient energy for the mechanisms mentioned above. Based on these results, it was inferred that, during industrial applications, the acid resistance of bifidobacteria could be improved by adding BCAA, γ-glutamylcysteine, cysteine, and cystathionine to the acid-stress environment.
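
    The "more than 2-fold" cutoff above is a simple rule applied to expression ratios. A minimal sketch, assuming normalized per-gene expression values for the control and acid-adapted conditions and a pseudocount of 1 to avoid division by zero; gene names and values are made up, and a full RNA-Seq analysis would also apply a statistical test rather than a fold-change threshold alone.

```python
def classify_by_fold_change(control, treated, threshold=2.0, pseudocount=1.0):
    """Split genes into up- and downregulated lists using a fold-change cutoff.

    control and treated map gene IDs to normalized expression values. A gene is
    'up' if treated/control > threshold and 'down' if control/treated > threshold.
    """
    up, down = [], []
    for gene in control.keys() & treated.keys():
        ratio = (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        if ratio > threshold:
            up.append(gene)
        elif ratio < 1 / threshold:
            down.append(gene)
    return sorted(up), sorted(down)


control = {"geneA": 10, "geneB": 40, "geneC": 20}
treated = {"geneA": 35, "geneB": 9, "geneC": 25}
print(classify_by_fold_change(control, treated))  # (['geneA'], ['geneB'])
```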

  14. Correlation between fibroin amino acid sequence and physical silk properties.

    Science.gov (United States)

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of the GSSAASAA sequence, stretches of GXZ tripeptides where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptides or alanine, which are typical of Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite their different repeat structure, the silks of G. mellonella and E. kuehniella exhibit tensile strength similar to that of the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes at one end and partly devouring the other. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet.

  15. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces the biosurfactant serrawettin, which is essential for its population migration behavior. Serrawettin W1 was revealed to be the antibiotic serratamolide, which makes it a significant subject for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  16. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao; Li Mingfeng; Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly random amino acid sequences. In the present paper, using a modified recurrence plot that we proposed previously, we show that these amino acid sequences in fact contain hidden repetitions. These results indicate that the repetitive domain structures are encoded by repetitive sequences. This also provides a method for detecting repetitive domain structures directly from amino acid sequences.
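
    The idea of reading repeats off a recurrence plot can be sketched as follows. This is the plain, unmodified recurrence plot based on exact identity of short windows, not the authors' modified version; the window length and the toy sequence are assumptions. A repeat of period d shows up as a line of recurrent points on the diagonal offset by d. Real "nearly random" repeats would need a similarity score (for example, from a substitution matrix) instead of exact matching.

```python
def recurrence_matrix(seq, window=3):
    """Binary recurrence plot: R[i][j] = 1 when the length-`window` segments
    starting at positions i and j are identical."""
    n = len(seq) - window + 1
    segments = [seq[i:i + window] for i in range(n)]
    return [[1 if segments[i] == segments[j] else 0 for j in range(n)]
            for i in range(n)]


def diagonal_scores(matrix):
    """Count recurrent points on each off-diagonal; a peak at offset d
    hints at a sequence repeat with period d."""
    n = len(matrix)
    return {d: sum(matrix[i][i + d] for i in range(n - d)) for d in range(1, n)}


seq = "MKVLAGHMKVLAGH"  # hypothetical toy sequence with a period-7 repeat
scores = diagonal_scores(recurrence_matrix(seq))
print(max(scores, key=scores.get))  # 7
```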

  17. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    Science.gov (United States)

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the human serum albumin-binding GA domain and the IgG-binding GB domain have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would adopt using only ordinary analyses such as BLAST and conservation analyses. However, by combining these with the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which parts play an important role in structure formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
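
    To make the "98% identity" figure concrete, the sketch below computes percent identity for two sequences that are already aligned without gaps. The toy sequences are hypothetical, and identity over gapped alignments has several competing definitions that this sketch does not handle. With 56 aligned residues, a single difference already gives about 98.2% identity.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, gap-free sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical 56-residue toy sequences differing only at the last position.
a = "ACDEFGHIKL" * 5 + "MNPQRS"
b = a[:-1] + "T"
print(round(percent_identity(a, b), 1))  # 98.2
```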

  18. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) sit at a particular extreme: they are also scale-free (order-free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time-series analysis origin of the field is betrayed by the title of the 1990 manuscript that started this alignment-free subdomain, 'Chaos Game Representation'. The clash between analyzing sequences as continuous series and the better-established Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade went by before the scale-free nature of the IM space was uncovered. The following decade saw this scalability generalized to non-genomic alphabets, as well as interest in using IMs for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of Big Data and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story: multiple reports have described gains in computational efficiency of several orders of magnitude over more conventional sequence analysis methodologies. The stage now appears set for recasting IMs in a central role in processing next-generation sequencing results.
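
    The Chaos Game Representation named above can be sketched in a few lines: start at the center of the unit square and, for each nucleotide, move halfway toward the corner assigned to that base. The corner assignment below is one common choice and is an assumption here; other papers permute the corners. Because each point lies in a sub-square of side 2^-k fixed by the last k bases, the map needs no fixed word length, which is the scale-free (order-free) property the review refers to.

```python
# Corner assigned to each base (one common convention, assumed here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq):
    """Return the CGR point reached after each base of a DNA sequence."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError for non-ACGT symbols
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the corner
        points.append((x, y))
    return points


print(chaos_game("AC"))  # [(0.25, 0.25), (0.125, 0.625)]
```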

  19. CcMP-II, a new hemorrhagic metalloproteinase from Cerastes cerastes snake venom: purification, biochemical characterization and amino acid sequence analysis.

    Science.gov (United States)

    Boukhalfa-Abib, Hinda; Laraba-Djebari, Fatima

    2015-01-01

    Snake venom metalloproteinases (SVMPs) are the most abundant components of snake venoms. They are important in inducing systemic alterations and local tissue damage after envenomation. CcMP-II, a metalloproteinase from Cerastes cerastes snake venom, was purified by a combination of gel filtration, ion-exchange and affinity chromatography. It was homogeneous on SDS-PAGE, with an estimated molecular mass of 35 kDa and a pI of 5.6. CcMP-II has the N-terminal sequence EDRHINLVSVADHRMXTKY, which is highly homologous to those of class P-II SVMPs, whose members comprise both metalloproteinase and disintegrin-like domains. This proteinase displayed fibrinogenolytic and hemorrhagic activities. The proteolytic and hemorrhagic activities of CcMP-II were inhibited by EDTA and 1,10-phenanthroline but were not affected by aprotinin or PMSF, suggesting that CcMP-II is a zinc-dependent hemorrhagic metalloproteinase with α-fibrinogenase activity. The hemorrhagic metalloproteinase CcMP-II was also able to hydrolyze extracellular matrix components such as type IV collagen and laminin. These results indicate that CcMP-II is implicated in local and systemic bleeding, thus contributing to the toxicity of C. cerastes venom. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system-level PHA using the sequence tree method was developed to perform the SSA of safety-related digital I and C systems. The conventional PHA is a brainstorming session in which experts on various portions of the system identify hazards through discussion. However, this conventional PHA is not a systematic technique: the analysis results depend strongly on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. Therefore, this research developed a system-level, sequence-tree-based PHA that can clarify the relationships among the major digital I and C systems. This technique comprises two major phases. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety-related I and C system, such as the RPS. The second phase uses a sequence tree to identify which I and C systems are involved in the event, how the safety-related systems work, and how the backup systems can be activated to mitigate the consequences if the primary safety systems fail. The sequence tree structure is built from the defense-in-depth echelons: the Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon. All related I and C systems, including digital systems and the analog backup systems, are allocated to their specific echelons. With this system-centric, sequence-tree-based analysis, not only can preliminary hazards be identified systematically, but the vulnerabilities of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  1. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    Science.gov (United States)

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide a genetic background for metabolic engineering. The complete genome of A. pullulans CCTCC M2012223 was sequenced on the Illumina HiSeq high-throughput sequencing platform. Fragment assembly, gene prediction, functional annotation, and GO/COG clustering were then analyzed and compared with those of five other A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30,756,831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the largest genome assembly. Protein sequences involved in the pullulan and polymalic acid pathways were highly conserved in all six A. pullulans varieties. Although A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum are closely related, some point mutations and insertions occurred in protein sequences involved in melanin biosynthesis. The genome of A. pullulans CCTCC M2012223 was annotated and the genes involved in the melanin, pullulan and polymalic acid pathways were compared, which provides a theoretical basis for genetic modification of metabolic pathways in A. pullulans.
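
    The GC content reported for a genome is simply the share of G and C among its bases. A minimal sketch follows; whether ambiguous bases such as N belong in the denominator differs between tools, and this version (an assumption) counts only unambiguous A, C, G, and T.

```python
def gc_content(seq):
    """GC content in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(round(gc_content("ATGCGCNN"), 2))  # 66.67
```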

  2. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation for genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high-throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence- and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for determining expression and differentiating paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed that expression divergence is proportional to family size. In pairwise comparisons between tissues, 7% (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified, with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.
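
    A coexpression network links genes whose expression profiles across tissues rise and fall together. The sketch below uses Pearson correlation with a hard threshold of 0.9 on absolute correlation; the threshold, the toy profiles, and the choice to also link strongly anticorrelated genes are assumptions, and the study's actual network and module-detection method is not described in the abstract.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values across four tissues.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
print(coexpression_edges(profiles))  # [('g1', 'g2', 1.0)]
```

    Connected groups in such a graph are a crude stand-in for the gene modules mentioned above; dedicated module-detection methods refine this considerably.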

  3. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction, by Todd R. Reed; Content-Based Image Sequence Representation, by Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej; The Computation of Motion, by Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang; Motion Analysis and Displacement Estimation in the Frequency Domain, by Luca Lucchese and Guido Maria Cortelazzo; Quality of Service Assessment in New Generation Wireless Video Communications, by Gaetano Giunta; Error Concealment in Digital Video, by Francesco G.B. De Natale; Image Sequence Restoration: A Wider Perspective, by Anil Kokaram; Video Summarization, by Cuneyt M. Taskiran and Edward

  4. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences cluster with sequences reported from Japan. This is the first phylogenetic analysis of the HCV core gene from the Pakistani population. Our sequences and the sequences from Japan are grouped into the same cluster in the phylogenetic tree. Sequence comparison and ...

  5. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice) in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the high-throughput sequencing data with the SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained because of regionally low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and constructing clones for sequencing, we closed all the gaps. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems encountered and strategies used during the finishing step of BCG Tice sequencing are described here, in the hope of offering some experience to those involved in the finishing step of genome sequencing. The microarray data were verified by our results.
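
    The gap-closing step starts from knowing where the gaps are. Assemblers conventionally write sequence gaps inside a scaffold as runs of N, so a sketch like the one below (an assumption about the file contents, not the authors' workflow) can list each gap and the flanking bases where primers might be placed. Physical gaps between scaffolds do not appear as N runs and are not found this way; the flank length k is a hypothetical parameter.

```python
import re


def find_gaps(scaffold, min_len=1):
    """Return 0-based, end-exclusive (start, end) coordinates of runs of N."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", scaffold)
            if m.end() - m.start() >= min_len]


def gap_flanks(scaffold, gap, k=20):
    """Return the k bases on each side of a gap, where primers would be placed."""
    start, end = gap
    return scaffold[max(0, start - k):start], scaffold[end:end + k]


scaffold = "ACGTNNNNACGTNACG"  # hypothetical toy scaffold
for gap in find_gaps(scaffold):
    print(gap, gap_flanks(scaffold, gap, k=2))
```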

  6. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. In contrast, metagenomic shotgun sequencing provides 16S rRNA gene fragment data that are naturally free of the biases introduced during priming, and thus has the potential to uncover the true structure of microbial communities by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of a statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results based on 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that helps confirm whether differences in the data are real or merely due to potential technical bias. The pipeline was built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that, with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This finding could serve as a good starting point for experimental design and makes it possible to compare surveys based on 16S rRNA gene fragments with those based on targeted 16S rRNA sequencing.
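
    OTU prediction is often done by greedy centroid clustering at a fixed identity threshold. The sketch below is a generic version of that idea, not the authors' pipeline. It assumes the commonly used 97% threshold and, for brevity, compares sequences position by position without alignment, which is only reasonable for fragments that start at the same position. Real pipelines align reads first.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first centroid it matches at >= threshold,
    otherwise start a new OTU with the read as its centroid.
    Reads are processed longest first."""
    centroids, assignment = [], {}
    for name, seq in sorted(reads.items(), key=lambda kv: -len(kv[1])):
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignment[name] = idx
                break
        else:  # no centroid was similar enough
            centroids.append(seq)
            assignment[name] = len(centroids) - 1
    return centroids, assignment


# Hypothetical toy reads: r2 is 99% identical to r1, r3 is unrelated.
reads = {"r1": "A" * 100, "r2": "A" * 99 + "C", "r3": "C" * 100}
centroids, assignment = greedy_otus(reads)
print(len(centroids), assignment)  # 2 {'r1': 0, 'r2': 0, 'r3': 1}
```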

  7. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in the informational spectrum analysis of protein sequences is insufficient. CISAPS was therefore developed to consider and report results in three forms: the absolute, real, and imaginary spectra. As the case study on influenza A subtypes presented here shows, biologically relevant features can also appear individually in either the real or the imaginary spectrum. The results show that protein classes can exhibit similarities or differences according to the features extracted with the CISAPS web server. These associations are probably related to the protein property that the specific amino acid index represents. In addition, technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
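
    The core of informational spectrum analysis is to encode each residue with a numeric amino acid index and take the discrete Fourier transform of the resulting signal. The sketch below returns the real, imaginary, and absolute spectra and exposes the zero-padding and windowing choices discussed above. It is not the CISAPS code: the two-letter index is a hypothetical placeholder (a real analysis would use one of the published indices), and the Hann window is one assumed choice of window.

```python
import cmath
import math


def informational_spectrum(seq, index, n_fft=None, window=False):
    """Return the (real, imaginary, absolute) spectra of `seq`.

    `index` maps each amino acid letter to a numeric property value.
    `n_fft` > len(seq) zero-pads the signal; `window=True` applies a Hann
    window before the transform.
    """
    values = [float(index[aa]) for aa in seq]
    n = len(values)
    n_fft = n_fft or n
    if n_fft < n:
        raise ValueError("n_fft must be at least the sequence length")
    if window and n > 1:
        # The Hann window tapers the ends to reduce spectral leakage.
        values = [v * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                  for i, v in enumerate(values)]
    values += [0.0] * (n_fft - n)  # zero-padding
    # Plain DFT, keeping frequencies 0 .. n_fft/2 (the rest mirror them).
    spectrum = [
        sum(v * cmath.exp(-2j * math.pi * k * t / n_fft) for t, v in enumerate(values))
        for k in range(n_fft // 2 + 1)
    ]
    return ([s.real for s in spectrum],
            [s.imag for s in spectrum],
            [abs(s) for s in spectrum])


# Hypothetical two-letter index, for illustration only.
toy_index = {"A": 1.0, "G": 0.0}
real, imag, absolute = informational_spectrum("AGAG", toy_index)
```

    The alternating toy sequence puts its energy at frequency 0 and at the highest frequency (period 2), with nothing in between. For larger inputs a fast Fourier transform would replace the plain DFT loop.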

  8. Predicting protein amidation sites by orchestrating amino acid sequence features

    Science.gov (United States)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications and plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and uncover the underlying causes of many kinds of diseases. However, traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied in a supervised fashion to choose the optimal features, removing redundancy and dependence among the components of the feature vector. Finally, a support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method outperforms all previous methods, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89, and an area under the curve of 0.964.
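
    The accuracy and Matthews correlation coefficient (MCC) reported above are both computed from the confusion matrix of true/false positives and negatives. A minimal sketch follows; the counts used are hypothetical, not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical balanced test set: 50 true sites, 50 non-sites.
print(accuracy(45, 45, 5, 5), round(mcc(45, 45, 5, 5), 3))  # 0.9 0.8
```

    MCC is often preferred over accuracy for imbalanced data: a predictor that labels every candidate as negative on a set with 90 non-sites and 10 sites reaches 0.9 accuracy, yet `mcc(0, 90, 0, 10)` is 0.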

  9. Evolution of sequence-defined highly functionalized nucleic acid polymers

    Science.gov (United States)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  10. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. The Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with the Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. In addition to the homeodomain, the amino acid sequences deduced from the cDNAs contain other domains also present in various homeobox-containing genes. Expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, and a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599
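
    Degenerate oligonucleotides like those described above are obtained by back-translating a conserved peptide: each codon position becomes an IUPAC ambiguity code covering every base allowed there. The sketch below uses the standard genetic code; the peptide WFQNRR is chosen here only for illustration and is not taken from the paper. Taking the union per codon position over-covers six-codon amino acids (for R it yields MGN, which also matches two serine codons), so real primer design usually handles those residues separately.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(peptide):
    """Back-translate a peptide into a degenerate primer and its degeneracy
    (the number of distinct oligonucleotides in the mixture)."""
    primer, degeneracy = [], 1
    for aa in peptide:
        codons = CODONS[aa]
        for pos in range(3):
            bases = frozenset(c[pos] for c in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("WFQNRR"))  # ('TGGTTYCARAAYMGNMGN', 512)
```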

  11. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. With these technologies, actual events are recorded and can be checked for conformance with a predefined model. Conformance checking is an approach to measuring the fitness and appropriateness between a process model and actual events. However, when multiple events share the same timestamp, the traditional approach cannot produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach uses the current control-flow techniques of the process mining domain. A case study in the field of educational processes was conducted, and this study also proposes a curriculum analysis framework to test the proposed approach. By considering students' learning sequences, the approach yields several measures for curriculum development. Finally, the results of the proposed approach were verified by relevant instructors for further development.
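
    The difficulty with identical timestamps can be illustrated with a simple precedence-based fitness measure. This is a hypothetical measure written for illustration, not the sequence matching analysis proposed in the paper. Courses taken in the same term are treated as concurrent, so their relative order is never counted as a violation. Only pairs from different terms are checked against the recommended curriculum order.

```python
def precedence_fitness(model_order, record):
    """Fraction of checkable course pairs in `record` that respect `model_order`.

    model_order: list of course codes in the recommended sequence.
    record: list of (term, course) tuples for one student.
    """
    rank = {course: i for i, course in enumerate(model_order)}
    taken = [(term, c) for term, c in record if c in rank]
    checked = violations = 0
    for i, (t1, c1) in enumerate(taken):
        for t2, c2 in taken[i + 1:]:
            if t1 == t2:
                continue  # same timestamp: the order is unknown, skip the pair
            checked += 1
            earlier, later = (c1, c2) if t1 < t2 else (c2, c1)
            if rank[earlier] > rank[later]:
                violations += 1
    return 1.0 if checked == 0 else 1 - violations / checked


model = ["C1", "C2", "C3"]  # hypothetical curriculum order
print(precedence_fitness(model, [(1, "C2"), (2, "C1"), (3, "C3")]))
```

    In the usage example, taking C2 before C1 breaks one of the three checkable pairs, giving a fitness of about 0.67.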

  12. Metazoan Remaining Genes for Essential Amino Acid Biosynthesis: Sequence Conservation and Evolutionary Analyses

    Directory of Open Access Journals (Sweden)

    Igor R. Costa

    2014-12-01

    Essential amino acids (EAA) are a group of nine amino acids that animals are unable to synthesize via de novo pathways. It has recently been found that most metazoans lack the same set of enzymes responsible for de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the remaining metazoan genes for EAA pathways. First, the set of all 49 enzymes responsible for de novo EAA biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using maximum likelihood phylogenetic analysis, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of the ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  13. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  14. Partial amino acid sequence of apolipoprotein(a) shows that it is homologous to plasminogen

    International Nuclear Information System (INIS)

    Eaton, D.L.; Fless, G.M.; Kohr, W.J.; McLean, J.W.; Xu, Q.T.; Miller, C.G.; Lawn, R.M.; Scanu, A.M.

    1987-01-01

    Apolipoprotein(a) [apo(a)] is a glycoprotein with Mr ∼ 280,000 that is disulfide-linked to apolipoprotein B in lipoprotein(a) particles. Elevated plasma levels of lipoprotein(a) are correlated with atherosclerosis. The partial amino acid sequence of apo(a) shows that it has striking homology to plasminogen. Plasminogen is a plasma serine protease zymogen that consists of five homologous, tandemly repeated domains called kringles and a trypsin-like protease domain. The amino-terminal sequence obtained for apo(a) is homologous to the beginning of kringle 4 but not to the amino terminus of plasminogen. Apo(a) was subjected to limited proteolysis by trypsin or V8 protease, and the fragments generated were isolated and sequenced. Sequences obtained from several of these fragments are highly (77-100%) homologous to plasminogen residues 391-421, which reside within kringle 4. Analysis of these internal apo(a) sequences revealed that apo(a) may contain at least two kringle 4-like domains. A sequence obtained from another tryptic fragment also shows homology to the end of kringle 4 and the beginning of kringle 5. Sequence data obtained from the two tryptic fragments show homology with the protease domain of plasminogen, and one of these sequences is homologous to the sequences surrounding the activation site of plasminogen. Plasminogen is activated by urokinase and tissue plasminogen activator through cleavage at a specific arginine residue; however, the corresponding residue in apo(a) is a serine, which would not be cleaved by tissue plasminogen activator or urokinase. Using a plasmin-specific assay, no proteolytic activity could be demonstrated for lipoprotein(a) particles. These results suggest that apo(a) contains kringle-like domains and an inactive protease domain.

  15. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP format. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output
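
    The concatenation idea can be sketched independently of Chimera itself, whose file handling is not described here. In this sketch each per-gene alignment is a dictionary from species name to aligned sequence. Only species present in every input are kept, and the result is written as sequential PHYLIP. The relaxed, untruncated names are an assumption; the original PHYLIP format limits names to 10 characters.

```python
def concatenate_alignments(*alignments):
    """Concatenate per-gene alignments (dicts: species -> aligned sequence)
    for the species present in every input."""
    shared = set(alignments[0])
    for aln in alignments[1:]:
        shared &= set(aln)
    return {sp: "".join(aln[sp] for aln in alignments) for sp in sorted(shared)}


def to_phylip(alignment):
    """Format as sequential PHYLIP with relaxed (untruncated) names."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in alignment.items()]
    return "\n".join(lines)


# Hypothetical two-gene example; "fly" lacks gene2, so it is dropped.
gene1 = {"human": "ACGT", "mouse": "ACGA", "fly": "TCGA"}
gene2 = {"human": "GG", "mouse": "GC"}
print(to_phylip(concatenate_alignments(gene1, gene2)))
```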

  16. Random amino acid mutations and protein misfolding lead to Shannon limit in sequence-structure communication.

    Directory of Open Access Journals (Sweden)

    Andreas Martin Lisewski

    2008-09-01

    The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures, established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that, in close-to-native conformations, information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol, at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
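
    How random substitution errors erode channel capacity can be illustrated with a textbook model. This is not the channel used in the paper, which is estimated from protein coordinate statistics. In a q-ary symmetric channel with q = 20 amino acid symbols, each symbol is replaced by one of the other 19 with total probability p. The capacity falls from log2 20 ≈ 4.32 bits per symbol with no errors to 0 when errors are uniformly random (p = 19/20).

```python
import math


def symmetric_channel_capacity(q, p):
    """Capacity in bits/symbol of a q-ary symmetric channel with error probability p:
    C = log2(q) + (1 - p) log2(1 - p) + p log2(p / (q - 1))."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    c = math.log2(q)
    if p < 1:
        c += (1 - p) * math.log2(1 - p)
    if p > 0:
        c += p * math.log2(p / (q - 1))
    return c


for p in (0.0, 0.1, 0.5, 0.95):
    print(p, round(symmetric_channel_capacity(20, p), 3))
```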

  17. Human liver phosphatase 2A: cDNA and amino acid sequence of two catalytic subunit isotypes

    International Nuclear Information System (INIS)

    Arino, J.; Woon, Chee Wai; Brautigan, D.L.; Miller, T.B. Jr.; Johnson, G.L.

    1988-01-01

    Two cDNA clones were isolated from a human liver library that encode two phosphatase 2A catalytic subunits. The two cDNAs differed in eight amino acids (97% identity) with three nonconservative substitutions. All of the amino acid substitutions were clustered in the amino-terminal domain of the protein. Amino acid sequence of one human liver clone (HL-14) was identical to the rabbit skeletal muscle phosphatase 2A cDNA (with 97% nucleotide identity). The second human liver clone (HL-1) is encoded by a separate gene, and RNA gel blot analysis indicates that both mRNAs are expressed similarly in several human clonal cell lines. Sequence comparison with phosphatase 1 and 2A indicates highly divergent amino acid sequences at the amino and carboxyl termini of the proteins and identifies six highly conserved regions between the two proteins that are predicted to be important for phosphatase enzymatic activity.

  18. Complete Genome Sequence of the Probiotic Lactic Acid Bacterium Lactobacillus rhamnosus

    Directory of Open Access Journals (Sweden)

    Samat Kozhakhmetov

    2014-01-01

    Full Text Available Introduction: Lactobacilli are bacteria commonly found in the gastrointestinal tract. Some species of this genus have probiotic properties. The most common of these is Lactobacillus rhamnosus, a microorganism generally regarded as safe (GRAS). It is also a homofermentative L-(+)-lactic acid producer. The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity. However, species were chosen for sequencing non-randomly and unevenly across the genus; thus, only one representative genome from the Lactobacillus rhamnosus clade is available to date. The aim of this study was to characterize the genome sequences of selected Lactobacillus strains. Methods: 109 samples were isolated from national domestic dairy products in the laboratory of the Center for Life Sciences. After screening the isolates for probiotic properties, a highly active Lactobacillus spp. strain was chosen. Genomic DNA was extracted according to the manufacturer's protocol (Wizard® Genomic DNA Purification Kit). The highly active strain was identified as Lactobacillus rhamnosus according to its morphological, cultural, physiological, and biochemical properties, and by genotypic analysis. Results: The genome of Lactobacillus rhamnosus was sequenced on the Roche 454 GS FLX platform. The initial draft assembly, comprising 14 large contigs (20 contigs in total), was prepared with the Newbler gsAssembler 2.3 (454 Life Sciences, Branford, CT). Conclusion: The full genomes of selected strains of lactic acid bacteria were sequenced during the study.
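
    The abstract reports contig counts but not contig lengths. A common single-number summary of a draft assembly such as this one is N50; the sketch below computes it from a list of contig lengths, and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig down, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in base pairs.
print(n50([1_200_000, 650_000, 310_000, 95_000, 40_000, 12_000]))
```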

  19. Whole-Genome Sequence Analysis of Bombella intestini LMG 28161T, a Novel Acetic Acid Bacterium Isolated from the Crop of a Red-Tailed Bumble Bee, Bombus lapidarius.

    Directory of Open Access Journals (Sweden)

    Leilei Li

    Full Text Available The whole-genome sequence of Bombella intestini LMG 28161T, an endosymbiotic acetic acid bacterium (AAB) occurring in bumble bees, was determined to investigate the molecular mechanisms underlying its metabolic capabilities. The draft genome sequence of B. intestini LMG 28161T was 2.02 Mb. Metabolic carbohydrate pathways were in agreement with the metabolite analyses of fermentation experiments and revealed its oxidative capacity towards sucrose, D-glucose, D-fructose and D-mannitol, but not towards ethanol or glycerol. The results of the fermentation experiments also demonstrated that the lack of effective aeration in small-scale carbohydrate consumption experiments may be responsible for the lack of reproducibility of such results in taxonomic studies of AAB. Finally, compared to the genome sequences of its nearest phylogenetic neighbor and of three other insect-associated AAB strains, the B. intestini LMG 28161T genome lost 69 orthologs and included 89 unique genes. Although many of the latter encoded hypothetical proteins, they also included several type IV secretion system proteins, amino acid transporters/permeases and membrane proteins, which might play a role in the interaction with the bumble bee host.
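
    Counts such as "lost 69 orthologs" and "89 unique genes" come down to set operations on ortholog groups. The sketch below assumes genes have already been assigned to ortholog-group identifiers, and its two definitions (lost = present in the nearest neighbor but absent from the focal genome; unique = found in no other compared genome) are assumptions, since the abstract does not state the exact criteria. The group IDs are hypothetical.

```python
def compare_gene_content(focal, neighbor, others=()):
    """Return (lost, unique) sets of ortholog-group IDs for a focal genome.

    lost:   groups in the nearest neighbor that are absent from the focal genome
    unique: groups in the focal genome found in no other genome compared
    """
    focal, neighbor = set(focal), set(neighbor)
    lost = neighbor - focal
    unique = focal - neighbor.union(*others)
    return lost, unique


# Hypothetical ortholog-group IDs.
lost, unique = compare_gene_content(
    {"OG1", "OG2", "OG5"},          # focal genome
    {"OG1", "OG2", "OG3"},          # nearest phylogenetic neighbor
    [{"OG1", "OG4"}],               # other insect-associated strains
)
print(sorted(lost), sorted(unique))
```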

  20. FAST: FAST Analysis of Sequences Toolbox

    Directory of Open Access Journals (Sweden)

    Travis J. Lawrence

    2015-05-01

    Full Text Available FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
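
    The grep-like, record-oriented style described above can be sketched in a few lines of Python. This is not FAST's code or interface (FAST is written in Perl on BioPerl); `read_fasta`, `grep_records` and `composition` are hypothetical helpers that parse Multi-FASTA text, keep records whose description line matches a regular expression, and report residue composition.

```python
import re
from collections import Counter


def read_fasta(text: str):
    """Yield (description, sequence) pairs from Multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore any text before the first record
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(records, pattern: str):
    """Keep records whose description matches a regular expression."""
    regex = re.compile(pattern)
    return [(desc, seq) for desc, seq in records if regex.search(desc)]


def composition(seq: str) -> dict:
    """Fraction of each residue in a sequence (upper-cased)."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in sorted(counts.items())}


example = """>seq1 sample A
ACGT
AC
>seq2 sample B
GGCC
"""
records = list(read_fasta(example))
for desc, seq in grep_records(records, r"sample B$"):
    print(desc, composition(seq))
```

    Keeping each step a small function over (description, sequence) pairs mirrors the pipe-and-filter composition the abstract attributes to FAST.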

  1. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  2. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer-based alignment-free dissimilarity measures, including CVTree, $d_2^*$ and $d_2^S$, are more computationally expensive than measures based solely on k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmaps, principal coordinate analysis and network displays. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
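
    As a point of reference for the k-mer measures mentioned above, the sketch below computes the plain D2 statistic (the dot product of two k-mer count vectors) and the cosine-normalized dissimilarity 0.5 * (1 - D2 / (|X| |Y|)). It does not implement the background-corrected d2* or d2S measures, and the helper names, including `to_phylip` for writing a square PHYLIP-style distance matrix, are hypothetical rather than CAFE's API.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(a: str, b: str, k: int) -> float:
    """0.5 * (1 - D2 / (|X| |Y|)) for k-mer count vectors X and Y.

    With raw (non-negative) counts this ranges from 0 for proportional
    k-mer profiles to 0.5 when the sequences share no k-mer.
    """
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    if not x or not y:
        raise ValueError("both sequences must be at least k long")
    d2 = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm = sqrt(sum(v * v for v in x.values()) * sum(v * v for v in y.values()))
    return 0.5 * (1.0 - d2 / norm)


def to_phylip(names, seqs, k):
    """Square distance matrix in PHYLIP style: taxon count, then one row per taxon."""
    lines = [str(len(names))]
    for name, s in zip(names, seqs):
        row = " ".join(f"{d2_dissimilarity(s, t, k):.4f}" for t in seqs)
        lines.append(f"{name:<10} {row}")
    return "\n".join(lines)


print(to_phylip(["seqA", "seqB", "seqC"],
                ["ACGTACGTAC", "ACGTTCGTAC", "GGGGCCCCGG"], k=3))
```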

  3. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignment via approximate string matching, and global alignment via longest common subsequence and via alignment with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and to establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with BioPerl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
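
    Of the algorithms listed, the longest common subsequence is the simplest to show. The sketch below computes its length by dynamic programming and estimates a z-score by comparing the observed score with scores obtained after shuffling one sequence; the shuffle-based background is an assumption for illustration, not a description of how BATS computes significance.

```python
import random
import statistics


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (O(len(a) * len(b)) time, O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def shuffle_z_score(a: str, b: str, score=lcs_length, trials: int = 200, seed: int = 0) -> float:
    """z-score of score(a, b) against scores of a versus shuffled copies of b."""
    rng = random.Random(seed)
    observed = score(a, b)
    background = []
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)
        background.append(score(a, "".join(letters)))
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have no spread; z-score undefined")
    return (observed - statistics.mean(background)) / sd


print(lcs_length("ABCBDAB", "BDCABA"))
print(shuffle_z_score("ACGTTGCAAGCTACGTTGCAAGCT", "ACGTTGCAAGCTACGTTGCAAGCT"))
```

    Shuffling preserves composition, so the z-score measures similarity beyond what shared letter frequencies alone would give.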

  4. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa, and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  5. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa, and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  6. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    International Nuclear Information System (INIS)

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-01-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin, found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  7. Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

    Science.gov (United States)

    Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

    1985-07-01

    The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.
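
    The reported size can be sanity-checked with simple arithmetic: 12787 / 112 is about 114.2 Da per residue, a plausible average residue mass. The sketch below sums average residue masses (the commonly tabulated values, stated here as an assumption), adds one water for the free termini, and subtracts two hydrogen atoms per disulfide bond. The inhibitor's sequence is not reproduced in the abstract, so the usage lines use short made-up peptides.

```python
# Average residue masses in daltons (free amino acid minus one water),
# as commonly tabulated for average-mass calculations.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528     # added once for the free N- and C-termini
HYDROGEN = 1.00794   # two are lost per disulfide bond


def average_mass(sequence: str, disulfide_bonds: int = 0) -> float:
    """Approximate average molecular mass of a peptide chain in daltons."""
    residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    return residues + WATER - 2 * HYDROGEN * disulfide_bonds


print(round(average_mass("G"), 2))          # free glycine
print(round(average_mass("CGGC", 1), 2))    # made-up peptide with one disulfide
```

    For the inhibitor itself, the same function would be called with its 112-residue sequence and disulfide_bonds=2.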

  8. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequencing primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result ... diseases in Europe. As part of the EURL proficiency test for fish diseases, it is required to sequence any ranavirus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it is HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV ... isolates be genotyped. As part of the evaluation of the proficiency results, it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories ...
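
    Errors concentrated at the ends of reads are usually handled by quality trimming before comparison. The sketch below uses the modified Mott algorithm, one common approach and not necessarily what the participating laboratories used: each base scores cutoff minus its Phred error probability, and the maximum-scoring contiguous stretch is kept. The cutoff of 0.05 and the example quality values are assumptions.

```python
def mott_trim(qualities, cutoff=0.05):
    """Modified Mott trimming of a read given its Phred quality scores.

    Each base scores cutoff - P(error), with P(error) = 10 ** (-Q / 10).
    Returns (start, end), end exclusive, of the maximum-scoring contiguous
    region; (0, 0) means no region scores above zero.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(qualities):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            run_sum, run_start = 0.0, i + 1  # restart after a losing stretch
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


quals = [8, 12, 25, 32, 35, 38, 36, 30, 18, 9, 6]   # made-up chromatogram qualities
start, end = mott_trim(quals)
print(start, end)
```

    Primer sites would still need to be removed separately, since they can have high base quality.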

  9. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance for improving resource allocation and supporting fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, the country with the largest wildfire dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are the Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, the Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations of these clustering methods to functional measures, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (e.g. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
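
    Of the measures listed, the Allan Factor is the most compact to write down: with N_k the number of events in the k-th consecutive window of length T, AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]). It is close to 1 for a Poisson process, above 1 for clustering and below 1 for regular occurrences. The sketch below is a minimal implementation with illustrative names; it does not reproduce the study's thresholds, and its Poisson reference is a single simulated series.

```python
import random


def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]) for window length T."""
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:  # events in a trailing partial window are ignored
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events in the counting windows")
    squared_steps = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return sum(squared_steps) / len(squared_steps) / (2 * mean_count)


# Reference pattern: a simulated Poisson process with 5 events per unit time.
rng = random.Random(1)
poisson_times, t = [], rng.expovariate(5.0)
while t < 1000.0:
    poisson_times.append(t)
    t += rng.expovariate(5.0)
print(allan_factor(poisson_times, 1.0, 0.0, 1000.0))
```

    Repeating the calculation over a range of window lengths T gives the time-scale dependence that distinguishes clustering from randomness.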

  10. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.
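
    Tajima's D, one of the statistics SVAMP displays, compares the mean number of pairwise differences (pi) with the number of segregating sites (S). The sketch below uses the standard textbook formula (Tajima 1989) for gap-free aligned sequences; it is not SVAMP's implementation, which works from VCF input.

```python
from itertools import combinations
from math import comb, sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")  # variance terms vanish for n < 4
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences.
    pi = sum(sum(x != y for x, y in zip(a, b))
             for a, b in combinations(seqs, 2)) / comb(n, 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))  # a singleton variant
print(tajimas_d(["AAAA", "AAAA", "AAAT", "AAAT"]))  # an intermediate-frequency variant
```

    An excess of rare variants pushes D below zero, and an excess of intermediate-frequency variants pushes it above zero.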

  11. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  12. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which makes it possible to map QTC patterns onto a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.
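
    A simplified version of the basic qualitative trajectory calculus (QTC_B) for two points in discrete time is sketched below. Each symbol compares one object's displacement with the other object's earlier position: '-' for moving towards it, '+' for moving away, '0' for stable. This discrete approximation, its tolerance `eps`, and the function names are assumptions; the SESI rasterization is not implemented.

```python
from math import dist


def _qtc_symbol(before, after, other, eps):
    """'-' if the move brings the object closer to `other`, '+' if farther, else '0'."""
    change = dist(after, other) - dist(before, other)
    if change < -eps:
        return "-"
    if change > eps:
        return "+"
    return "0"


def qtc_b(k_t1, k_t2, l_t1, l_t2, eps=1e-9):
    """Two-symbol QTC_B-style label for objects k and l between times t1 and t2.

    First symbol: k's move relative to l's position at t1.
    Second symbol: l's move relative to k's position at t1.
    """
    return _qtc_symbol(k_t1, k_t2, l_t1, eps) + _qtc_symbol(l_t1, l_t2, k_t1, eps)


def qtc_sequence(track_k, track_l):
    """Label every consecutive pair of time steps in two equal-length tracks."""
    return [qtc_b(track_k[i], track_k[i + 1], track_l[i], track_l[i + 1])
            for i in range(len(track_k) - 1)]


# Two cars on a highway: k is catching up with l, which keeps driving on.
print(qtc_sequence([(0, 0), (2, 0), (4, 0)], [(5, 0), (6, 0), (7, 0)]))
```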

  13. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, 102 polymorphisms, including 90 SNPs, were predicted by both platforms. Indel calls were more variable across sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNPs but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.
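
    Comparing a consensus with the reference after alignment, and flagging indels that fall in homopolymer runs (where the abstract reports most discrepancies), can be sketched as below. The input is assumed to be a pairwise alignment with '-' for gaps; the run-length threshold of 4 and the function names are illustrative choices, not the study's settings.

```python
def run_length(seq: str, i: int) -> int:
    """Length of the run of identical bases containing position i (0 if out of range)."""
    if not 0 <= i < len(seq):
        return 0
    left = right = i
    while left > 0 and seq[left - 1] == seq[i]:
        left -= 1
    while right + 1 < len(seq) and seq[right + 1] == seq[i]:
        right += 1
    return right - left + 1


def call_differences(reference: str, consensus: str, homopolymer_min: int = 4):
    """SNPs and indels between two aligned sequences ('-' marks a gap).

    Returns (kind, position, ref, alt, in_homopolymer) tuples. position is the
    1-based reference coordinate; for an insertion it is the base just before it.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    ref_bases = reference.replace("-", "")
    variants, ref_pos = [], -1          # ref_pos: 0-based index of last reference base
    for r, c in zip(reference, consensus):
        if r != "-":
            ref_pos += 1
        if r == c:
            continue
        if r != "-" and c != "-":
            variants.append(("SNP", ref_pos + 1, r, c, False))
        elif c == "-":                  # base missing from the consensus
            run = run_length(ref_bases, ref_pos)
            variants.append(("deletion", ref_pos + 1, r, c, run >= homopolymer_min))
        else:                           # base present only in the consensus
            run = max(run_length(ref_bases, ref_pos), run_length(ref_bases, ref_pos + 1))
            variants.append(("insertion", ref_pos + 1, r, c, run >= homopolymer_min))
    return variants


print(call_differences("ACGTAAAAGT", "ACTTAAA-GT"))
```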

  14. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding sequences. Coding sequences are those that are translated into protein. The identification of coding sequences has been widely reported in the literature owing to their well-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and cell differentiation. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Roadmap Epigenomics projects have cataloged human noncoding sequences according to specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
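
    A common first step in this kind of analysis is to turn the sequence into a numeric signal and apply a discrete wavelet transform. The sketch below assumes a purine/pyrimidine mapping (A, G = +1; C, T = -1; anything else 0) and uses the orthonormal Haar wavelet; the study's actual mapping and wavelet family are not given in the abstract, so both are assumptions.

```python
from math import sqrt

# Assumed numeric mapping: purines +1, pyrimidines -1, anything else 0.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def purine_signal(seq):
    """Turn a DNA sequence into a numeric genomic signal."""
    return [PURINE_PYRIMIDINE.get(base, 0.0) for base in seq.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_levels(signal, levels):
    """Repeat haar_step on the approximation; returns (approximation, [details per level])."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


approx, details = haar_levels(purine_signal("ACGTTGCAAGCTTACG"), 3)
print(approx, [len(d) for d in details])
```

    Because the transform is orthonormal, the signal's energy is split across scales without loss, which is what makes per-scale comparisons between sequence classes meaningful.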

  15. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
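
    The two operations described, relabelling FASTA records from species name and accession number and later substituting the new labels back into tree files, can be sketched as below. This is not the REFGEN/TREENAMER code; it assumes headers of the form 'ACCESSION Genus species ...' and Newick-format trees, and all names and accession numbers are hypothetical.

```python
import re


def rename_fasta(records, scheme="species_accession"):
    """Relabel (header, sequence) records and drop repeated accessions.

    Assumes headers of the form 'ACCESSION Genus species ...'.
    scheme: 'species_accession', 'species' or 'accession'.
    Returns (renamed_records, removed_accessions, accession_to_label_map).
    """
    seen, renamed, removed, mapping = set(), [], [], {}
    for header, seq in records:
        accession, *rest = header.split()
        if accession in seen:
            removed.append(accession)  # report duplicates instead of keeping them
            continue
        seen.add(accession)
        species = "_".join(rest[:2]) if rest else "unknown"
        label = {"species_accession": f"{species}_{accession}",
                 "species": species,
                 "accession": accession}[scheme]
        # Many phylogenetics programs reject spaces and punctuation in labels.
        label = re.sub(r"[^A-Za-z0-9_.]", "_", label)
        mapping[accession] = label
        renamed.append((label, seq))
    return renamed, removed, mapping


def rename_tree(newick: str, mapping: dict) -> str:
    """Replace leaf labels in a Newick string; unmapped tokens are left alone."""
    return re.sub(r"[^(),:;\s]+", lambda m: mapping.get(m.group(0), m.group(0)), newick)


records = [("AB000001 Homo sapiens gene X", "ACGT"),
           ("AB000002 Mus musculus", "GGCC"),
           ("AB000001 Homo sapiens copy", "ACGT")]
renamed, removed, mapping = rename_fasta(records)
print(renamed, removed)
print(rename_tree("(AB000001:0.1,AB000002:0.2);", mapping))
```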

  16. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.

  17. Image sequence analysis workstation for multipoint motion analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze the motion of objects from video sequences. The system combines the software and hardware environment of a modern graphics-oriented workstation with digital image acquisition, processing and display techniques. In addition to automating data reduction tasks and increasing their throughput, the system aims to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey-level image processing and spatial/temporal adaptation of the processing parameters are used to locate and track more complex features of objects under uncontrolled lighting and background conditions. Applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, and aircraft in flight. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze-frame display, and digital image enhancement; 3) multiple leading-edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement database management; and 7) offline analysis software for trajectory plotting and statistical analysis.
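
    Centroid tracking, one of the listed features, reduces to an intensity-weighted mean position of the pixels above a grey-level threshold in each frame. The sketch below uses plain nested lists for frames and a fixed threshold; the adaptive, spatially and temporally varying parameters described above are not modeled, and the frames are made up.

```python
def centroid(frame, threshold):
    """Intensity-weighted centroid (row, col) of pixels brighter than threshold.

    frame: list of rows of grey levels. Returns None if no pixel passes.
    """
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value > threshold:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0:
        return None
    return row_sum / total, col_sum / total


def track(frames, threshold):
    """Centroid of the bright object in each frame of an image sequence."""
    return [centroid(frame, threshold) for frame in frames]


frames = [
    [[0, 0, 0, 0], [0, 9, 9, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 0, 0]],
]
print(track(frames, threshold=5))
```

    Differencing successive centroids gives a per-field displacement, from which trajectory plots of the kind listed in feature 7 can be built.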

  18. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society
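
    As a simple, widely used illustration of detecting correlated substitutions between two sites, the sketch below computes the mutual information between two alignment columns. This is a swapped-in measure, not the conditional-probability criterion of the paper, and it does not correct for the phylogenetic nonindependence the authors discuss. The toy alignment is made up.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences (gaps treated as symbols).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum((c / n) * log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


toy = ["MKAE", "MKAE", "MDAK", "MDAK", "MKSE"]   # made-up alignment
print(column_mutual_information(toy, 1, 3))
```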

  19. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (GenBank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  20. Comparative analysis of the prion protein gene sequences in African lion.

    Science.gov (United States)

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of the African lion (Panthera leo) was cloned for the first time and screened for polymorphisms. The results suggest that the prion protein gene of the eight African lions studied is highly homogeneous. The amino acid sequences of the prion protein (PrP) were identical in all samples tested. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) were found in the prion protein gene (Prnp) of the African lion, but none caused an amino acid substitution. Sequence analysis showed the highest homology to Felis catus (GenBank accession AF003087; 96.7%) and to sheep (GenBank accession M31313.1; 96.2%). Compared with all the mammalian prion protein sequences examined, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross-species transmission of prion disease.
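
    In the notation C42T used above, position 42 carries C in one allele and T in the other. The minimal Python sketch below assumes two sequences already aligned to equal length. It lists such substitutions and computes percent identity, the kind of figure behind the 96.7% and 96.2% values. The toy sequences are hypothetical, not lion Prnp data, and published alignment tools may count gaps and alignment length differently.

```python
def list_snps(reference, variant):
    """Substitutions between two aligned sequences as 'C42T'-style labels (1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{pos}{v}"
            for pos, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]

def percent_identity(a, b):
    """Percent of aligned positions, ignoring gap columns, at which the sequences agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

ref = "ATGCTGACCA"   # hypothetical reference allele
var = "ATGTTGACTA"   # hypothetical variant allele
print(list_snps(ref, var))         # ['C4T', 'C9T']
print(percent_identity(ref, var))  # 80.0
```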

  1. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    Science.gov (United States)

    Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server is based on a new mathematical method for constructing multiple alignments, which relies on the optimization of position weight matrices and on two-dimensional dynamic programming. This approach allows the construction of multiple alignments of weakly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise sequence comparisons. The article examines the principles of the web server's operation, presents two examples of studying amino acid and nucleotide sequences, and describes the information that can be obtained using the web server.
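
    A few lines of Python can sketch how a position-weight-matrix view captures periodicity. The sketch assumes an exact repeat with no insertions or deletions. The hypothetical `periodicity_score` folds a sequence by a candidate period and builds one frequency column per phase. It then scores those columns against the background composition as a log-likelihood ratio. It does not reproduce the server's matrix optimization or two-dimensional dynamic programming.

```python
import math
from collections import Counter

def periodicity_score(seq, period):
    """Log-likelihood ratio (nats) of phase-specific frequencies vs. the background.

    Each phase (position modulo `period`) gets its own column of residue
    frequencies, like one column of a position weight matrix; exact repeats
    without insertions or deletions are assumed.
    """
    background = Counter(seq)
    n = len(seq)
    score = 0.0
    for phase in range(period):
        column = seq[phase::period]            # residues occupying this phase
        for letter, count in Counter(column).items():
            p_phase = count / len(column)
            p_background = background[letter] / n
            score += count * math.log(p_phase / p_background)
    return score

seq = "ACG" * 20                               # hypothetical sequence with period 3
scores = {p: periodicity_score(seq, p) for p in range(2, 6)}
best = max(scores, key=scores.get)
print(best)  # 3
```

    Multiples of the true period fit equally well, so candidate periods are best scanned in ascending order. Because longer periods have more free parameters, a real search would also need a significance test.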

  2. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed from known Bos taurus cDNA sequences deposited in databases under different accession numbers. Polymerase chain reaction (PCR) was performed, and the products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  3. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    Science.gov (United States)

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Probability values derived from a data set of 425 protein chains yielded an overall beta-turn prediction accuracy of 68.74%, compared with the 94.7% reported earlier for a data set of 30 proteins using the same method. However, the overall accuracy obtained with probability values derived from the 30-protein data set drops to 40.74% when tested on the 425-chain data set. In contrast, probability values derived from the 425-chain data set used in this analysis gave consistent results when tested on the 30-protein data set used earlier (64.62%), on a more recent representative data set of 619 protein chains (64.66%), and on a jackknife data set of 476 representative protein chains (63.38%). We therefore recommend the probability values derived from the 425 representative protein chains reported here, which give more realistic and consistent predictions of beta-turns from amino acid sequences.
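
    The overall accuracy figures above are per-residue, two-state scores. The illustration below encodes turn residues as `t` and non-turn residues as `n`, an encoding chosen here rather than taken from the paper. It computes Qtotal, plus the Matthews correlation coefficient as a common companion metric.

```python
import math

def turn_prediction_metrics(observed, predicted):
    """Qtotal (%) and Matthews correlation coefficient for two-state turn prediction."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == "t" and p == "t" for o, p in pairs)
    tn = sum(o == "n" and p == "n" for o, p in pairs)
    fp = sum(o == "n" and p == "t" for o, p in pairs)
    fn = sum(o == "t" and p == "n" for o, p in pairs)
    q_total = 100.0 * (tp + tn) / len(pairs)    # fraction of residues classified correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, mcc

obs = "nnttttnnnn"    # hypothetical observed states
pred = "nntttnnntn"   # hypothetical predicted states
q, m = turn_prediction_metrics(obs, pred)
print(f"Qtotal = {q:.1f}%, MCC = {m:.3f}")  # Qtotal = 80.0%, MCC = 0.583
```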

  4. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    Science.gov (United States)

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II, was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by the methods of Fraga (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and of Hopp and Woods (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) indicate the presence of four hydrophilic regions, which may form sequential B-cell epitopes or parts of conformational ones. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I, which may explain why immune responsiveness to both allergens is associated with HLA-DR3.
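
    Hopp and Woods' approach averages residue hydrophilicity values over a sliding window of six residues and looks for the highest peaks. The Python sketch below uses the scale values as commonly tabulated; verify them against the original paper before relying on them. The test peptide is hypothetical; this is not the Lol p II analysis itself.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (verify against the 1981 paper)
HOPP_WOODS = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}

def hydrophilicity_profile(seq, window=6):
    """(1-based window start, mean hydrophilicity) for every full window."""
    return [(i + 1, sum(HOPP_WOODS[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]

def most_hydrophilic(seq, window=6):
    """Window with the highest mean value, a candidate surface-exposed segment."""
    return max(hydrophilicity_profile(seq, window), key=lambda item: item[1])

peptide = "LLLLLLKDEKRDLLLLLL"     # hypothetical peptide
print(most_hydrophilic(peptide))   # (7, 3.0)
```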

  5. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, one looks for the unwanted events that result from a given cause. Graphical symbols and explanations of the graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. Using the incident sequence diagram, incident sequences (i.e., the logical and chronological course of repercussions initiated by the failure of a component or by an operating error) can be presented and analyzed simply and clearly.
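
    An event tree branches at each safety function queried after the initiating failure, and each path through the tree is one incident sequence. The Python sketch below enumerates those paths. Going beyond the graphical method described here, it also attaches frequencies to each path. The independence of branch outcomes, the absence of pruning, and all names and numbers are assumptions for illustration.

```python
from itertools import product

def event_tree(initiating_frequency, safety_functions):
    """Enumerate every path of a binary event tree.

    safety_functions: ordered list of (name, failure_probability) pairs.
    Assumes independent branch outcomes and that every function is queried on
    every path (no pruning). Returns {path label: frequency}.
    """
    paths = {}
    for outcome in product((True, False), repeat=len(safety_functions)):
        frequency = initiating_frequency
        steps = []
        for (name, p_fail), succeeded in zip(safety_functions, outcome):
            frequency *= (1.0 - p_fail) if succeeded else p_fail
            steps.append(name + (" ok" if succeeded else " FAILS"))
        paths[" -> ".join(steps)] = frequency
    return paths

# Hypothetical: a pump failure (0.1 per year) followed by two safety functions
tree = event_tree(0.1, [("alarm", 0.01), ("backup pump", 0.001)])
for path, freq in tree.items():
    print(f"{path}: {freq:.2e} per year")
```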

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Science.gov (United States)

    2010-07-01

    ... mature protein, with the number 1. When presented, the amino acids preceding the mature protein, e.g... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter... data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  7. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six-base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), and the genes have been cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of six other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
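
    The recognition site GRCGYC uses IUPAC degenerate codes (R = A or G, Y = C or T). A minimal Python sketch converts such a motif to a regular expression and reports overlapping matches on one strand. Because GRCGYC is its own reverse complement, scanning one strand suffices here. The DNA string is hypothetical.

```python
import re

# IUPAC nucleotide codes needed for common recognition sites
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def find_sites(motif, dna):
    """0-based start positions of all (possibly overlapping) matches of a degenerate motif."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets re.finditer report overlapping matches
    return [m.start() for m in re.finditer(f"(?={pattern})", dna.upper())]

dna = "TTGACGCCAAGGCGTCAA"           # hypothetical fragment
print(find_sites("GRCGYC", dna))     # [2, 10]
```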

  8. Complete amino acid sequence of human intestinal aminopeptidase N as deduced from cloned cDNA

    DEFF Research Database (Denmark)

    Cowell, G M; Kønigshøfer, E; Danielsen, E M

    1988-01-01

    The complete primary structure (967 amino acids) of an intestinal human aminopeptidase N (EC 3.4.11.2) was deduced from the sequence of a cDNA clone. Aminopeptidase N is anchored to the microvillar membrane via an uncleaved signal for membrane insertion. A domain constituting amino acid 250...

  9. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    OpenAIRE

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.; Almstrand, Robert

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome.

  10. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Science.gov (United States)

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. Most current algorithms rely on feature-based prediction, but their accuracy still needs to be improved. Here we propose SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), a sequence-based hybrid algorithm that merges a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was built using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT can achieve performance comparable to structure-based template methods. By leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over existing sequence-based prediction algorithms. The value of our algorithm is highlighted by an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  11. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignments. The framework integrates information derived from a multiple sequence alignment and a phylogenetic tree (a hypothetical tree of evolution) to derive new properties of sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations are described for manipulating a multiple sequence alignment and for deriving mutation-related information from a phylogenetic tree by superimposing parsimony analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
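
    A Python sketch can illustrate the notion of a constrained column by treating the alignment as a list of equal-length strings and testing each column. The property classes below are hypothetical and deliberately coarse. The sketch also omits the framework's phylogenetic-tree and parsimony step, and the original toolkit was written in Prolog, not Python.

```python
# Hypothetical, deliberately coarse and non-overlapping property classes
PROPERTY_CLASSES = {
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKRH"),
    "polar": set("STNQGPY"),
}

def constrained_columns(alignment):
    """(column index, class) for columns that vary but stay within one property class."""
    hits = []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column) - {"-"}          # ignore gaps
        if len(residues) < 2:
            continue                            # invariant columns are plain conservation
        for name, members in PROPERTY_CLASSES.items():
            if residues <= members:
                hits.append((index, name))
                break
    return hits

msa = ["MLKDA", "MIRDG", "MVKES"]               # hypothetical alignment
print(constrained_columns(msa))
# [(1, 'hydrophobic'), (2, 'charged'), (3, 'charged')]
```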

  12. Noise reduction methods for nucleic acid and macromolecule sequencing

    Science.gov (United States)

    Schuller, Ivan K.; Di Ventra, Massimiliano; Balatsky, Alexander

    2018-05-08

    Methods, systems, and devices are disclosed for processing macromolecule sequencing data with substantial noise reduction. In one aspect, a method for reducing noise in a sequential measurement of a macromolecule comprising serial subunits includes cross-correlating multiple measured signals of a physical property of subunits of interest of the macromolecule, the multiple measured signals including the time data associated with the measurement of the signal, to remove, or at least reduce, signal noise that is not at the same frequency as, and in phase with, the systematic signal contribution of the measured signals.

  13. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    Directory of Open Access Journals (Sweden)

    Pelletier Eric

    2010-10-01

    Background: Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia that use amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has generally been regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction and coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and of comparing it with other clostridia, its genome has been sequenced and analyzed. Results: C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. In addition, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both F-type and V-type ATPases. The discovery of a previously unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms.
    Conclusions: Analysis of the C

  14. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    Science.gov (United States)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.
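
    The abstract does not name the clustering algorithm used to build the dendrogram. A compact Python sketch of an average-linkage method such as UPGMA (an assumption here) follows. The distance table is hypothetical, and branch lengths are omitted.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering of taxa from a distance table.

    dist maps a pair (x, y) to the distance between taxa x and y.
    Returns the tree as nested tuples; branch lengths are omitted for brevity.
    """
    size = {label: 1 for label in labels}            # current cluster -> number of taxa
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        closest = min(d, key=d.get)                  # pair of clusters at minimum distance
        x, y = sorted(closest, key=str)              # deterministic child order
        merged = (x, y)
        for z in size:
            if z not in closest:
                # Distance to the new cluster: size-weighted mean of the old distances
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
        d = {pair: value for pair, value in d.items() if x not in pair and y not in pair}
    return next(iter(size))

distances = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,   # hypothetical distances
             ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(["A", "B", "C", "D"], distances)
print(tree)  # ((('A', 'B'), 'C'), 'D')
```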

  15. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Science.gov (United States)

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…
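
    One of the simplest kernels for sequence comparison is the k-spectrum kernel, the inner product of two k-mer count vectors. It appears here only as an illustrative baseline, because the truncated abstract does not say which kernels the thesis develops.

```python
from collections import Counter

def kmer_spectrum(seq, k):
    """Counts of every overlapping length-k substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    spectrum_a, spectrum_b = kmer_spectrum(a, k), kmer_spectrum(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute
    return sum(count * spectrum_b[kmer] for kmer, count in spectrum_a.items())

print(spectrum_kernel("ACGTACGT", "ACGTT"))  # 4
```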

  16. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    A recurrence plot technique for DNA sequences is established on the metric representation and employed to analyze the correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, whereas the correlation distance of a yeast chromosome increases steadily.
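
    A recurrence plot marks the pairs of positions (i, j) at which a sequence revisits the same state. The Python sketch below defines a state simply as a fixed-length word, rather than using the paper's metric representation. It summarizes the plot as a recurrence rate per diagonal offset, where peaks indicate preferred correlation distances.

```python
def recurrence_matrix(seq, word=3):
    """R[i][j] = 1 when the words of length `word` starting at i and j are identical."""
    words = [seq[i:i + word] for i in range(len(seq) - word + 1)]
    return [[1 if a == b else 0 for b in words] for a in words]

def recurrence_rate_by_distance(matrix):
    """Fraction of recurrent points on each diagonal j - i = d, for d >= 1."""
    n = len(matrix)
    return {d: sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(1, n)}

rates = recurrence_rate_by_distance(recurrence_matrix("ACGTACGTACGT"))
print(max(rates, key=rates.get))  # 4
```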

  17. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    Science.gov (United States)

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of each type. However, classifying membrane protein types can be both time-consuming and error-prone owing to the inherent similarity among the types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence; it comprises seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM in the jackknife test. In the independent data set test, PNN yields the highest accuracy of 95.73%. These classifiers also show improved performance on other measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This improvement may largely be credited to the learning capabilities of neural networks and to the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
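
    The Python sketch below covers two of the seven CPSR feature sets, amino acid composition and 2-gram exchange group frequency, plus sequence length. The six exchange groups are the commonly used Dayhoff-style groups, an assumption here. Only the 20 standard residues are handled, and the PCA step and the classifiers are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Dayhoff-style exchange groups (assumed grouping)
EXCHANGE_GROUPS = {"e1": "HRK", "e2": "DENQ", "e3": "C", "e4": "STPAG", "e5": "MILV", "e6": "FYW"}
AA_TO_GROUP = {aa: group for group, members in EXCHANGE_GROUPS.items() for aa in members}

def amino_acid_composition(seq):
    """20 relative frequencies, one per standard amino acid."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

def exchange_group_2grams(seq):
    """36 relative frequencies of consecutive exchange-group pairs."""
    groups = [AA_TO_GROUP[aa] for aa in seq]
    pairs = Counter(zip(groups, groups[1:]))
    total = max(len(groups) - 1, 1)
    return [pairs[(a, b)] / total for a, b in product(sorted(EXCHANGE_GROUPS), repeat=2)]

def feature_vector(seq):
    return amino_acid_composition(seq) + exchange_group_2grams(seq) + [len(seq)]

print(len(feature_vector("MKVLA")))  # 57
```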

  18. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands.

    Science.gov (United States)

    Wolfe, Brian R; Pierce, Niles A

    2015-10-16

    We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.
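
    The ensemble defect of a single complex is the expected number of nucleotides whose pairing state differs from the target structure. It equals N minus the expected number of bases that are correctly paired or correctly unpaired. The inputs are base-pair probabilities from a partition-function calculation; the values below are hypothetical. The test tube objective described above extends this to concentrations of on- and off-target complexes, which this sketch does not attempt.

```python
def ensemble_defect(pair_probs, target_pairs, n):
    """Expected number of nucleotides whose pairing state differs from the target.

    pair_probs: {(i, j): probability that bases i and j pair}, with i < j, 0-based.
    target_pairs: set of (i, j) with i < j giving the target secondary structure.
    """
    paired = [0.0] * n                       # probability that each base is paired at all
    for (i, j), p in pair_probs.items():
        paired[i] += p
        paired[j] += p
    partner = {}
    for i, j in target_pairs:
        partner[i], partner[j] = j, i
    expected_correct = 0.0
    for i in range(n):
        if i in partner:                     # target: i pairs with partner[i]
            j = partner[i]
            expected_correct += pair_probs.get((min(i, j), max(i, j)), 0.0)
        else:                                # target: i is unpaired
            expected_correct += 1.0 - paired[i]
    return n - expected_correct

# Hypothetical pair probabilities for a 7-nt hairpin with target "((...))"
probs = {(0, 6): 0.9, (1, 5): 0.8}
defect = ensemble_defect(probs, {(0, 6), (1, 5)}, 7)
print(round(defect, 3), round(defect / 7, 3))  # 0.6 0.086
```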

  19. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    Science.gov (United States)

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular study of chicken anaemia virus (CAV) in backyard chickens in Africa, using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained for the commercial and backyard chicken strains, respectively, translated to only 2% amino acid diversity in each case. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Because the partial VP1 gene sequences of two cloned CAV strains from backyard chickens (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  20. RNAblueprint: flexible multiple target nucleic acid sequence design.

    Science.gov (United States)

    Hammer, Stefan; Tschiatschek, Birgit; Flamm, Christoph; Hofacker, Ivo L; Findeiß, Sven

    2017-09-15

    Realizing the value of synthetic biology in biotechnology and medicine requires the design of molecules with specialized functions. Owing to its close structure-function relationship and the availability of good structure prediction methods and energy models, RNA is perfectly suited to being synthetically engineered with predefined properties. However, currently available RNA design tools cannot easily be adapted to accommodate new design specifications. Furthermore, complicated sampling and optimization methods are often developed to suit a specific RNA design goal, adding to their inflexibility. We developed a C++ library implementing a graph coloring approach to stochastically sample sequences compatible with structural and sequence constraints from the typically very large solution space. The approach allows the solution space to be specified and explored in a well-defined way. Our library also guarantees uniform sampling, which makes optimization runs efficient by not only avoiding re-evaluation of solutions already found, but also raising the probability of finding better solutions in long optimization runs. We show that our software can be combined with any other software package to enable diverse RNA design applications. Scripting interfaces allow easy adaptation of existing code to accommodate new scenarios, making the whole design process very flexible. We implemented example design approaches in Python to demonstrate these advantages. RNAblueprint, the Python implementations and benchmark data sets are available on GitHub: https://github.com/ViennaRNA. Contact: s.hammer@univie.ac.at, ivo@tbi.univie.ac.at or sven@tbi.univie.ac.at. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
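
    For a single target secondary structure, the constraint graph is simple. Each base pair must be one of the six allowed pairs (GC, CG, AU, UA, GU, UG), and unpaired positions are free. Drawing each pair and each unpaired base independently therefore samples compatible sequences uniformly. The Python sketch below covers only that special case. RNAblueprint's graph coloring machinery is needed when several overlapping targets constrain the same positions, and none of the names below belong to RNAblueprint's API.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]   # Watson-Crick and wobble pairs

def parse_dot_bracket(structure):
    """List of (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def sample_compatible(structure, rng=random):
    """Uniform sample from the sequences compatible with one target structure."""
    seq = [rng.choice("ACGU") for _ in structure]      # unpaired positions are free
    for i, j in parse_dot_bracket(structure):
        seq[i], seq[j] = rng.choice(PAIRS)             # each pair drawn from the allowed set
    return "".join(seq)

def is_compatible(seq, structure):
    return all(seq[i] + seq[j] in PAIRS for i, j in parse_dot_bracket(structure))

design = sample_compatible("((((...))))", random.Random(7))
print(design, is_compatible(design, "((((...))))"))
```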

  1. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful with regard to reducing the frequency and...very different from each other in many ways, including duration and number of spikes. Still, these sequences share a similar trend in the general order

  2. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of the elements of this matrix is shown to follow a power law with an exponent close to that of outgoing links in scale-free networks such as the World Wide Web (WWW). At the same time, the sum of ingoing matrix elements is characterized by an exponent significantly larger than those typical of WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap, leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species, which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks, showing their similarities to and distinctions from the WWW and linguistic networks.
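
    A Google matrix has the form G = αS + (1 − α)E/N. Here S holds the Markov transition probabilities between words, E is the all-ones matrix, N is the number of distinct words, and α is the damping factor. The Python sketch below computes the PageRank vector of such a matrix by power iteration. The word segmentation (consecutive, non-overlapping words), the word length, and α = 0.85 are assumptions, not parameters taken from the paper.

```python
from collections import Counter, defaultdict

def word_pagerank(seq, k=2, alpha=0.85, iterations=200):
    """PageRank of k-letter words under G = alpha*S + (1 - alpha)/N, by power iteration."""
    words = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]  # consecutive, non-overlapping
    nodes = sorted(set(words))
    n = len(nodes)
    transitions = defaultdict(Counter)           # word -> counts of the word that follows it
    for a, b in zip(words, words[1:]):
        transitions[a][b] += 1
    rank = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        new = {w: (1.0 - alpha) / n for w in nodes}   # teleportation term
        for a in nodes:
            out = transitions.get(a)
            if not out:                               # dangling word: spread its rank uniformly
                for w in nodes:
                    new[w] += alpha * rank[a] / n
            else:
                total = sum(out.values())
                for b, count in out.items():
                    new[b] += alpha * rank[a] * count / total
        rank = new
    return rank

pr = word_pagerank("ACGTACGTTTGACCAGTA")   # hypothetical sequence
print(max(pr, key=pr.get))
```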

  3. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of the elements of this matrix is shown to follow a power law with an exponent close to that of outgoing links in scale-free networks such as the World Wide Web (WWW). At the same time, the sum of ingoing matrix elements is characterized by an exponent significantly larger than those typical of WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap, leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species, which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks, showing their similarities to and distinctions from the WWW and linguistic networks.

  4. Use of vectors in sequence analysis.

    Science.gov (United States)

    Ishikawa, T; Yamamoto, K; Yoshikura, H

    1987-10-01

    Applications of the vector diagram, a new type of representation of protein structure, in homology search of various proteins including oncogene products are presented. The method takes account of various kinds of information concerning the properties of amino acids, such as Chou and Fasman's probability data. The method can detect conformational similarities of proteins which may not be detected by the conventional programs.

  5. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower the selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identifying errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, compared with the complexity of long genetic reads and genomes, allowed us to describe the library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as the product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa·I_N, where I_N is the N×N identity matrix. Any bias in sequencing changes I_N to a non-identity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination, or statistically significant downsampling, of specific reads during the sequencing process.
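
    A small Python sketch over a hypothetical four-sequence library makes the operator framework concrete. It models the sampling operator Sa as independent binomial sampling of each copy, which is an assumption. The censorship matrix CEN is represented by its diagonal, so applying it is an element-wise product.

```python
import random

def sample_library(n, depth_fraction, rng):
    """Sampling operator Sa: each copy is read independently with probability depth_fraction.

    Equivalent to multiplying n by a random diagonal matrix of per-sequence
    sampled fractions (a binomial model, assumed here for illustration).
    """
    return [sum(rng.random() < depth_fraction for _ in range(count)) for count in n]

def censor(n, cen_diagonal):
    """Diagonal censorship matrix CEN: scale each sequence's count by its diagonal entry."""
    return [count * factor for count, factor in zip(n, cen_diagonal)]

n = [1000, 500, 200, 50]      # hypothetical copy numbers n_i (N = 4)
cen = [1.0, 1.0, 0.0, 0.5]    # hypothetical: sequence 3 eliminated, sequence 4 downsampled
observed = censor(sample_library(n, 0.1, random.Random(0)), cen)
print(observed)
```

    With all diagonal entries of CEN equal to 1, `censor` leaves the sampled counts unchanged, which corresponds to the unbiased case Seq = Sa·I_N.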

  6. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    AJL

    2012-04-19

    ... The primers were based on the rBAT sequences of other animals deposited in GenBank. ...

  7. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally... related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein...
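
    The letter heights in a sequence logo come from per-position information content. The minimal Python sketch below assumes a flat background of 1/20 per amino acid and a simple additive pseudocount. This is simpler than Seq2Logo's sequence weighting and pseudo-count scheme, and it omits the two-sided depletion display. The peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def information_content(alignment, pseudocount=1.0):
    """Per-position information content in bits: log2(20) minus the column entropy."""
    ic = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)   # gaps are skipped
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        entropy = 0.0
        for aa in AMINO_ACIDS:
            p = (counts[aa] + pseudocount) / total
            entropy -= p * math.log2(p)
        ic.append(math.log2(len(AMINO_ACIDS)) - entropy)
    return ic

peptides = ["KLAGV", "KLMGV", "KIAGL", "KLAEV"]   # hypothetical binders
ic = information_content(peptides)
print([round(x, 2) for x in ic])
```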

  8. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  9. Isolation and amino acid sequence of corticotropin-releasing factor from pig hypothalami.

    OpenAIRE

    Patthy, M; Horvath, J; Mason-Garcia, M; Szoke, B; Schlesinger, D H; Schally, A V

    1985-01-01

    A polypeptide was isolated from acid extracts of porcine hypothalami on the basis of its high ability to stimulate the release of corticotropin from superfused rat pituitary cells. After an initial separation by gel filtration on Sephadex G-25, further purification was carried out by reversed-phase HPLC. The isolated material was homogeneous chromatographically and by N-terminal sequencing. Based on automated gas-phase sequencing of the intact and CNBr-cleaved peptide and on carboxypeptidase ...

  10. Irritable bowel syndrome-diarrhea: characterization of genotype by exome sequencing, and phenotypes of bile acid synthesis and colonic transit

    Science.gov (United States)

    Klee, Eric W.; Shin, Andrea; Carlson, Paula; Li, Ying; Grover, Madhusudan; Zinsmeister, Alan R.

    2013-01-01

    The study objectives were: to mine the complete exome to identify putative rare single nucleotide variants (SNVs) associated with irritable bowel syndrome (IBS)-diarrhea (IBS-D) phenotype, to assess genes that regulate bile acids in IBS-D, and to explore univariate associations of SNVs with symptom phenotype and quantitative traits in an independent IBS cohort. Using principal components analysis, we identified two groups of IBS-D (n = 16) with increased fecal bile acids: rapid colonic transit or high bile acid synthesis. DNA was sequenced in depth, analyzing SNVs in bile acid genes (ASBT, FXR, OSTα/β, FGF19, FGFR4, KLB, SHP, CYP7A1, LRH-1, and FABP6). Exome findings were compared with those of 50 controls of similar ethnicity. We assessed univariate associations of each SNV with quantitative traits and a principal components analysis, and associations between SNVs in KLB and FGFR4 and symptom phenotype in 405 IBS patients and 228 controls, and colonic transit in 70 IBS-D and 71 IBS-constipation patients. Mining the complete exome did not reveal significant associations with IBS-D over controls. There were 54 SNVs in 10 of 11 bile acid-regulating genes, with no SNVs in FGF19; 15 nonsynonymous SNVs were identified in similar proportions of IBS-D and controls. Variations in KLB (rs1015450, downstream) and FGFR4 [rs434434 (intronic), rs1966265, and rs351855 (nonsynonymous)] were associated with colonic transit (rs1966265; P = 0.043), fecal bile acids (rs1015450; P = 0.064), and principal components analysis groups (all 3 FGFR4 SNVs; P transit (P = 0.066). Thus exome sequencing identified additional variants in KLB and FGFR4 associated with bile acids or colonic transit in IBS-D. PMID:24200957

  11. Sequence determination and analysis of the NSs genes of two tospoviruses.

    Science.gov (United States)

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
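    Pairwise identity figures such as those above depend on how gapped columns are counted. The sketch below assumes one common convention (columns with a gap in either sequence are excluded from the denominator); the record does not state which convention was used, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences over gap-free columns.

    Convention assumed here: columns where either sequence has a gap are
    excluded from the denominator; other tools count them differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("MKV-LA", "MKIELA"))  # 4 identical of 5 gap-free columns
```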

  12. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA contains only four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit ... proteins. It is therefore preferable to align coding DNA at the amino acid level, and it is for this purpose that we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA ...
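    Steps (i) and (iii) above can be sketched in a few lines; step (ii) needs a real protein aligner and is represented here by alignment rows supplied by hand. This is an illustration of the idea, not RevTrans itself: it assumes the standard genetic code, ungapped input DNA in frame 1, and hypothetical function names.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna):
    """Step (i): translate a coding DNA sequence; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def thread_codons(aligned_protein, dna):
    """Step (iii): map one gapped protein alignment row back onto its DNA."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # a protein gap becomes a codon-sized DNA gap
            continue
        if k >= len(codons) or CODON_TABLE.get(codons[k], "X") != aa.upper():
            raise ValueError(f"residue {aa!r} does not match codon {k}")
        out.append(codons[k])
        k += 1
    return "".join(out)


dna_a, dna_b = "ATGAAACCC", "ATGCCC"
print(translate(dna_a), translate(dna_b))
# Suppose a protein aligner (step ii) returned the rows "MKP" and "M-P":
print(thread_codons("MKP", dna_a))
print(thread_codons("M-P", dna_b))
```

    Because gaps are inserted as whole codons, the resulting DNA alignment never shifts the reading frame, which is the practical benefit the abstract points to.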

  13. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
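    Polytope propagation itself (Minkowski sums and convex hulls of Newton polytopes) is beyond a short example, but the semiring idea it generalizes can be sketched: the same forward recursion over a hidden Markov model yields the likelihood with (sum, product) and the maximum a posteriori path score with (max, product). The two-state model and its numbers below are hypothetical toy values, and the function names are illustrative.

```python
import itertools
import operator
from functools import reduce

# Hypothetical two-state HMM (GC-rich "H" vs AT-rich "L"); toy numbers only.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def semiring_forward(obs, plus, times, states=STATES, start=START,
                     trans=TRANS, emit=EMIT):
    """Forward recursion in an arbitrary semiring (plus, times)."""
    alpha = {s: times(start[s], emit[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: times(reduce(plus, (times(alpha[s], trans[s][t]) for s in states)),
                     emit[t][symbol])
            for t in states
        }
    return reduce(plus, alpha.values())


def enumerate_paths(obs, plus, states=STATES, start=START, trans=TRANS,
                    emit=EMIT):
    """Same quantity by brute force over all |states|^len(obs) hidden paths."""
    scores = []
    for path in itertools.product(states, repeat=len(obs)):
        score = start[path[0]] * emit[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            score *= trans[prev][cur] * emit[cur][symbol]
        scores.append(score)
    return reduce(plus, scores)


likelihood = semiring_forward("GGCA", operator.add, operator.mul)  # sum-product
best_path = semiring_forward("GGCA", max, operator.mul)  # max-product (Viterbi)
print(likelihood, best_path)
```

    Swapping only the `plus` operator changes which inference problem is solved; polytope propagation, as described in the abstract, replaces these scalar operations with geometric ones.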

  14. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the genomes of humans and a few model organisms, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures and extract features from DNA sequences. Repeat-related structures are among the most important structures in a DNA sequence. They often have to be masked before protein-coding regions along a DNA sequence are identified or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it is largely species-independent and works well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient: it allows genome-scale analysis on a PC.
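    A recurrence time can be read as the distance between successive occurrences of the same k-mer. The sketch below computes these distances and pools them into a histogram, where a tandem repeat shows up as a sharp peak at its period. It assumes this simple definition and is not the authors' codon index, whose exact construction the abstract does not give; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def recurrence_times(seq, k):
    """Map each k-mer to the gaps between its successive occurrences."""
    seq = seq.upper()
    last_seen = {}
    times = defaultdict(list)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(times)


def recurrence_histogram(seq, k):
    """Pool recurrence times of all k-mers; peaks suggest periodic structure."""
    hist = Counter()
    for gaps in recurrence_times(seq, k).values():
        hist.update(gaps)
    return hist


print(recurrence_times("ACGACGACG", 3))
print(recurrence_histogram("ACGACGACG", 3))
```

    A single left-to-right pass with a dictionary of last positions keeps the cost linear in sequence length, consistent with the abstract's emphasis on whole-genome analysis on a PC.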

  15. Human acid β-glucosidase: isolation and amino acid sequence of a peptide containing the catalytic site

    International Nuclear Information System (INIS)

    Dinur, T.; Osiecki, K.M.; Legler, G.; Gatt, S.; Desnick, R.J.; Grabowski, G.A.

    1986-01-01

    Human acid β-glucosidase (D-glucosyl-N-acylsphingosine glucohydrolase, EC 3.2.1.45) cleaves the glucosidic bonds of glucosylceramide and synthetic β-glucosides. The deficient activity of this hydrolase is the enzymatic defect in the subtypes and variants of Gaucher disease, the most prevalent lysosomal storage disease. To isolate and characterize the catalytic site of the normal enzyme, brominated ³H-labeled conduritol B epoxide (³H-Br-CBE), which inhibits the enzyme by binding covalently to this site, was used as an affinity label. Under optimal conditions, 1 mol of ³H-Br-CBE bound to 1 mol of pure enzyme protein, indicating the presence of a single catalytic site per enzyme subunit. After V8 protease digestion of the ³H-Br-CBE-labeled homogeneous enzyme, three radiolabeled peptides, designated peptides A, B, and C, were resolved by reverse-phase HPLC. The partial amino acid sequence (37 residues) of peptide A (Mr 5000) was determined. The sequence of this peptide, which contained the catalytic site, had exact homology to the sequence near the carboxyl terminus of the protein, as predicted from the nucleotide sequence of the full-length cDNA encoding acid β-glucosidase.

  16. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66

    Directory of Open Access Journals (Sweden)

    Bin Liu

    2016-06-01

    Thraustochytrids are unicellular marine protists, and there is growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp and 11,683 predicted protein-coding sequences. The data have been deposited at DDBJ/EMBL/GenBank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  17. Detection of COL III in Parchment by Amino Acid Analysis

    DEFF Research Database (Denmark)

    Vestergaard Poulsen Sommer, Dorte; Larsen, René

    2016-01-01

    Cultural heritage parchments made from the reticular dermis of animals have been subject to studies of deterioration and conservation by amino acid analysis. The reticular dermis contains a varying mixture of collagen I and III (COL I and III). When dealing with the results of the amino acid analyses, the COL III content has until now not been taken into account. Based on the available amino acid sequences, we present a method for determining the amount of COL III in the reticular dermis of new and historical parchments, calculated from the Ile/Val ratio. We find COL III contents between 7 and 32% in new parchments and between 0.2 and 40% in historical parchments. This is consistent with results in the literature. The varying content of COL III has a significant influence on the uncertainty of the amino acid analysis. Although we have not found a simple correlation between the COL ...
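    One way to see how an Ile/Val ratio can yield a COL III fraction is a two-component mixing model: if f is the COL III fraction, the observed ratio is r = (f·Ile₃ + (1−f)·Ile₁) / (f·Val₃ + (1−f)·Val₁), which solves to f = (Ile₁ − r·Val₁) / (r·(Val₃ − Val₁) − (Ile₃ − Ile₁)). This is an assumed model for illustration; the abstract does not give the authors' exact formula, and the residue contents below are placeholder numbers, not real collagen compositions.

```python
def mixture_ratio(f, ile1, val1, ile3, val3):
    """Ile/Val ratio of a COL I/COL III mixture with COL III fraction f."""
    return (f * ile3 + (1 - f) * ile1) / (f * val3 + (1 - f) * val1)


def col3_fraction(ratio, ile1, val1, ile3, val3):
    """Invert mixture_ratio: COL III fraction from an observed Ile/Val ratio.

    Assumes two-component linear mixing; ile1/val1 and ile3/val3 are the
    Ile and Val contents of COL I and COL III (e.g. residues per chain).
    """
    denom = ratio * (val3 - val1) - (ile3 - ile1)
    if denom == 0:
        raise ValueError("Ile/Val ratio cannot discriminate COL I from COL III")
    return (ile1 - ratio * val1) / denom


# Placeholder compositions only, chosen to make the arithmetic easy to follow.
r = mixture_ratio(0.25, ile1=10, val1=20, ile3=5, val3=20)
print(r, col3_fraction(r, ile1=10, val1=20, ile3=5, val3=20))
```

    The `denom == 0` guard reflects a real limitation of the approach: the method only works if COL I and COL III have sufficiently different Ile/Val ratios.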

  18. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance ... with p values of <0.05 by analyzing differences in allele distribution.

  19. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  20. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Background: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is

  1. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. The majority of current algorithms rely on feature-based prediction, but their accuracy still needs to be improved. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), built by merging a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was established using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT can achieve performance comparable to that of structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over existing sequence-based prediction algorithms. The value of our algorithm is highlighted by an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  2. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    Science.gov (United States)

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10 (family Sporormiaceae) actively produced a phytohormone (indole-3-acetic acid) and extracellular enzymes (phosphatases and glucosidases). The fungus also promoted the growth of the arid-land tree Boswellia sacra. Given these prospects, we sequenced its draft genome for the first time. The Illumina-based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated, consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  4. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  5. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    DNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence ..., subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu consensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human...

  6. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    BACKGROUND: We investigate the usefulness of expressed sequence tags (ESTs) for establishing divergences within the tree of placental mammals. This is done using the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog), and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole-genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole-genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable cost, a fast extension of data sampling to species outside the genome projects.

  7. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A)+ RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonucleotides, a 48-mer and a 27-mer, were synthesized on the basis of the amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from the CHO cell protein. The 48-mer was used as a primer for synthesis of cDNA, which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with the ³²P-labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases by northern blot analysis. The sequence of a 267-bp insert shows high homology with the CHO cDNA, with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved.

  8. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
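    The input format and the first two DSAP steps above can be sketched as follows. This is an illustration under loudly stated assumptions, not DSAP's code: the adaptor string is a placeholder, only exact (full-length) adaptor matches are trimmed, "poly-A/T/C/G/N" cleanup is interpreted as discarding single-base homopolymer tags, the minimum length is arbitrary, and "clustering" is simplified to merging identical cleaned tags.

```python
from collections import Counter

ADAPTOR = "TCGTATGC"  # placeholder 3' adaptor prefix, not DSAP's actual setting


def parse_tags(lines):
    """Yield (sequence, copy_number) pairs from tab-delimited input lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        yield seq.upper(), int(count)


def clean_tag(seq, adaptor=ADAPTOR, min_len=15):
    """Step (i): trim the adaptor and drop homopolymer or too-short tags."""
    cut = seq.find(adaptor)
    if cut != -1:
        seq = seq[:cut]
    if len(seq) < min_len or len(set(seq)) == 1:
        return None
    return seq


def cluster_tags(lines, **clean_kwargs):
    """Step (ii): merge identical cleaned tags and sum their copy numbers."""
    clusters = Counter()
    for seq, count in parse_tags(lines):
        cleaned = clean_tag(seq, **clean_kwargs)
        if cleaned is not None:
            clusters[cleaned] += count
    return clusters


example = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGC\t10",  # tag with adaptor read-through
    "TGAGGTAGTAGGTTGTATAGTT\t5",
    "AAAAAAAAAAAAAAAAAA\t7",  # poly-A artifact, discarded
]
print(cluster_tags(example))
```

    Keeping copy numbers attached to each tag, rather than expanding to individual reads, mirrors the compact unique-tag input format DSAP describes.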

  9. The human receptor for urokinase plasminogen activator. NH2-terminal amino acid sequence and glycosylation variants

    DEFF Research Database (Denmark)

    Behrendt, N; Rønne, E; Ploug, M

    1990-01-01

    -PA. The purified protein shows a single 55-60 kDa band after sodium dodecyl sulfate-polyacrylamide gel electrophoresis and silver staining. It is a heavily glycosylated protein, the deglycosylated polypeptide chain comprising only 35 kDa. The glycosylated protein contains N-acetyl-D-glucosamine and sialic acid ..., but no N-acetyl-D-galactosamine. Glycosylation is responsible for substantial heterogeneity in the receptor on phorbol ester-stimulated U937 cells, and also for molecular weight variations among various cell lines. The amino acid composition and the NH2-terminal amino acid sequence are reported...

  10. Amino-acid sequence of two trypsin isoinhibitors, ITD I and ITD III from squash seeds (Cucurbita maxima).

    Science.gov (United States)

    Wilusz, T; Wieczorek, M; Polanowski, A; Denton, A; Cook, J; Laskowski, M

    1983-01-01

    The amino acid sequences of two trypsin isoinhibitors, ITD I and ITD III, from squash seeds (Cucurbita maxima) were determined. Both isoinhibitors contain 29 amino acid residues, including 6 half-cystine residues. They differ by only one amino acid: lysine at position 9 of ITD III is replaced by glutamic acid in ITD I. Arginine at position 5 is present at the reactive site of both isoinhibitors. The previously published sequence of ITD III has been shown to be incorrect.

  11. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory, several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas, we study the dynamics that control the behavior of river flow over time, investigating the existence of a low-dimensional deterministic component. The present article follows the research undertaken by Porporato and Ridolfi [1996a], in which some clues to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by interpolating the available time series are reported, and the remarkable predictive results attained with this nonlinear method are shown.
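    Nonlinear prediction of this kind typically starts from a delay embedding of the series and forecasts the next value from the successors of the most similar past states. The sketch below shows a generic nearest-neighbour (method-of-analogues) predictor; it is an assumed illustration, not the authors' algorithm, it omits their interpolation step, and the embedding dimension, delay, and function names are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math


def delay_embed(series, dim, tau):
    """Delay vectors x_t = (s_t, s_{t-tau}, ..., s_{t-(dim-1)tau}) with time t."""
    start = (dim - 1) * tau
    return [(t, [series[t - j * tau] for j in range(dim)])
            for t in range(start, len(series))]


def predict_next(series, dim=3, tau=1, k=1):
    """Average successor of the k past states closest to the current one."""
    vectors = delay_embed(series, dim, tau)
    if len(vectors) < 2:
        raise ValueError("series too short for this embedding")
    _, current = vectors[-1]
    neighbours = sorted(
        (math.dist(v, current), series[t + 1]) for t, v in vectors[:-1]
    )
    nearest = neighbours[:k]
    return sum(value for _, value in nearest) / len(nearest)


flows = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]  # toy periodic "discharge" series
print(predict_next(flows, dim=3, tau=1, k=2))
```

    For a low-dimensional deterministic signal, nearby states in the embedding have nearby futures, which is why such predictors outperform linear ones when the dynamics are chaotic but not random.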

  12. Accident sequence analysis of human-computer interface design

    International Nuclear Information System (INIS)

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points
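    The paper's Augmented Fault Tree Analysis is not specified in enough detail to reproduce, but the conventional step it builds on can be sketched: expanding AND/OR gates into minimal cut sets, where a cut set with a single event marks a single point of failure. This is a plain (non-augmented) fault tree, and the event names below are hypothetical.

```python
def cut_sets(node, tree):
    """Minimal cut sets of `node`; leaves (not keys of `tree`) are basic events."""
    if node not in tree:
        return [frozenset([node])]
    gate, children = tree[node]
    child_sets = [cut_sets(child, tree) for child in children]
    if gate == "OR":
        # Any child's cut set alone triggers the OR gate.
        combined = [s for sets in child_sets for s in sets]
    elif gate == "AND":
        # One cut set from every child must occur together.
        combined = [frozenset()]
        for sets in child_sets:
            combined = [acc | s for acc in combined for s in sets]
    else:
        raise ValueError(f"unknown gate {gate!r}")
    return minimize(combined)


def minimize(sets):
    """Drop duplicates and any cut set that strictly contains another."""
    unique = set(sets)
    return sorted((s for s in unique if not any(o < s for o in unique)),
                  key=lambda s: (len(s), sorted(s)))


# Hypothetical tree for an unintended command reaching the plant.
TREE = {
    "unsafe_command": ("OR", ["G1", "operator_wrong_input"]),
    "G1": ("AND", ["ambiguous_display", "no_confirmation_dialog"]),
}
for cs in cut_sets("unsafe_command", TREE):
    print(sorted(cs))
```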

  13. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed third- and fourth-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish samples, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  14. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Targeted sequencing of PCR amplicons generated from bisulfite-deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single-CpG resolution and to perform subsequent multi-target, multi-sample comparisons. Currently, no platform-specific protocol, support, or analysis solution is provided for targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation patterns present in a sample. The obtained profiles are then summarized and compared between samples. To assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different bisulfite-sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results for these targets to the methylation levels of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 shows that the sequencing results agree with one of the industry-leading CpG methylation platforms and that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage
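    The core quantity behind the comparison above is the per-CpG methylation level: after bisulfite treatment, unmethylated cytosines read as T while methylated CpG cytosines remain C, so the level is C / (C + T). The sketch below assumes reads that are already aligned gap-free to the forward strand of the reference (real pipelines such as Bismark handle mapping, strands, and indels); TABSAT's own code is not shown in the record, and the function names are hypothetical.

```python
import math


def cpg_sites(reference):
    """0-based positions of the C in every CpG of the reference."""
    reference = reference.upper()
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def methylation_levels(reads, reference):
    """Per-CpG methylation level C / (C + T) from gap-free, forward-strand reads."""
    levels = {}
    for pos in cpg_sites(reference):
        calls = [read[pos].upper() for read in reads if pos < len(read)]
        meth, unmeth = calls.count("C"), calls.count("T")
        levels[pos] = meth / (meth + unmeth) if meth + unmeth else None
    return levels


def pearson(xs, ys):
    """Pearson correlation, e.g. sequencing levels vs. array beta values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


reference = "ACGTTCGA"
reads = ["ACGTTTGA", "ATGTTTGA", "ACGTTCGA"]
print(methylation_levels(reads, reference))
```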

  15. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering
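    The MLSA workflow above concatenates each strain's partial housekeeping-gene sequences in a fixed order and then compares strains. The sketch below uses the eight genes named in the abstract but assumes the simplest distance, the uncorrected p-distance on gap-free aligned sequences; the record does not state which distance model produced the reported divergence values, and the strain data and function names are hypothetical.

```python
GENES = ["argH", "cya", "glpK", "gnd", "murC", "pgm", "pta", "purH"]


def concatenate(strain_genes, genes=GENES):
    """Join one strain's aligned partial gene sequences in a fixed gene order."""
    return "".join(strain_genes[g] for g in genes)


def p_distance(a, b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_matrix(strains, genes=GENES):
    """Pairwise p-distances (as %) between concatenated MLSA sequences."""
    seqs = {name: concatenate(g, genes) for name, g in strains.items()}
    names = sorted(seqs)
    return {(x, y): 100.0 * p_distance(seqs[x], seqs[y])
            for i, x in enumerate(names) for y in names[i + 1:]}


# Toy data: identical 4-bp fragments except one substitution in argH.
strain_a = {g: "ACGT" for g in GENES}
strain_b = dict(strain_a, argH="ACGA")
print(divergence_matrix({"A": strain_a, "B": strain_b}))
```

    Fixing the gene order before concatenation matters: every strain must contribute the same loci at the same alignment columns, otherwise site-by-site comparison is meaningless.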

  16. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. Here we studied a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters were all below 3%, and those between the M. abscessus and M. massiliense clusters ranged from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the
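    The divergence values above are pairwise distances between aligned sequences; the abstract does not say which distance model was used. A minimal sketch, assuming the uncorrected p-distance (mismatched sites divided by compared sites, with gapped sites skipped) applied to a concatenation of aligned gene fragments in a fixed gene order. The helper names and toy fragments are hypothetical.

```python
def concatenate(loci: dict, order: list) -> str:
    """Join aligned per-gene fragments in a fixed gene order (hypothetical helper)."""
    return "".join(loci[gene] for gene in order)


def p_distance_percent(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance, in percent, between two aligned sequences.

    Sites where either sequence carries a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped site: not comparable
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared


# Toy fragments for two of the eight MLSA genes (not real data)
genes = ["argH", "cya"]
strain_1 = concatenate({"argH": "ACGTAC", "cya": "GGCATT"}, genes)
strain_2 = concatenate({"argH": "ACGTAA", "cya": "GGCATT"}, genes)
print(round(p_distance_percent(strain_1, strain_2), 2))  # 8.33
```

    Published MLSA studies usually align each locus separately before concatenation and may apply a substitution model rather than the raw p-distance; the p-distance is simply the easiest model to show.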

  17. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudios Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for 1) peak finding and 2) peak area determination, with or without a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. Using an optimum analysis sequence with certified nuclear information avoids the problems caused by significant variations in the out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated in both energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. The IAEA-2002 reference spectrum was also used to test the performance of the analysis sequences. The z-score and reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, combining the second-derivative peak locate algorithm with simple peak area integration provides the best accuracy. Lower accuracy results from combining the library-directed peak locate algorithm with Genie's Gamma-M peak area determination. (Author)

  18. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for 1) peak finding and 2) peak area determination, with or without a library (based on evaluated nuclear data) of common gamma-ray emitters in environmental samples. Using an optimum analysis sequence with certified nuclear information avoids the problems caused by significant variations in the out-of-date nuclear parameters of commercial software libraries. Interference-free gamma-ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated in both energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. The IAEA-2002 reference spectrum was also used to test the performance of the analysis sequences. The z-score and reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, combining the second-derivative peak locate algorithm with simple peak area integration provides the best accuracy. Lower accuracy results from combining the library-directed peak locate algorithm with Genie's Gamma-M peak area determination. (Author)
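    The abstract names the z-score and reduced χ² criteria without defining them. The sketch below uses one common formulation, which is an assumption here: z = (measured − reference)/σ for each peak area, and reduced χ² = Σ((measured − reference)/u)² divided by the degrees of freedom. The peak areas and uncertainties are illustrative, not data from the study.

```python
def z_score(measured: float, reference: float, sigma: float) -> float:
    """z = (measured - reference) / sigma, where sigma is the uncertainty used for scoring."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (measured - reference) / sigma


def reduced_chi_square(measured, reference, uncertainties, n_params=0):
    """Sum of squared normalized residuals divided by (number of peaks - n_params)."""
    if not (len(measured) == len(reference) == len(uncertainties)):
        raise ValueError("input lists must have equal length")
    dof = len(measured) - n_params
    if dof <= 0:
        raise ValueError("need more peaks than fitted parameters")
    chi2 = sum(((m - r) / u) ** 2 for m, r, u in zip(measured, reference, uncertainties))
    return chi2 / dof


# Illustrative peak areas (counts) for three peaks
areas = [10500.0, 4900.0, 2010.0]
ref_areas = [10000.0, 5000.0, 2000.0]
sigmas = [250.0, 100.0, 50.0]
print([z_score(a, r, s) for a, r, s in zip(areas, ref_areas, sigmas)])
print(reduced_chi_square(areas, ref_areas, sigmas))
```

    Under a common proficiency-testing convention, |z| ≤ 2 is considered satisfactory, and a reduced χ² close to 1 indicates residuals consistent with the stated uncertainties; the thresholds actually applied in the study are not given in the abstract.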

  19. RNA2 of grapevine fanleaf virus: sequence analysis and coat protein cistron location.

    Science.gov (United States)

    Serghini, M A; Fuchs, M; Pinck, M; Reinbolt, J; Walter, B; Pinck, L

    1990-07-01

    The nucleotide sequence of the genomic RNA2 (3774 nucleotides) of grapevine fanleaf virus (GFLV) strain F13 was determined from overlapping cDNA clones, and its genetic organization was deduced. Two rapid and efficient methods were used for cDNA cloning of the 5' region of RNA2. The complete sequence contained only one long open reading frame of 3555 nucleotides (1184 codons, 131K product). Analysis of the N-terminal sequence of purified coat protein (CP) and identification of its C-terminal residue allowed the CP cistron to be precisely positioned within the polyprotein. The CP, produced by proteolytic cleavage at the Arg/Gly site between residues 680 and 681, contains 504 amino acids (Mr 56,019) and has hydrophobic properties. This Arg/Gly cleavage site, deduced by N-terminal amino acid sequence analysis, is the first identified for a nepovirus coat protein and for plant viruses that express their genomic RNAs by polyprotein synthesis. Comparison of GFLV RNA2 with M RNA of cowpea mosaic comovirus and with RNA2 of two closely related nepoviruses, tomato black ring virus and Hungarian grapevine chrome mosaic virus, showed strong similarities among the 3' non-coding regions but less similarity among the 5' non-coding sequences than reported among other nepovirus RNAs.
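    The figures in this abstract are mutually consistent: an ORF of 3555 nucleotides is 1185 codons, i.e. 1184 sense codons plus a stop codon, and cleavage between residues 680 and 681 leaves a C-terminal product of 1184 − 680 = 504 residues, the reported CP length. The minimal sketch below finds the longest ATG-initiated ORF on the forward strand only (appropriate for a positive-sense RNA genome) using the standard stop codons, and reproduces that arithmetic. The function names and toy sequence are hypothetical, not from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple:
    """Return 0-based, half-open (start, end) of the longest ATG-initiated ORF on the
    forward strand, stop codon included; (-1, -1) if no complete ORF is found."""
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA input
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG downstream
    return best


def coding_capacity(orf_nt: int) -> int:
    """Amino acids encoded by an ORF of orf_nt nucleotides, stop codon included."""
    return orf_nt // 3 - 1


def cleavage_product_length(polyprotein_aa: int, cleaved_after: int) -> int:
    """Length of the C-terminal product when the polyprotein is cut after residue cleaved_after."""
    return polyprotein_aa - cleaved_after


print(longest_orf("CCAUGAAAUAGCC"))  # (2, 11)
print(coding_capacity(3555), cleavage_product_length(coding_capacity(3555), 680))  # 1184 504
```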

  20. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, an approach suggested by the close similarity of the DNA polymerases induced by herpes group viruses to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 under different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs, mapping within the EcoRI M fragment of HCMV, cross-hybridized at Tm − 25 °C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  1. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    OpenAIRE

    W.S.I. de Silva; M.M.N. Perera; K.L.N.S. Perera; A.M. Wickramasuriya; G.A.U. Jayasekera

    2017-01-01

    The promoter region of a drought- and abscisic acid (ABA)-inducible gene, osr40c1, comprising the 670 bp upstream of the putative translation start codon, was isolated from the salt-tolerant indica rice variety Pokkali. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, the TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involv...
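    The abstract does not say which motif database or matching rules were used. As an illustration only, the sketch below scans both strands of a sequence for exact matches to the consensus strings TATAAA and CAAT, which are assumed here as simplified TATA-box and CAAT-box definitions; positions are 1-based starts of the matching window in the input sequence, and the promoter string is a toy example, not the osr40c1 sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list:
    """Exact matches of motif on both strands, as (1-based start in seq, strand)."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i + 1, "+"))
        if window == rc_motif and rc_motif != motif:  # report palindromes once
            hits.append((i + 1, "-"))
    return hits


promoter = "GGCAATTGACTATAAATACC"  # toy sequence, not osr40c1
for name, consensus in [("TATA box", "TATAAA"), ("CAAT box", "CAAT")]:
    print(name, find_motif(promoter, consensus))
```

    Dedicated plant promoter tools use curated motif sets, often with degenerate positions, so exact consensus matching will miss variants and is shown here only to make the idea of a two-strand scan concrete.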

  2. The amino acid sequence of cytochrome c from Cucurbita maxima L. (pumpkin)

    Science.gov (United States)

    Thompson, E. W.; Richardson, M.; Boulter, D.

    1971-01-01

    The amino acid sequence of pumpkin cytochrome c was determined on 2 μmol of protein. Some evidence was found for the occurrence of two forms of cytochrome c, whose sequences differed in three positions. Pumpkin cytochrome c consists of 111 residues and is homologous with mitochondrial cytochromes c from other plants. Experimental details are given in a supplementary paper that has been deposited as Supplementary Publication SUP 50005 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1971), 121, 7. PMID:5131733

  3. Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

    Science.gov (United States)

    Arend, J; Warzecha, H; Stöckigt, J

    2000-01-01

    Plant cell suspension cultures of Rauvolfia can produce large amounts of arbutin by glucosylating exogenously added hydroquinone. A four-step purification procedure using anion-exchange, hydrophobic-interaction and hydroxyapatite chromatography and chromatofocusing gave an approximately 390-fold enrichment of the glucosyltransferase involved, with a yield of 0.5%. SDS-PAGE indicated an apparent molecular mass of 52 kDa for the enzyme. Proteolysis of the pure enzyme with endoproteinase LysC yielded six peptide fragments of 9 to 23 amino acids, which were sequenced. Alignment of the six peptide sequences showed high similarity to glycosyltransferases from other higher plants.

  4. Application of Ammonium Persulfate for Selective Oxidation of Guanines for Nucleic Acid Sequencing

    Directory of Open Access Journals (Sweden)

    Yafen Wang

    2017-07-01

    Nucleic acids can be sequenced by a chemical procedure that partially modifies nucleotides in a base-specific manner. Many methods have been reported for the selective recognition of guanine, but accurate identification of guanine in both single- and double-stranded regions of DNA and RNA remains a challenging task. Herein, we present a new, non-toxic and simple method for the selective recognition of guanine in both DNA and RNA sequences via ammonium persulfate modification. This strategy can be further applied to the detection of 5-methylcytosine by using PCR.

  5. Identification of metal ion binding sites based on amino acid sequences.

    Science.gov (United States)

    Cao, Xiaoyong; Hu, Xiuzhen; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method for analyzing and identifying the binding residues of metal ions based solely on sequence information. Binding data for ten metal ions were extracted from the BioLiP database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+ and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results were achieved for them using the Position Weight Scoring Matrix algorithm, with an accuracy above 79.9% and a Matthews correlation coefficient above 0.6. The binding sites of the other metals could also be accurately identified using the Support Vector Machine algorithm with multiple feature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server based on the framework of the proposed method is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.
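    The abstract does not specify how the Position Weight Scoring Matrix was built or scored. A minimal sketch, assuming a log-odds matrix (log2 of pseudocount-smoothed residue frequency over a uniform background) built from equal-length sequence windows around known binding residues, together with the standard Matthews correlation coefficient (MCC). The window length, background and toy windows are assumptions, not the study's settings.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows: list, pseudocount: float = 1.0) -> list:
    """Log-odds PWM: one dict per window position mapping residue -> log2(freq / background)."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        pwm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score_window(pwm: list, window: str) -> float:
    """Sum of per-position log-odds; residues outside the 20-letter alphabet score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


pwm = build_pwm(["HCH", "HCC", "CCH"])  # toy windows around hypothetical Zn2+ ligands
print(score_window(pwm, "HCH") > score_window(pwm, "AAA"))  # True
print(round(mcc(tp=80, tn=90, fp=10, fn=20), 3))
```

    A window is then predicted as binding when its score exceeds a chosen threshold, and MCC summarizes the resulting confusion matrix in a single value that stays informative when binding residues are rare.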

  6. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost, high-throughput genotyping technology based on next-generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa subsp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole-dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal component analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  7. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Science.gov (United States)

    2010-07-01

    ... may not include material other than part of the sequence listing. A fixed-width font should be used... integer expressing the number of bases or amino acid residues M. Type Whether presented sequence molecule is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type...

  8. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    OpenAIRE

    Mohn, W W

    1995-01-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per...

  9. Complementary DNA and derived amino acid sequence of the α subunit of human complement protein C8: evidence for the existence of a separate α subunit messenger RNA

    International Nuclear Information System (INIS)

    Rao, A.G.; Howard, O.M.Z.; Ng, S.C.; Whitehead, A.S.; Colten, H.R.; Sodetz, J.M.

    1987-01-01

    The entire amino acid sequence of the α subunit (Mr 64,000) of the eighth component of complement (C8) was determined by characterizing cDNA clones isolated from a human liver cDNA library. Two clones with overlapping inserts of net length 2.44 kilobases (kb) were isolated and found to contain the entire α coding region [1659 base pairs (bp)]. The 5' end consists of an untranslated region and a leader sequence of 30 amino acids. This leader contains an apparent initiation Met, a signal peptide, and a propeptide that ends with an arginine-rich sequence characteristic of the proteolytic processing sites found in the pro forms of protein precursors. The 3' untranslated region contains two polyadenylation signals and a poly(A) sequence. RNA blot analysis of total cellular RNA from the human hepatoma cell line HepG2 revealed a message size of ∼2.5 kb. Features of the 5' and 3' sequences and the message size suggest that α is encoded by a separate mRNA and argue against the occurrence of a single-chain precursor form of the disulfide-linked α-γ subunit found in mature C8. Analysis of the derived amino acid sequence revealed several membrane surface-seeking domains and a possible transmembrane domain. Analysis of the carbohydrate composition indicates 1 or 2 asparagine-linked but no O-linked oligosaccharide chains, a result consistent with predictions from the amino acid sequence. Most significantly, the α subunit exhibits a striking overall homology to human C9, with values of 24% on the basis of identity and 46% when conserved substitutions are allowed. As described in an accompanying report, this homology also extends to the β subunit of C8.

  10. Amino acid sequences mediating vascular cell adhesion molecule 1 binding to integrin alpha 4: homologous DSP sequence found for JC polyoma VP1 coat protein

    Directory of Open Access Journals (Sweden)

    Michael Andrew Meyer

    2013-07-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homology to the IDSP sequence, which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and to binding of sialic acid-containing molecules such as integrins (3). For the JC polyoma virus, the DSP sequence was found at residues 70, 71 and 72, with homology also noted for the mouse polyoma virus and SV40 virus. Three-dimensional modeling of the VP1 molecule suggests that the DSP loop has a site accessible for interaction from the external side of the assembled viral capsid pentamer.

  11. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends (RACE) method, based on a pig expressed sequence tag (EST) that was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids with high homology to the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); it could therefore be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. Computer-assisted analysis showed that the gene is structured in 15 exons and 14 introns. Phylogenetic analysis revealed that the porcine MOSPD2 gene is most closely related to the horse MOSPD2 gene. Tissue expression analysis indicated that the porcine MOSPD2 gene is broadly but differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. This study provides a primary foundation for further research on the porcine MOSPD2 gene.

  12. Complete Genome Sequence of the Gamma-Aminobutyric Acid-Producing Strain Streptococcus thermophilus APC151.

    Science.gov (United States)

    Linares, Daniel M; Arboleya, Silvia; Ross, R Paul; Stanton, Catherine

    2017-04-27

    Here we present the whole-genome sequence of Streptococcus thermophilus APC151, isolated from a marine fish. This bacterium produces gamma-aminobutyric acid (GABA) in high yields and is biotechnologically suitable for producing naturally GABA-enriched biofunctional yogurt. Its complete genome comprises 2,097 genes and 1,839,134 nucleotides, with an average G+C content of 39.1%. Copyright © 2017 Linares et al.
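    G+C content is the percentage of G and C among the bases of a sequence. A minimal sketch, assuming ambiguous symbols such as N are excluded from the denominator (a convention the abstract does not state):

```python
def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases (A, C, G, T); N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / (gc + at)


print(gc_content("ATGCGCNNAT"))  # 4 G/C out of 8 unambiguous bases -> 50.0
```

    For the genome reported here, 39.1% of 1,839,134 nucleotides corresponds to roughly 719,000 G or C bases.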

  13. Genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia

    Directory of Open Access Journals (Sweden)

    Anastasiia Kovaliova

    2017-03-01

    Here we report the draft genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia. The draft genome has a size of 4.9 Mb and encodes multiple K+-transporters and proton-consuming decarboxylases. The phylogenetic analysis based on concatenated ribosomal proteins revealed that strain DV clusters together with the acid-tolerant Desulfovibrio sp. TomC and Desulfovibrio magneticus. The draft genome sequence and annotation have been deposited at GenBank under the accession number MLBG00000000.

  14. Complete amino acid sequences of the ribosomal proteins L25, L29 and L31 from the archaebacterium Halobacterium marismortui.

    Science.gov (United States)

    Hatakeyama, T; Kimura, M

    1988-03-15

    Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues, with molecular masses of 9,472, 12,293 and 10,418 Da, respectively. A comparison of their sequences with those of large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%) and tobacco chloroplasts (16.3%), as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are known so far.
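    Molecular masses such as those above can be computed from an established sequence. A minimal sketch, assuming average (not monoisotopic) residue masses and an unmodified chain: the mass is the sum of the residue masses plus one water molecule for the free termini. The residue values are standard tabulated averages; the exact table and any processing (for example, removal of the initiator Met) used by the authors are not stated in the abstract.

```python
# Average residue masses (Da), i.e. amino acid mass minus H2O
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
    except KeyError as err:
        raise ValueError(f"unsupported residue symbol: {err.args[0]}") from None


print(round(average_mass("GG"), 2))  # glycylglycine, 132.12
```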

  15. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different next-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
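    Two of the summary statistics above can be checked directly. The reported coverage is 568,887,315 bp / 622 Mb ≈ 0.915, i.e. about 91%. N50 is the largest length L such that contigs (or scaffolds) of length ≥ L together contain at least half of the assembled bases. A minimal sketch of that definition, with toy lengths rather than the carnation assembly:

```python
def n50(lengths: list) -> int:
    """N50: the largest length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths or min(lengths) <= 0:
        raise ValueError("need a non-empty list of positive lengths")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # half of the assembled bases reached
            return length
    raise RuntimeError("unreachable for positive lengths")


print(n50([2, 3, 4, 5, 6]))  # 5
print(round(568_887_315 / 622_000_000, 3))  # fraction of the estimated genome size
```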

  16. Sequence analysis of the aminoacylase-1 family. A new proposed signature for metalloexopeptidases.

    Science.gov (United States)

    Biagini, A; Puigserver, A

    2001-03-01

    The amino acid sequence analysis of the human and porcine aminoacylases-1, the carboxypeptidase S precursor from Saccharomyces cerevisiae, the succinyl-diaminopimelate desuccinylase from Escherichia coli, Haemophilus influenzae and Corynebacterium glutamicum, the acetylornithine deacetylase from Escherichia coli and Dictyostelium discoideum and the carboxypeptidase G2 precursor from a Pseudomonas strain, using the Basic Local Alignment Search Tool (BLAST) and the Position-Specific Iterated BLAST (PSI-BLAST), allowed us to suggest that all these enzymes, which share common functional and biochemical features, belong to the same structural family. The three amino acid blocks found to be highly conserved using the CLUSTAL W program could be assigned to the catalytic active site, based on the general three-dimensional structure of the Pseudomonas carboxypeptidase G2 precursor. Six additional proteins with the same signature were retrieved after two successive PSI-BLAST iterations using the sequence of the conserved motif, namely Lactobacillus delbrueckii aminoacyl-histidine dipeptidase, Streptomyces griseus aminopeptidase, Saccharomyces cerevisiae aminopeptidase Y precursor, two Bacillus stearothermophilus N-carbamyl-L-amino acid amidohydrolases and Pseudomonas sp. hydantoin utilization protein C. The three conserved amino acid motifs correspond to the following blocks: (i) [S, G, A]-H-x-D-x-V; (ii) G-x-x-D; and (iii) x-E-E. This new sequence signature is clearly different from that commonly reported in the literature for proteins belonging to the ArgE/DapE/CPG2/YscS family.
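    The three conserved blocks are written in a PROSITE-like notation, where x stands for any residue and square brackets list allowed alternatives. A minimal sketch that converts that notation into Python regular expressions and reports every match position, including overlapping ones. The converter handles only the elements used in this abstract, and the test sequence is made up.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Convert e.g. '[S, G, A]-H-x-D-x-V' into the regex '[SGA]H.D.V'."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element.lower() == "x":
            parts.append(".")  # any residue
        elif element.startswith("[") and element.endswith("]"):
            allowed = element[1:-1].replace(",", "").replace(" ", "")
            parts.append(f"[{allowed}]")
        else:
            parts.append(element.upper())
    return "".join(parts)


def motif_positions(protein: str, pattern: str) -> list:
    """1-based start positions of all matches, including overlapping ones."""
    regex = re.compile(f"(?=({pattern_to_regex(pattern)}))")  # lookahead allows overlaps
    return [m.start() + 1 for m in regex.finditer(protein.upper())]


blocks = ["[S, G, A]-H-x-D-x-V", "G-x-x-D", "x-E-E"]
toy = "MKSHLDAVGQTDREEN"  # made-up sequence
for block in blocks:
    print(block, pattern_to_regex(block), motif_positions(toy, block))
```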

  17. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background: Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, little transcriptome information is available for this vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, a transcriptome dataset for functional genomics analysis needs to be built by large-scale RNA sequencing (RNA-seq). Methods: To provide a more comprehensive and complete transcriptome of An. sinensis, RNA from eggs, larvae, pupae, and male and female adults was pooled for cDNA preparation, sequenced using Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed for genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results: Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content of 51.26%. Of these, 98.4% could be mapped onto the reference genome, and 69% could be annotated with known biological functions. Homology analysis identified An. sinensis unigenes that were homologous to, or putative 1:1 orthologues of, genes in other Dipteran genomes. Codon usage bias was analyzed, and 1,904 SSRs were detected, which will provide effective molecular markers for population genetics of this species. Conclusions: Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic and genomic studies and further vector control of An. sinensis. PMID:25000941
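    The abstract reports 1,904 SSRs but not the detection criteria. A minimal sketch of microsatellite detection with regular expressions, assuming repeat units of 1 to 6 bp and minimum copy numbers of 10, 6, 5, 5, 5 and 5 for mono- to hexanucleotide units (illustrative thresholds, not taken from the study). Units that are themselves repeats of a shorter unit are skipped, so that, for example, an (AT)n run is not reported again as (ATAT)n.

```python
import re

# Minimum number of tandem copies per repeat-unit length (illustrative assumption)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(unit: str) -> bool:
    """True if unit is itself a repeat of a shorter unit, e.g. 'ATAT' or 'AA'."""
    k = len(unit)
    return any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list:
    """Return (1-based start, repeat unit, copy number) for each SSR, sorted by position."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_compound(unit):
                continue  # reported under its shorter unit instead
            hits.append((m.start() + 1, unit, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 7 + "CC" + "A" * 11 + "GCTGCTGCTGCTGCT"))
```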

  18. Analysis and comparison of fragrant gene sequence in some rice cultivars

    Directory of Open Access Journals (Sweden)

    Karami Noushafarin

    2016-01-01

    It is known that the fragrant trait in rice (Oryza sativa L.) is largely controlled by the fgr gene on chromosome 8, and that an 8 bp deletion and three single nucleotide polymorphisms (SNPs) in exon 7 affect this trait. In this study, sequence alignment analysis of fgr exon 7 on chromosome 8 in 11 fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried the 3 SNPs and the 8 bp deletion in exon 7, which lead to premature termination at a TAA stop codon. In contrast, 5 of the non-aromatic cultivars had a sequence identical to the published sequence of Nipponbare, a non-fragrant japonica variety. The exception among the non-aromatic cultivars was Bejar, which carried the 8 bp deletion and 3 SNPs but was non-aromatic. Sequencing can determine the nucleotide sequence of a gene and provide useful information about its function. In silico prediction showed that the predicted fgr protein sequences of the Khazar and Domsiah genotypes differed. The non-fragrant genotype Khazar encodes a complete, full-length betaine aldehyde dehydrogenase (BADH2) of 503 amino acids, whereas the fragrant genotype Domsiah encodes a non-functional BADH2 of 251 amino acids, which results in the accumulation of 2-acetyl-1-pyrroline (2AP) and produces aroma in fragrant genotypes.

  19. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta (C/EBPβ), an essential transcription factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of the bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  20. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Science.gov (United States)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  1. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...
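
    Sequence symmetry analysis is commonly summarised by a crude sequence ratio: among people who started both an index drug and a marker drug (for example, a drug that treats a suspected side effect), the number who started the marker after the index drug divided by the number who started it before. A ratio well above 1 suggests the index drug may provoke the event. The minimal sketch below computes only that crude ratio on hypothetical data; published analyses typically also restrict to incident users within a time window and adjust for prescribing trends (the null-effect sequence ratio), which is omitted here.

```python
from datetime import date
from typing import Iterable, Tuple


def crude_sequence_ratio(pairs: Iterable[Tuple[date, date]]) -> float:
    """Crude sequence ratio for sequence symmetry analysis.

    pairs: one (index_drug_start, marker_drug_start) tuple per person who
    initiated both drugs. Same-day initiations are skipped because their
    order is undefined.
    """
    index_first = marker_first = 0
    for index_start, marker_start in pairs:
        if index_start < marker_start:
            index_first += 1   # marker drug started after the index drug
        elif marker_start < index_start:
            marker_first += 1  # marker drug started before the index drug
    if marker_first == 0:
        raise ValueError("ratio undefined: no marker-first sequences")
    return index_first / marker_first


# Hypothetical toy cohort (not real claims data).
cohort = [
    (date(2015, 1, 10), date(2015, 3, 1)),  # index first
    (date(2015, 2, 5), date(2015, 2, 20)),  # index first
    (date(2016, 6, 1), date(2016, 9, 9)),   # index first
    (date(2016, 4, 1), date(2016, 1, 15)),  # marker first
    (date(2017, 5, 5), date(2017, 5, 5)),   # same day, ignored
]
print(crude_sequence_ratio(cohort))  # 3.0
```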

  2. Molecular cloning and sequence analysis of growth hormone cDNA of Neotropical freshwater fish Pacu (Piaractus mesopotamicus)

    Directory of Open Access Journals (Sweden)

    Janeth Silva Pinheiro

    2008-01-01

    RT-PCR was used for amplifying Piaractus mesopotamicus growth hormone (GH) cDNA obtained from mRNA extracted from pituitary cells. The amplified fragment was cloned and the complete cDNA sequence was determined. The cloned cDNA encompassed a sequence of 543 nucleotides that encoded a polypeptide of 178 amino acids corresponding to mature P. mesopotamicus GH. Comparison with other GH sequences showed a gap of 10 amino acids localized in the N terminus of the putative polypeptide of P. mesopotamicus. This same gap was also observed in other members of the family. Neighbor-joining tree analysis with GH sequences from fishes belonging to different taxonomic groups placed the P. mesopotamicus GH within the Otophysi group. To our knowledge, this is the first GH sequence of a Neotropical characiform fish deposited in GenBank.

  3. DNAApp: a mobile application for sequencing data analysis.

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. Sequencing files cannot easily be decoded using browser-accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that facilitate the use of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and help meet the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  4. DNAApp: a mobile application for sequencing data analysis

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. Sequencing files cannot easily be decoded using browser-accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that facilitate the use of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and help meet the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882
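
    The three in-built operations named above (reverse complementation, protein translation, and motif search) are simple enough to sketch. The minimal Python version below is an illustration only, not DNAApp's implementation, and it does not parse the ab1 trace format; it assumes base calls are already available as a string and uses the standard genetic code (NCBI translation table 1).

```python
from typing import List

# NCBI translation table 1 (standard code), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (reads the opposite strand 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate frame +1; '*' marks stops, 'X' marks codons with ambiguous bases."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


read = "ATGTTTGGCTGA"  # hypothetical base calls
print(reverse_complement(read))  # TCAGCCAAACAT
print(translate(read))           # MFG*
print(find_motif(read, "TG"))    # [1, 5, 9]
```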

  5. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, LRSDAY was designed to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. Applied to an S. cerevisiae strain, LRSDAY takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) reads using the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  6. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    Science.gov (United States)

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes it an attractive microbial biocatalyst for industrial-scale production of optically pure lactic acid, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  7. Isolation and complete amino acid sequence of human thymopoietin and splenin

    International Nuclear Information System (INIS)

    Audhya, T.; Schlesinger, D.H.; Goldstein, G.

    1987-01-01

    Human thymopoietin and splenin were isolated from human thymus and spleen, respectively, by monitoring tissue fractionation with a bovine thymopoietin RIA cross-reactive with human thymopoietin and splenin. Bovine thymopoietin and splenin are 49-amino acid polypeptides that differ by only 2 amino acids, at positions 34 and 43; the change at position 34 in the active-site region changes the receptor specificities and biological activities. The complete amino acid sequences of purified human thymopoietin and splenin were determined and shown to be 48-amino acid polypeptides differing at four positions. Ten amino acids, constant within each species for thymopoietin and splenin, differ between the human and bovine polypeptides. The pentapeptide active site of thymopoietin (residues 32-36) is constant between the human and bovine thymopoietins, but position 34 in the active site of splenin has changed from glutamic acid in bovine splenin to alanine in human splenin, accounting for the biological activity of the human but not the bovine splenin on the human T-cell line MOLT-4.
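
    Statements such as "differ by only 2 amino acids at positions 34 and 43" come from a position-by-position comparison of aligned sequences. The minimal sketch below lists the 1-based positions at which two gap-free, equal-length sequences differ; the peptides are made up, not the thymopoietin or splenin sequences. Comparing the 49-residue bovine and 48-residue human forms would first require an alignment that places a gap, which this sketch does not handle.

```python
from typing import List


def differing_positions(seq_a: str, seq_b: str) -> List[int]:
    """1-based positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must first be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Made-up peptides (not the thymopoietin or splenin sequences).
print(differing_positions("PEPTIDE", "PEPSIDA"))  # [4, 7]
```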

  8. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    Science.gov (United States)

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has been increasing in the context of structural genomics projects, because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself, thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids is expected to be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely the family, superfamily and fold levels. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in family-level alignments but clearly outperform them in fold-level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin of the amino acid similarity matrix space.
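
    A similarity matrix enters an aligner only through its pairwise substitution scores, so one matrix can be swapped for another without changing the algorithm. The minimal sketch below computes a Needleman-Wunsch global alignment score with a pluggable scoring function and a linear gap penalty. The residue groups and scores are hypothetical toy values, not BLOSUM45 and not the contact-potential matrices; real aligners usually also use affine gap penalties.

```python
from typing import Callable

# Hypothetical toy residue groups; a real run would use BLOSUM45 or a
# contact-potential-derived matrix instead.
TOY_GROUPS = ["ILVM", "FYW", "KRH", "DE", "STNQ", "AG", "C", "P"]


def toy_score(x: str, y: str) -> int:
    """Toy substitution score: 5 for identity, 2 within a group, -1 otherwise."""
    if x == y:
        return 5
    if any(x in group and y in group for group in TOY_GROUPS):
        return 2
    return -1


def global_alignment_score(seq_a: str, seq_b: str,
                           score: Callable[[str, str], int], gap: int = -4) -> int:
    """Needleman-Wunsch optimal global alignment score with a linear gap penalty."""
    m = len(seq_b)
    prev = [j * gap for j in range(m + 1)]  # row for the empty prefix of seq_a
    for i in range(1, len(seq_a) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            curr[j] = max(
                prev[j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # align residues
                prev[j] + gap,                                    # gap in seq_b
                curr[j - 1] + gap,                                # gap in seq_a
            )
        prev = curr
    return prev[m]


print(global_alignment_score("ILK", "VLR", toy_score))  # 9
```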

  9. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
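
    The abstract defines the non-AUG start codons studied as codons that differ from AUG by a single base. Three positions with three alternative bases each give nine such near-cognate codons, which the minimal sketch below enumerates.

```python
from typing import List


def near_cognate_codons(codon: str = "AUG") -> List[str]:
    """All RNA codons that differ from `codon` at exactly one position."""
    variants = []
    for pos, original in enumerate(codon):
        for base in "ACGU":
            if base != original:
                variants.append(codon[:pos] + base + codon[pos + 1:])
    return variants


print(near_cognate_codons())
# ['CUG', 'GUG', 'UUG', 'AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU']
```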

  10. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes, including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, broader integration of the available data was achieved with the assistance of collaborators. Current goals are to make it easy to include new forms of data as they become available and to navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  11. Cloning and sequence analysis of sucrose phosphate synthase gene from varieties of Pennisetum species.

    Science.gov (United States)

    Li, H C; Lu, H B; Yang, F Y; Liu, S J; Bai, C J; Zhang, Y W

    2015-03-31

    Sucrose phosphate synthase (SPS) is an enzyme used by higher plants for sucrose synthesis. In this study, three primer sets were designed on the basis of known SPS sequences from maize (GenBank: NM_001112224.1) and sugarcane (GenBank: JN584485.1), and five novel SPS genes were identified by RT-PCR from the genomes of Pennisetum spp. (the hybrid P. americanum x P. purpureum, P. purpureum Schum., P. purpureum Schum. cv. Red, P. purpureum Schum. cv. Taiwan, and P. purpureum Schum. cv. Mott). The cloned sequences showed 99.9% identity and 80-88% similarity to the SPS sequences of other plants. The SPS gene of hybrid Pennisetum had one nucleotide and four amino acid polymorphisms compared to the other four germplasms, and cluster analysis was performed to assess genetic diversity in this species. Additional characterization of the SPS gene product could allow Pennisetum to be exploited as a biofuel source.

  12. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    Science.gov (United States)

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced, and an interspecies homology analysis was performed. The Myf-6 protein is highly conserved among species, with 99% and 97% identity between pig and cow and between pig and human, respectively, and 93% identity between pig and mouse or rat. A single nucleotide polymorphism (SNP) was found within the promoter region: a T → C transition recognized by the MspI restriction enzyme.
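
    A SNP that creates or destroys a restriction site can be genotyped by PCR-RFLP: amplify the region, digest it, and read the fragment pattern on a gel. The minimal sketch below checks each allele for MspI sites (5'-C^CGG-3', a palindrome, so one strand suffices) and reports top-strand fragment lengths. The fragments are hypothetical, and which allele carries the site in the porcine MYF6 promoter is not stated in the abstract; here the C allele is assumed to create it.

```python
from typing import List

MSPI_SITE = "CCGG"  # MspI recognition site; it cuts between C and CGG


def mspi_sites(seq: str) -> List[int]:
    """0-based start positions of every CCGG site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == MSPI_SITE]


def mspi_fragment_lengths(seq: str) -> List[int]:
    """Top-strand fragment lengths after cutting every site between C and CGG."""
    bounds = [0] + [i + 1 for i in mspi_sites(seq)] + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical promoter fragments differing by a single T -> C transition.
t_allele = "AAGCTGGTA"
c_allele = "AAGCCGGTA"
print(mspi_sites(t_allele), mspi_fragment_lengths(t_allele))  # [] [9]
print(mspi_sites(c_allele), mspi_fragment_lengths(c_allele))  # [3] [4, 5]
```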

  13. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    System modelling using the Unified Modelling Language (UML) is a task that must be solved in software development. The more complex software becomes, the higher the requirements for depicting the system to be developed, especially its dynamic aspect, which UML represents with the sequence diagram. To address this, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. Because of its specific structure, the UML sequence diagram is selected for a deeper analysis of element layout. The authors' research examines the ability of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyses these tools according to criteria required for diagram perception.

  14. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph-theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network, in which triplets in the DNA sequence are vertices and two vertices are connected if they occur juxtaposed in the genome. We characterize the network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pair) periodicity of DNA sequences. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of organisms is constructed, and we investigate the methodology in light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool, since it gives information that is not trivially contained in either the 3-bp periodicity or the variable GC content.
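
    The triplet network and its clustering coefficient can be sketched in a few lines. The abstract does not say whether "juxtaposed" means adjacent non-overlapping triplets or overlapping ones, so the minimal sketch below assumes adjacent triplets in a single reading frame by default (step=3; step=1 gives overlapping triplets), drops self-loops, and averages local clustering coefficients with vertices of degree below 2 counted as 0.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def triplet_network(seq: str, step: int = 3) -> Graph:
    """Undirected graph whose vertices are DNA triplets; consecutive triplets
    (read with the given step) are joined by an edge. Self-loops are dropped."""
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]
    graph: Graph = {t: set() for t in triplets}
    for a, b in zip(triplets, triplets[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_clustering(graph: Graph) -> float:
    """Mean local clustering coefficient; vertices of degree < 2 count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


print(average_clustering(triplet_network("AAACCCGGGAAA")))  # 1.0 (a triangle)
print(average_clustering(triplet_network("AAACCCGGG")))     # 0.0 (a path)
```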

  15. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759
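
    The link between the pairwise diversity of two contemporaneous sequences and the age of their common ancestor can be illustrated with the simplest strict molecular clock. Assuming a constant substitution rate μ (substitutions per site per year) on both lineages, and that both sequences were sampled at the same time (as for the two 1953 samples), each lineage has evolved for t_MRCA years since the most recent common ancestor, so the expected pairwise distance d (substitutions per site) is:

```latex
d \approx 2\,\mu\,t_{\mathrm{MRCA}}
\qquad\Longrightarrow\qquad
t_{\mathrm{MRCA}} \approx \frac{d}{2\mu}
```

    This back-of-the-envelope relation is not the phylogenetic dating method used in the study, and the high rate heterogeneity the authors report is exactly what makes a single-rate estimate unreliable; d should also be corrected for multiple substitutions at the same site.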

  16. Cloning and sequence analysis of serine proteinase of Gloydius ussuriensis venom gland

    International Nuclear Information System (INIS)

    Sun Dejun; Liu Shanshan; Yang Chunwei; Zhao Yizhuo; Chang Shufang; Yan Weiqun

    2005-01-01

    Objective: To construct a cDNA library from Gloydius ussuriensis (G. ussuriensis) venom gland mRNA, and to clone and analyze a serine proteinase gene from the cDNA library. Methods: Total RNA was isolated from the venom gland of G. ussuriensis, and mRNA was purified using an mRNA isolation kit. Full-length cDNA was synthesized using the SMART cDNA synthesis strategy and amplified by long-distance PCR; the cDNA was then cloned into the vector pBluescript SK. The recombinant cDNA was transformed into E. coli DH5α. The serine proteinase cDNA in the G. ussuriensis venom gland library was detected using in situ hybridization and then amplified. The cDNA fragment was inserted into the pGEM-T vector and cloned, and its nucleotide sequence was determined. Results: The capacity of the venom gland cDNA library was above 2.3 × 10^6. The open reading frame of the serine proteinase cDNA was composed of 702 nucleotides and coded for a protein pre-zymogen of 234 amino acids containing 12 cysteine residues. Sequence analysis indicated that the deduced amino acid sequence of the cDNA fragment shared high identity with the thrombin-like enzyme genes of other snakes in GenBank. The query sequence showed strong amino acid sequence homology (85%) to the serine protease of T. gramineus, thrombin-like serine proteinase I of D. acutus, and serine protease catroxase II of C. atrox. Based on the amino acid sequences of other thrombin-like enzymes, the catalytic residues and disulfide bridges of this thrombin-like enzyme were deduced as follows: catalytic residues His41, Asp86 and Ser180; and six disulfide bridges, Cys7-Cys139, Cys26-Cys42, Cys74-Cys232, Cys118-Cys186, Cys150-Cys165 and Cys176-Cys201. Conclusion: The capacity of the venom gland cDNA library is above 2.3 × 10^6, exceeding the 10^5 level. The constructed cDNA library of the G. ussuriensis venom gland should be a helpful platform for detecting new target genes and for further genetic manipulation. The cloned serine

  17. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
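
    Loading similarity search results into a relational table turns questions such as "how many E. coli proteins have a significant homolog in each kingdom?" into a single GROUP BY query. The minimal sketch below uses Python's built-in sqlite3 module with an in-memory database. The one-table schema, accessions and E-values are hypothetical illustrations, not the seqdb_demo or search_demo schemas described in the unit.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE hits (
    query_acc   TEXT NOT NULL,  -- query protein (e.g. an E. coli protein)
    subject_acc TEXT NOT NULL,  -- library protein that was hit
    kingdom     TEXT NOT NULL,  -- taxonomic group of the subject
    evalue      REAL NOT NULL   -- similarity-search expectation value
);
"""


def load_hits(rows: List[Tuple[str, str, str, float]]) -> sqlite3.Connection:
    """Create an in-memory database holding the given search hits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    return conn


def homologs_per_kingdom(conn: sqlite3.Connection, max_evalue: float) -> List[Tuple[str, int]]:
    """Number of distinct query proteins with at least one significant hit, per kingdom."""
    return conn.execute(
        """SELECT kingdom, COUNT(DISTINCT query_acc)
           FROM hits WHERE evalue < ?
           GROUP BY kingdom ORDER BY kingdom""",
        (max_evalue,),
    ).fetchall()


toy_rows = [  # hypothetical accessions and scores
    ("ecoli_p1", "subj1", "Bacteria", 1e-50),
    ("ecoli_p1", "subj2", "Eukaryota", 1e-8),
    ("ecoli_p2", "subj3", "Bacteria", 1e-30),
    ("ecoli_p2", "subj4", "Archaea", 0.5),
    ("ecoli_p3", "subj5", "Eukaryota", 1e-3),
]
conn = load_hits(toy_rows)
print(homologs_per_kingdom(conn, 1e-6))  # [('Bacteria', 2), ('Eukaryota', 1)]
```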

  18. Nucleotide sequence analysis of the Legionella micdadei mip gene, encoding a 30-kilodalton analog of the Legionella pneumophila Mip protein

    DEFF Research Database (Denmark)

    Bangsborg, Jette Marie; Cianciotto, N P; Hindersson, P

    1991-01-01

    After the demonstration of analogs of the Legionella pneumophila macrophage infectivity potentiator (Mip) protein in other Legionella species, the Legionella micdadei mip gene was cloned and expressed in Escherichia coli. DNA sequence analysis of the L. micdadei mip gene contained in the plasmid p...... homology with the mip-like genes of several Legionella species. Furthermore, amino acid sequence comparisons revealed significant homology to two eukaryotic proteins with isomerase activity (FK506-binding proteins)....

  19. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Advances in sequencing techniques have resulted in huge amounts of sequence data being produced at a much faster rate, and it is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any useful conclusion. To cope with this burden on computing resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. A datacenter's hardware and software are collectively known as a cloud, which, when available publicly, is termed a public cloud. The datacenter's resources are provided in virtual mode to clients via a service provider such as Amazon, Google or Joyent, which charges on a pay-as-you-go basis. The workload is shifted to the provider, which maintains and upgrades the required hardware and software in the virtual environment. Essentially, a virtual environment is created over the internet according to the user's needs, with permission from the datacenter; the task is performed, and the environment is deleted after the task is over. In this discussion, we focus on the basics of cloud computing and on the prerequisites and overall working of clouds. Furthermore, we briefly discuss the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, with reference to traditional workflows.

  20. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  1. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS)

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI, M. REISSMANN AND H. J. SCHWARTZ

    2006-10-01

    Myostatin, also called growth differentiation factor-8 (GDF-8), is a member of the mammalian transforming growth factor-beta (TGF-beta) superfamily and is expressed specifically in developing and adult skeletal muscle. The muscular hypertrophy allele (mh allele) in double-muscled breeds involves a mutation within the myostatin gene. Genomic DNA was isolated from camel hair using the NucleoSpin Tissue kit. Two animals from each of six breeds, namely Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri, were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homologous regions of previously published farm-animal sequences in GenBank. The results showed that camel myostatin shares more than 90% homology with that of cattle, sheep and pig. The camel formed a cluster separate from the pig despite high homology (98%), and showed 94% homology with cattle and sheep, as reported in the literature. The sequence of the PCR-amplified part of exon 1 (256 bp) of camel myostatin was identical among the six camel breeds.

  2. Functional analysis of protein N-myristoylation: Metabolic labeling studies using three oxygen-substituted analogs of myristic acid and cultured mammalian cells provide evidence for protein-sequence-specific incorporation and analog-specific redistribution

    International Nuclear Information System (INIS)

    Johnson, D.R.; Heuckeroth, R.O.; Gordon, J.I.; Cox, A.D.; Solski, P.A.; Buss, J.E.; Devadas, B.; Adams, S.P.; Leimgruber, R.M.

    1990-01-01

    Covalent attachment of myristic acid (C14:0) to the NH2-terminal glycine residue of a number of cellular, viral, and oncogene-encoded proteins is essential for full expression of their biological function. Substitution of oxygen for methylene groups in this fatty acid does not produce a significant change in chain length or stereochemistry but does reduce its hydrophobicity. These heteroatom-containing analogs serve as alternative substrates for mammalian myristoyl-CoA:protein N-myristoyltransferase and offer the opportunity to explore structure/function relationships of myristate in N-myristoyltransferase proteins. The authors have synthesized three tritiated analogs of myristate with oxygen substituted for methylene groups at C6, C11, and C13. Metabolic labeling studies were performed with these compounds and (i) a murine myocyte cell line (BC3H1), (ii) a rat fibroblast cell line that produces p60v-src (3Xsrc), or (iii) NIH 3T3 cells engineered to express a fusion protein consisting of an 11-residue myristoylation signal from the Rasheed sarcoma virus (RaSV) gag protein linked to c-Ha-ras with a Cys → Ser-186 mutation. Two-dimensional gel electrophoresis of membrane and soluble fractions prepared from cell lysates revealed different patterns of incorporation of the analogs into cellular N-myristoyl proteins. The demonstration that these analogs differ in the extent to which they are incorporated and in their ability to cause redistribution of any single protein suggests that they may also have sufficient selectivity to be of potential therapeutic value.

  3. An Imaging And Graphics Workstation For Image Sequence Analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze image sequences from a variety of sources. The system combines the software and hardware environment of modern graphics-oriented workstations with digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. Applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missiles, stores and other flying objects in various flight regimes, including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study the three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze-frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading-edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence database generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interfaces to a variety of video players and film transport sub-systems.
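
    Object-centroid tracking, one of the listed features, reduces each video field to a single image coordinate per object. The minimal sketch below assumes a simple fixed brightness threshold on a grey-level field stored as nested lists; this is a hypothetical stand-in, since the workstation's actual segmentation and its leading-edge tracking are not described in the abstract. At 60 fields per second, the displacement between consecutive centroids multiplied by 60 gives an image-plane velocity in pixels per second.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[float]]


def centroid(frame: Frame, threshold: float) -> Optional[Tuple[float, float]]:
    """(row, col) centroid of pixels brighter than threshold, or None if there are none."""
    points = [(r, c) for r, row in enumerate(frame)
              for c, value in enumerate(row) if value > threshold]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def track(frames: List[Frame], threshold: float) -> List[Optional[Tuple[float, float]]]:
    """Centroid for each field in a sequence (None where no object is found)."""
    return [centroid(f, threshold) for f in frames]


field = [  # hypothetical 4x4 field with a bright 2x2 object
    [10, 10, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
]
print(centroid(field, 100))  # (1.5, 2.5)
```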

  4. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Background: The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola, which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results: The sequences of the 16S ribosomal RNA gene and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data were analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established, but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions: Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic
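    Two computations an MLSA study like this relies on, the percentage of polymorphic sites per locus and the concatenation of loci into one sequence per strain, can be sketched as follows. The sketch assumes the per-locus sequences are already aligned to equal length and that "polymorphism" means the share of alignment columns containing more than one nucleotide; the study's exact definition and software are not given here, and both function names are hypothetical.

```python
from typing import Dict, List

def percent_polymorphic_sites(aligned: List[str]) -> float:
    """Percentage of alignment columns that contain more than one nucleotide."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    polymorphic = 0
    for column in zip(*aligned):
        # Ignore gaps and ambiguity codes when deciding whether a site varies.
        bases = {b.upper() for b in column if b.upper() in "ACGT"}
        if len(bases) > 1:
            polymorphic += 1
    return 100.0 * polymorphic / length

def concatenate_loci(loci: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Join per-locus alignments (locus -> strain -> sequence) into one row per strain."""
    strains = next(iter(loci.values())).keys()
    return {s: "".join(alignment[s] for alignment in loci.values()) for s in strains}
```

    Locus order in the concatenated rows follows the insertion order of the `loci` dictionary, so the same order should be used for every strain before phylogenetic analysis.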

  5. Sirius PSB: a generic system for analysis of biological sequences.

    Science.gov (United States)

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, and when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites, and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) building a classifier, (2) deploying a classifier, (3) searching for proteins similar to query proteins, and (4) preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple, interactive graphical user interface. Besides being a convenient tool, Sirius PSB introduces two novelties in sequence analysis. First, a genetic algorithm is used to identify interesting features in the feature space. Second, instead of the conventional method of searching for similar proteins via sequence similarity, we introduce searching via feature similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models, one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive with current state-of-the-art models in evaluations on public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions, when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
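    Searching by feature similarity rather than sequence similarity can be illustrated with a minimal sketch. It assumes amino acid composition as the feature vector and cosine similarity as the score; both choices, and the function names, are illustrative assumptions, since the abstract does not say which features or metric Sirius PSB uses.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> List[float]:
    """Feature vector: fraction of each of the 20 standard residues."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search_by_features(query: str, database: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank database proteins by feature-vector similarity to the query, best first."""
    q = composition(query)
    scores = [(name, cosine(q, composition(seq))) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

    Two sequences with no alignable stretch can still score highly here if their compositions match, which is the property that lets feature-based search find hits that BLAST misses.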

  6. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Science.gov (United States)

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly because short reads are often assigned to multiple sequence IDs. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping complicated by miRNA sequence variation (isomiRs) and RNA editing, and the origin of reads that remain unmapped after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline, sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs, and summarization based on each nucleotide position to detect potential SNPs in miRNAs; (ii) different approaches to assigning sequence mapping results, to simulate results from microarray/qRT-PCR platforms, and a local probabilistic model to assign mapping results to the most likely IDs; and (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs, and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
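    Assigning multi-mapped reads to the most likely ID can be illustrated with a simple evidence-based rule: use reads that map to exactly one ID as evidence, give each ambiguous read to the candidate with the most unique support, and split ties equally. This is a hypothetical sketch of the general idea, not sRNAnalyzer's actual probabilistic model, and the function name is illustrative.

```python
from collections import Counter
from typing import List

def assign_reads(read_hits: List[List[str]]) -> Counter:
    """Assign each read to its most likely ID, using uniquely mapped reads as evidence."""
    # Evidence: how many reads map to each ID and nowhere else.
    unique = Counter(hits[0] for hits in read_hits if len(hits) == 1)
    counts: Counter = Counter()
    for hits in read_hits:
        if not hits:
            continue  # unmapped reads are skipped
        best = max(unique[h] for h in hits)
        winners = [h for h in hits if unique[h] == best]
        # Ties (including IDs with no unique support) share the read equally.
        for h in winners:
            counts[h] += 1 / len(winners)
    return counts
```

    For example, three unique let-7a reads, one unique let-7c read and one read hitting both IDs yield 4.0 for let-7a and 1.0 for let-7c, because let-7a has more unique support.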

  7. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Directory of Open Access Journals (Sweden)

    Ruan Jishou

    2007-04-01

    Background: Traditionally, the native structure of a protein has been believed to correspond to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role in protein function. To this end, we propose a machine learning method for predicting the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and 3D structure prediction. Results: The flexible/rigid regions were defined based on a dataset that includes protein sequences with multiple experimental structures and that was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented using feature sets proposed in prior research, such as PSI-BLAST profiles, composition vectors and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce dimensionality. Several machine learning methods for predicting flexible/rigid regions, and two recently proposed methods for predicting conformational changes and unstructured regions, were compared with the proposed method. The FlexRP method, which applies Logistic Regression and a collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation with Support Vector Machine (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are
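    The k-spaced amino acid pair representation counts residue pairs separated by exactly k intervening positions and normalizes by the number of such position pairs. A minimal sketch, assuming the 20 standard residues and a single k per call; how FlexRP windows the sequence, which k values it uses and how it selects its 95 features are not specified here, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def k_spaced_pair_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Frequencies of residue pairs separated by exactly k intervening residues."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}  # 400 pair types
    n_pairs = len(seq) - k - 1  # number of (i, i + k + 1) position pairs
    if n_pairs <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(n_pairs):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip non-standard residues such as X
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}
```

    Concatenating the 400-dimensional vectors for several values of k gives the large candidate feature space that a feature selection step would then reduce.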

  8. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    Science.gov (United States)

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was first cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete DenLFY sequence was 3,575 bp long and contained three exons and two introns. BLAST comparison of the exons of LFY homologous genes indicated that DenLFY shares high identity with orchid LFY homologs, including the related fragment of PhalLFY in a Phalaenopsis hybrid cultivar (84%), the LFY homologous gene in Oncidium (90%), and those of other orchids (over 80%). Maximum parsimony (MP) analysis placed Dendrobium as the sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When exons and introns were considered separately, the exons and the deduced amino acid sequence were good markers for functional research on DenLFY. The second intron can be used to authenticate Dendrobium, based on its length polymorphism between Dendrobium moniliforme and Dendrobium officinale.
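    Identity percentages like those quoted for DenLFY against other orchid LFY homologs are computed over an alignment. A minimal sketch, assuming a pairwise alignment is already available as two equal-length strings with '-' for gaps and counting only columns where neither sequence has a gap; tools such as BLAST report identities over the full alignment length, gaps included, so values from this convention can differ. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

    For instance, `percent_identity("ACGT-A", "ACCTTA")` compares five gap-free columns, four of which match, and returns 80.0.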

  9. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Science.gov (United States)

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three budgerigars (Melopsittacus undulatus) and four barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at the four loci indicate that G. psittaci is distinct from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was more closely related to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals, regarded as "double peaks", were found at the same nucleotide positions of ef1α in all isolates. However, at each of the other three loci, including gdh and β-giardin, which are known to be highly variable, the sequences from all isolates were identical and showed no double peaks. These results suggest that the double peaks in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found among G. psittaci isolates at the gdh and β-giardin loci, suggesting that G. psittaci is indeed not genetically more diverse than other Giardia species. This report is the first to provide evidence on the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Identities among actin-encoding cDNAs of the Nile tilapia (Oreochromis niloticus) and other eukaryote species revealed by nucleotide and amino acid sequence analyses

    Directory of Open Access Journals (Sweden)

    Andréia B. Poletto

    2008-01-01

    Actin-encoding cDNAs of Nile tilapia (Oreochromis niloticus) were isolated by RT-PCR using total RNA samples from different tissues and further characterized by nucleotide sequencing and in silico amino acid (aa) sequence analysis. Comparisons between the actin gene sequences of O. niloticus and those of other species showed that the isolated genes are highly similar to other fish and other vertebrate actin genes. The highest nucleotide resemblance was observed between the O. niloticus and O. mossambicus α-actin and β-actin genes. Analysis of the predicted aa sequences revealed two distinct types of cytoplasmic actin, one cardiac muscle actin type and one skeletal muscle actin type, which were expressed in different tissues of Nile tilapia. The evolutionary relationships between the Nile tilapia actin genes and those of diverse other organisms are discussed.

  11. Rapid identification of lettuce seed germination mutants by bulked segregant analysis and whole genome sequencing.

    Science.gov (United States)

    Huo, Heqiang; Henry, Isabelle M; Coppoolse, Eric R; Verhoef-Post, Miriam; Schut, Johan W; de Rooij, Han; Vogelaar, Aat; Joosen, Ronny V L; Woudenberg, Leo; Comai, Luca; Bradford, Kent J

    2016-11-01

    Lettuce (Lactuca sativa) seeds exhibit thermoinhibition, or failure to complete germination when imbibed at warm temperatures. Chemical mutagenesis was employed to develop lettuce lines that exhibit germination thermotolerance. Two independent thermotolerant lettuce seed mutant lines, TG01 and TG10, were generated through ethyl methanesulfonate mutagenesis. Genetic and physiological analyses indicated that these two mutations were allelic and recessive. To identify the causal gene(s), we applied bulked segregant analysis by whole genome sequencing. For each mutant, bulked DNA samples of segregating thermotolerant (mutant) seeds were sequenced and analyzed for homozygous single-nucleotide polymorphisms. Two independent candidate mutations were identified at different physical positions in the zeaxanthin epoxidase gene (ABSCISIC ACID DEFICIENT 1/ZEAXANTHIN EPOXIDASE, or ABA1/ZEP) in TG01 and TG10. The mutation in TG01 caused an amino acid replacement, whereas the mutation in TG10 resulted in alternative mRNA splicing. Endogenous abscisic acid contents were reduced in both mutants, and expression of the ABA1 gene from wild-type lettuce under its own promoter fully complemented the TG01 mutant. Conventional genetic mapping confirmed that the causal mutations were located near the ZEP/ABA1 gene, but the bulked segregant whole genome sequencing approach more efficiently identified the specific gene responsible for the phenotype. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
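    The bulked-segregant filtering step, keeping single-nucleotide variants that look homozygous in the mutant bulk, can be sketched as below. The read-depth and allele-fraction thresholds are hypothetical, and restricting candidates to G→A and C→T changes reflects the typical EMS mutation spectrum rather than a filter stated in the abstract; the function and type names are illustrative.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str, int, int]  # chrom, pos, ref, alt, ref_reads, alt_reads
EMS_CHANGES = {("G", "A"), ("C", "T")}  # typical EMS-induced transitions

def homozygous_ems_candidates(variants: Iterable[Variant],
                              min_depth: int = 10,
                              min_alt_fraction: float = 0.9) -> List[Tuple[str, int]]:
    """Keep EMS-type SNPs whose alternative allele dominates the mutant bulk."""
    hits = []
    for chrom, pos, ref, alt, ref_reads, alt_reads in variants:
        depth = ref_reads + alt_reads
        if depth < min_depth or (ref, alt) not in EMS_CHANGES:
            continue
        if alt_reads / depth >= min_alt_fraction:
            hits.append((chrom, pos))
    return hits
```

    In a recessive mutant bulk the causal variant should approach an allele fraction of 1.0, while unlinked background variants segregate at intermediate fractions, which is why a high fraction threshold narrows the candidate list.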

  12. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    Science.gov (United States)

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately, this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation, and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks through in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing, inter alia, in silico prediction of primer specificity and GM target coverage. The new tool can assist laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patent documentation. Finally, it can help annotate poorly described GM sequences and identify new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. © The Author(s) 2015. Published by Oxford University Press.
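    In silico PCR, as used to collect the amplicons, amounts to locating a forward primer and the reverse complement of a reverse primer on a template. A minimal exact-match sketch follows; it scans only the given strand, whereas real screening tools also tolerate mismatches and search both strands, and the function names are hypothetical rather than part of the JRC pipeline.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_pcr(template: str, forward: str, reverse: str, max_len: int = 1000) -> List[str]:
    """Exact-match products from the forward primer to the reverse primer's binding site."""
    template = template.upper()
    fwd = forward.upper()
    # The reverse primer anneals to this strand as its reverse complement.
    rev_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            amplicon = template[start:end + len(rev_site)]
            if len(amplicon) <= max_len:
                products.append(amplicon)
        start = template.find(fwd, start + 1)
    return products
```

    The `max_len` cap (a hypothetical default) mimics the practical limit on product size, so distant, chance primer matches are not reported as amplicons.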

  13. Fast computational methods for predicting protein structure from primary amino acid sequence

    Science.gov (United States)

    Agarwal, Pratul Kumar [Knoxville, TN]

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  14. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria : research communication

    Directory of Open Access Journals (Sweden)

    D.O. Oluwayelu

    2008-09-01

    This work reports the first molecular study of chicken anaemia virus (CAV) in backyard chickens in Africa, using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained for the commercial and backyard chicken strains, respectively, translated to only 2% amino acid diversity in each group. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequences of two cloned backyard chicken CAV strains (NGR/Cl-8 and NGR/Cl-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.
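    Why a given nucleotide diversity can translate into a much lower amino acid diversity is easiest to see with p-distances (the proportion of differing sites) computed at both levels: synonymous changes alter codons without changing the encoded residue. The sketch assumes aligned, gap-free coding sequences in the same reading frame and uses the standard genetic code; it is illustrative, not the study's own pipeline, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(cds: str) -> str:
    """Translate complete codons of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

    For example, ATGGCTGAA and ATGGCCGAG differ at 2 of 9 nucleotides, yet both translate to MAE, so their amino acid p-distance is 0.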

  15. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  16. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    Science.gov (United States)

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).
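    The Lys-C and Asp-N digestion step can be previewed in silico to predict which peptides to expect before HPLC separation. A minimal sketch using simplified cleavage rules (Lys-C after every Lys, Asp-N before every Asp); real digests can show missed cleavages, and Asp-N may also cut before Glu, so these function names and rules are illustrative assumptions rather than the study's procedure.

```python
from typing import List

def lys_c_digest(protein: str) -> List[str]:
    """Simplified Lys-C rule: cleave on the C-terminal side of every K."""
    peptides: List[str] = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "K":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def asp_n_digest(protein: str) -> List[str]:
    """Simplified Asp-N rule: cleave on the N-terminal side of every D."""
    peptides: List[str] = []
    current = ""
    for residue in protein.upper():
        if residue == "D" and current:
            peptides.append(current)
            current = ""
        current += residue
    if current:
        peptides.append(current)
    return peptides
```

    Because the two enzymes cut at different residues, their peptides overlap, which is what allows the fragment sequences to be stitched into a complete sequence.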

  17. Detection and quantification of Plasmodium falciparum in blood samples using quantitative nucleic acid sequence-based amplification

    NARCIS (Netherlands)

    Schoone, G. J.; Oskam, L.; Kroon, N. C.; Schallig, H. D.; Omar, S. A.

    2000-01-01

    A quantitative nucleic acid sequence-based amplification (QT-NASBA) assay for the detection of Plasmodium parasites has been developed. Primers and probes were selected on the basis of the sequence of the small-subunit rRNA gene. Quantification was achieved by coamplification of the RNA in the

  18. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    Science.gov (United States)

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234, and also by other specific amino acid substitutions that may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions, including an additional potential glycosylation site at positions 315-317. The Vs virus strain shared the closest nucleotide and amino acid homology (84.5% and 95.5%, respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and extend previous data obtained from molecular and serological analysis.
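    Potential N-linked glycosylation sites are conventionally recognized by the N-X-S/T sequon, where X is any residue except proline. A minimal scanner, assuming a plain one-letter protein sequence and reporting 1-based residue spans; it flags motifs only and says nothing about whether a site is actually glycosylated. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# N, then any residue except P, then S or T; the lookahead lets matches overlap.
SEQUON = re.compile(r"(?=N[^P][ST])")

def glycosylation_sequons(protein: str) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) spans of N-X-S/T motifs with X != P."""
    return [(m.start() + 1, m.start() + 3) for m in SEQUON.finditer(protein.upper())]
```

    Comparing the spans returned for aligned envelope sequences from different strains is one way to spot an extra sequon of the kind described for Vs virus.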

  19. Multiple amino acid sequence alignment nitrogenase component 1: insights into phylogenetics and structure-function relationships.

    Directory of Open Access Journals (Sweden)

    James B Howard

    Amino acid residues critical for a protein's structure-function are retained by natural selection, and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches, as well as the nitrogen fixation genotypes anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as "core" for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a selenocysteine that replaces one cysteinyl ligand of the 8Fe:7S P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three-dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification
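    Classifying alignment columns as invariant or single variant can be sketched by counting the residue types in each column. The abstract does not define "single variant" precisely; this sketch assumes it means a column in which exactly two residue types occur, ignores gaps, and uses a hypothetical function name.

```python
from typing import List

def classify_columns(alignment: List[str]) -> List[str]:
    """Label each alignment column by how many residue types it contains (gaps ignored)."""
    labels = []
    for column in zip(*alignment):
        residues = {r.upper() for r in column if r != "-"}
        if not residues:
            labels.append("gap")
        elif len(residues) == 1:
            labels.append("invariant")
        elif len(residues) == 2:
            labels.append("single variant")  # assumption: one alternative residue type
        else:
            labels.append("variable")
    return labels
```

    Mapping the resulting labels onto residue numbers of a reference structure is what allows conserved positions to be located relative to the cofactor.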

  20. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of radioactive environmental releases have been assessed in terms of Effective Dose Equivalents (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. The results indicate the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs.
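    The quantification step in fault tree and event tree analysis reduces to simple probability arithmetic when basic events are assumed independent: an AND gate multiplies probabilities, an OR gate takes one minus the product of the complements, and an accident sequence frequency is the initiating-event frequency multiplied by the branch probabilities along its path. The sketch below assumes independence and uses made-up numbers; it does not reproduce the Ignitor study's models, and the function names are hypothetical.

```python
import math
from typing import Iterable

def and_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs only if all independent inputs occur."""
    return math.prod(probabilities)

def or_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs if at least one independent input occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)

def sequence_frequency(initiating_per_year: float, branch_probabilities: Iterable[float]) -> float:
    """Frequency of one event-tree path: initiator frequency times its branch probabilities."""
    return initiating_per_year * math.prod(branch_probabilities)

# Hypothetical example: a protection system fails if either of two components fails.
system_failure = or_gate([0.1, 0.2])  # 0.28
path = sequence_frequency(1e-2, [system_failure, 0.5])  # per year
```

    Correlated failures (common-cause events) break the independence assumption, which is why full assessments model them explicitly rather than multiplying raw probabilities.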

  1. Novel Biochip Platform for Nucleic Acid Analysis

    Directory of Open Access Journals (Sweden)

    Juan J. Diaz-Mochon

    2012-06-01

    This manuscript describes the use of a novel biochip platform for the rapid analysis and identification of nucleic acids, including DNA and microRNAs, with very high specificity. The approach combines a unique dynamic chemistry approach for nucleic acid testing and analysis, developed by DestiNA Genomics, with the STMicroelectronics In-Check platform, which comprises two independent, microfluidically optimized PCR reaction chambers and a sequential microarray area for nucleic acid capture and identification by fluorescence. With its compact bench-top “footprint” and a single technician needed to operate it, the biochip system promises to transform and expand routine clinical diagnostic testing and screening for genetic diseases, cancers, drug toxicology and heart disease, as well as use in the emerging companion diagnostics market.

  2. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpesviruses: HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpesviruses: upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type 1 and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  3. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    Science.gov (United States)

    Jiang, Faming; Huang, Weiwei; Wang, Ye; Tian, Panwen; Chen, Xuerong; Liang, Zongan

    2016-01-01

    Smear-negative pulmonary tuberculosis (PTB) is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB) staining of needle biopsy lung tissues for patients with suspected smear-negative PTB. Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR). For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used to identify nontuberculous mycobacteria (NTM). The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination. Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17-80 years; confirmed TB: 9; probable TB: 124). Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with an indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB) for the diagnosis of smear-negative PTB were 61.7% (82/133), 100% (48/48), 100% (82/82), 48.5% (48/99), and 71.8% (130/181), respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133), 95.8% (46/48), 98.3% (119/121), and 76.7% (46/60), respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181]) than histological acid-fast staining (71.8% [130/181]), P pulmonary tuberculosis. For patients with positive histological AFB and
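    The five accuracy measures reported in this abstract follow directly from a 2x2 table of true/false positives and negatives. A minimal sketch with a hypothetical function name; the usage lines plug in the counts implied by the abstract for histological AFB (82 true positives, 51 false negatives, 48 true negatives, no false positives) and histological PCR (119 true positives, 14 false negatives, 46 true negatives, 2 false positives).

```python
from dataclasses import dataclass

@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    accuracy: float

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Standard 2x2-table measures; every denominator must be non-zero."""
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),          # diseased patients detected
        specificity=tn / (tn + fp),          # controls correctly negative
        ppv=tp / (tp + fp),                  # positives that are truly diseased
        npv=tn / (tn + fn),                  # negatives that are truly controls
        accuracy=(tp + tn) / (tp + fp + fn + tn),
    )

hafb = diagnostic_metrics(tp=82, fp=0, fn=51, tn=48)   # npv = 48/99, about 0.485
pcr = diagnostic_metrics(tp=119, fp=2, fn=14, tn=46)   # accuracy = 165/181, about 0.912
```

    Note that NPV is computed over all test-negative patients (true negatives plus false negatives), not over the whole analyzed cohort.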

  4. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential, both as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species examined can be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Flower color (red or yellow) may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 simple sequence repeat (SSR) primer pairs designed, 7 pairs yielded polymorphic polymerase chain reaction products amplified from a specific band. These 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  5. The amino acid sequences and activities of synergistic hemolysins from Staphylococcus cohnii.

    Science.gov (United States)

    Mak, Pawel; Maszewska, Agnieszka; Rozalska, Malgorzata

    2008-10-01

    Staphylococcus cohnii ssp. cohnii and S. cohnii ssp. urealyticus are coagulase-negative staphylococci that were long considered unable to cause infections. This changed recently, when pathogenic strains of these bacteria were isolated from hospital environments, patients and medical staff. Most of the isolated strains were resistant to many antibiotics. The present work describes the isolation and characterization of several synergistic peptide hemolysins produced by these bacteria that act as virulence factors responsible for hemolytic and cytotoxic activities. The amino acid sequences of the respective hemolysins from S. cohnii ssp. cohnii (named H1C, H2C and H3C) and S. cohnii ssp. urealyticus (H1U, H2U and H3U) were identical. Peptides H1 and H3 showed significant amino acid homology to three synergistic hemolysins secreted by Staphylococcus lugdunensis and to a putative antibacterial peptide produced by Staphylococcus saprophyticus ssp. saprophyticus. In contrast, hemolysin H2 had a unique sequence. All isolated peptides lysed red cells from different mammalian species and exerted a cytotoxic effect on human fibroblasts.

  6. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    Directory of Open Access Journals (Sweden)

    Yan Koon-Kiu

    2007-11-01

    Background: The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities $p$ generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form $\sim p^{-\gamma}$, with the value of the exponent $\gamma$ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma distribution with the exponent $\alpha \approx 0.33$. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion: We separately measure the short-term ("raw") duplication and deletion rates $r^{*}_{\mathrm{dup}}$, $r^{*}_{\mathrm{del}}$
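    To make the power-law statement concrete, the sketch below estimates the exponent gamma from a histogram of identities by fitting a straight line on log-log axes. This is one simple way to fit such a form and is an assumption here; the abstract does not say which fitting procedure the authors used. The function name `fit_power_law_exponent` is hypothetical.

```python
import math


def fit_power_law_exponent(identities, counts):
    """Least-squares estimate of gamma for counts ~ C * p**(-gamma).

    Fits a straight line to (log p, log count) and returns minus its slope.
    Bins with zero counts are skipped because log(0) is undefined.
    """
    xs, ys = [], []
    for p, c in zip(identities, counts):
        if p > 0 and c > 0:
            xs.append(math.log(p))
            ys.append(math.log(c))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two non-empty bins")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic, noise-free histogram with gamma = 4 over identities 0.3 to 0.9.
bins = [0.3 + 0.1 * i for i in range(7)]
synthetic_counts = [1e6 * p ** -4 for p in bins]
print(fit_power_law_exponent(bins, synthetic_counts))  # approximately 4.0
```

    On real histograms, low-count bins are noisy on a log scale, so a likelihood-based fit may be preferable to this least-squares version.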

  7. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Science.gov (United States)

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was obtained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.
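    Behavior Sequence Analysis is typically built on transitional probabilities between consecutive events. The sketch below computes only that core step, lag-1 probabilities P(next | current) pooled over several chronological event lists; significance testing of transitions is not shown, and the function name and event labels are hypothetical placeholders.

```python
from collections import Counter


def transition_probabilities(sequences):
    """Lag-1 transitional probabilities P(next event | current event).

    sequences: iterable of event sequences (one chronological list per case).
    Transitions are pooled across sequences; none spans two sequences.
    """
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))
    from_totals = Counter()
    for (current, _following), n in pair_counts.items():
        from_totals[current] += n
    return {pair: n / from_totals[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical event labels A, B, C standing in for coded life events.
probs = transition_probabilities([["A", "B", "C"], ["A", "C"]])
print(probs)  # A->B 0.5, B->C 1.0, A->C 0.5
```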

  8. Sequence-based analysis of the microbial composition of water kefir from multiple sources.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-11-01

    Water kefir is a water-sucrose-based beverage, fermented by a symbiosis of bacteria and yeast to produce a final product that is lightly carbonated, acidic and that has a low alcohol percentage. The microorganisms present in water kefir are introduced via water kefir grains, which consist of a polysaccharide matrix in which the microorganisms are embedded. We aimed to provide a comprehensive sequencing-based analysis of the bacterial population of water kefir beverages and grains, while providing an initial insight into the corresponding fungal population. To facilitate this objective, four water kefirs were sourced from the UK, Canada and the United States. Culture-independent, high-throughput, sequencing-based analyses revealed that the bacterial fraction of each water kefir and grain was dominated by Zymomonas, an ethanol-producing bacterium, which has not previously been detected at such a scale. The other genera detected were representatives of the lactic acid bacteria and acetic acid bacteria. Our analysis of the fungal component established that it comprised the genera Dekkera, Hanseniaspora, Saccharomyces, Zygosaccharomyces, Torulaspora and Lachancea. This information will assist in the ultimate identification of the microorganisms responsible for the potentially health-promoting attributes of these beverages. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  9. Citrate synthase gene sequence: a new tool for phylogenetic analysis and identification of Ehrlichia.

    Science.gov (United States)

    Inokuma, H; Brouqui, P; Drancourt, M; Raoult, D

    2001-09-01

    The sequences of the citrate synthase gene (gltA) of 13 ehrlichial species (Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia muris, an Ehrlichia species recently detected from Ixodes ovatus, Cowdria ruminantium, Ehrlichia phagocytophila, Ehrlichia equi, the human granulocytic ehrlichiosis [HGE] agent, Anaplasma marginale, Anaplasma centrale, Ehrlichia sennetsu, Ehrlichia risticii, and Neorickettsia helminthoeca) were determined by degenerate PCR and the Genome Walker method. The ehrlichial gltA genes are 1,197 bp (E. sennetsu and E. risticii) to 1,254 bp (A. marginale and A. centrale) long, and the GC content of the gene varies from 30.5% (Ehrlichia sp. detected from I. ovatus) to 51.0% (A. centrale). The percent identities of the gltA nucleotide sequences among ehrlichial species ranged from 49.7% (E. risticii versus A. centrale) to 99.8% (HGE agent versus E. equi). The percent identities of the deduced amino acid sequences ranged from 44.4% (E. sennetsu versus E. muris) to 99.5% (HGE agent versus E. equi), whereas the homology of 16S rRNA genes ranged from 83.5% (E. risticii versus the Ehrlichia sp. detected from I. ovatus) to 99.9% (HGE agent, E. equi, and E. phagocytophila). The architecture of the phylogenetic trees constructed from gltA nucleotide or amino acid sequences was similar to that derived from the 16S rRNA gene sequences but showed higher bootstrap values. Based upon the alignment analysis of the ehrlichial gltA sequences, two sets of primers were designed to amplify tick-borne Ehrlichia and Neorickettsia genogroup Ehrlichia (N. helminthoeca, E. sennetsu, and E. risticii), respectively. Tick-borne Ehrlichia species were specifically identified by restriction fragment length polymorphism (RFLP) patterns of AcsI and XhoI, with the exception of E. muris and the very closely related ehrlichia derived from I. ovatus, for which sequence analysis of the PCR product is needed. Similarly, Neorickettsia genogroup Ehrlichia species were specifically identified by
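    An RFLP pattern is the set of fragment lengths produced when an amplicon is cut at every occurrence of an enzyme's recognition site. The sketch below computes those lengths for a linear PCR product using XhoI, which recognizes CTCGAG and cuts after the first C (C^TCGAG). It handles only exact, non-degenerate sites, so AcsI (whose site contains ambiguity codes) would need extra handling; the amplicon shown is a made-up toy sequence and the function name is hypothetical.

```python
import re


def rflp_fragments(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand fragment lengths (bp) after digesting a linear amplicon.

    site: exact recognition sequence (A/C/G/T only, no degenerate bases).
    cut_offset: cut position within the site on the top strand.
    Sticky-end overhangs are ignored; lengths are measured on the top strand.
    """
    seq = amplicon.upper()
    # The lookahead lets finditer report overlapping occurrences of the site.
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


XHOI_SITE, XHOI_CUT = "CTCGAG", 1  # XhoI: C^TCGAG

# Toy amplicon with a single XhoI site.
print(rflp_fragments("AAAACTCGAGTTTT", XHOI_SITE, XHOI_CUT))  # [5, 9]
```

    Because CTCGAG is palindromic, scanning the top strand alone finds every XhoI site.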

  10. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  11. Differentiation of sheep pox and goat poxviruses by sequence analysis and PCR-RFLP of P32 gene.

    Science.gov (United States)

    Hosamani, Madhusudan; Mondal, Bimalendu; Tembhurne, Prabhakar A; Bandyopadhyay, Santanu Kumar; Singh, Raj Kumar; Rasool, Thaha Jamal

    2004-08-01

    Sheep pox and goat pox are highly contagious viral diseases of small ruminants. These diseases were earlier thought to be caused by a single species of virus, as they are serologically indistinguishable. P32, one of the major immunogenic genes of Capripoxvirus, was isolated and sequenced from two Indian isolates of goat poxvirus (GPV) and a vaccine strain of sheep poxvirus (SPV). The sequences were compared with other P32 sequences of capripoxviruses available in the database. Sequence analysis revealed that sheep pox and goat poxviruses share 97.5% and 94.7% homology at the nucleotide and amino acid levels, respectively. A major difference between them is the presence of an additional aspartic acid at the 55th position of P32 of sheep poxvirus that is absent in both goat poxvirus and lumpy skin disease virus (LSDV). Further, six unique neutral nucleotide substitutions were observed at positions 77, 275, 403, 552, 867 and 964 in the sequence of goat poxvirus, which can be taken as GPV signature residues. Similar unique nucleotide signatures could also be identified in SPV and LSDV sequences. Phylogenetic analysis showed that members of the Capripoxvirus genus could be delineated into three distinct clusters of GPV, SPV and LSDV based on the P32 genomic sequence. Using this information, a PCR-RFLP method has been developed for unequivocal genomic differentiation of SPV and GPV.
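    Figures such as "97.5% at the nucleotide level" are usually computed as percent identity over a pairwise alignment; the abstract says "homology", so treating it as identity is an assumption. The sketch below assumes the two sequences are already aligned to equal length and excludes gap columns from the denominator, which is only one of several conventions in use. The function name is hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the comparison.
    Works for nucleotide or amino acid alignments alike.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGT", "ACGA"))  # 75.0
```

    Counting gap columns as mismatches instead would lower the value for gapped alignments, so the convention should be stated alongside any reported percentage.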

  12. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    International Nuclear Information System (INIS)

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-01-01

    Blood coagulation factor VII is a vitamin K-dependent glycoprotein which, in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca²⁺ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  13. Influence of the Amino Acid Sequence on Protein-Mineral Interactions in Soil

    Science.gov (United States)

    Chacon, S. S.; Reardon, P. N.; Purvine, S.; Lipton, M. S.; Washton, N.; Kleber, M.

    2017-12-01

    The intimate associations between proteins and mineral surfaces have profound impacts on nutrient cycling in soil. Proteins are an important source of organic C and N, and a subset of proteins, extracellular enzymes (EE), can catalyze the depolymerization of soil organic matter (SOM). Our goal was to determine how variation in the amino acid sequence could influence a protein's susceptibility to chemical alteration by mineral surfaces, in order to infer the fate of adsorbed EE function in soil. We hypothesized that (1) addition of charged amino acids would enhance adsorption onto oppositely charged mineral surfaces, (2) addition of aromatic amino acids would increase adsorption onto zero-charged surfaces, and (3) increased adsorption of modified proteins would enhance their susceptibility to alteration by redox-active minerals. To test these hypotheses, we generated three engineered proxies of a model protein, Gb1 (IEP 4.0, 6.2 kDa), by inserting either negatively charged, positively charged or aromatic amino acids in the second loop. These modified proteins were allowed to interact with functionally different mineral surfaces (goethite, montmorillonite, kaolinite and birnessite) at pH 5 and 7. We used LC-MS/MS and solution-state Heteronuclear Single Quantum Coherence NMR spectroscopy to observe modifications of the engineered proteins as a consequence of mineral interactions. Preliminary results indicate that the addition of any amino acids to a protein increases its susceptibility to fragmentation and oxidation by redox-active mineral surfaces and alters adsorption to the other mineral surfaces. This suggests that mineral surfaces in soil may not all act simply as sorbents for EEs, and that chemical modification of EE structure should also be considered as an explanation for decreases in EE activity. Fragmentation of proteins by minerals can bypass the need to produce proteases, but microbial acquisition of other nutrients that require enzymes such as cellulases, ligninases or phosphatases

  14. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  15. Single-cell sequencing unveils the lifestyle and CRISPR-based population history of Hydrotalea sp. in acid mine drainage.

    Science.gov (United States)

    Medeiros, J D; Leite, L R; Pylro, V S; Oliveira, F S; Almeida, V M; Fernandes, G R; Salim, A C M; Araújo, F M G; Volpini, A C; Oliveira, G; Cuadros-Orellana, S

    2017-10-01

    Acid mine drainage (AMD) is characterized by an acid and metal-rich run-off that originates from mining systems. Despite having been studied for many decades, much remains unknown about the microbial community dynamics in AMD sites, especially during their early development, when the acidity is moderate. Here, we describe draft genome assemblies from single cells retrieved from an early-stage AMD sample. These cells belong to the genus Hydrotalea and are closely related to Hydrotalea flava. The phylogeny and average nucleotide identity analysis suggest that all single amplified genomes (SAGs) form two clades that may represent different strains. These cells have the genomic potential for denitrification, copper and other metal resistance. Two coexisting CRISPR-Cas loci were recovered across SAGs, and we observed heterogeneity in the population with regard to the spacer sequences, together with the loss of trailer-end spacers. Our results suggest that the genomes of Hydrotalea sp. strains studied here are adjusting to a quickly changing selective pressure at the microhabitat scale, and an important form of this selective pressure is infection by foreign DNA. © 2017 John Wiley & Sons Ltd.

  16. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids.

    Science.gov (United States)

    Ibarra-Laclette, Enrique; Méndez-Bravo, Alfonso; Pérez-Torres, Claudia Anahí; Albert, Victor A; Mockaitis, Keithanne; Kilaru, Aruna; López-Gómez, Rodolfo; Cervantes-Luevano, Jacob Israel; Herrera-Estrella, Luis

    2015-08-13

    Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information. The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. Additionally, the transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNA-seq atlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

  17. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
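    The key observation, that a Markov-model Bayes classifier is linear in word occurrences, can be seen in the simplest case of a known first-order chain: the log-likelihood ratio is a weighted sum of 2-letter word counts. The sketch below shows only that case, assuming equal class priors and ignoring the first-symbol term; the paper's cross-validated order selection and its SVM, regression-depth and distance-weighted-discrimination variants are not shown. All names and training sequences are hypothetical.

```python
import math
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Occurrences of every overlapping word (substring) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_markov1(seqs, alphabet: str, pseudocount: float = 1.0) -> dict:
    """Log transition probabilities log P(b | a) of a first-order Markov chain.

    Pseudocounts keep unseen transitions from getting probability zero.
    """
    pairs = Counter()
    for s in seqs:
        pairs.update(word_counts(s, 2))
    log_p = {}
    for a in alphabet:
        row_total = sum(pairs[a + b] for b in alphabet) + pseudocount * len(alphabet)
        for b in alphabet:
            log_p[a + b] = math.log((pairs[a + b] + pseudocount) / row_total)
    return log_p


def linear_score(seq: str, log_p1: dict, log_p2: dict) -> float:
    """Log-likelihood ratio of class 1 versus class 2 for seq.

    A weighted sum of 2-letter word counts, i.e. linear in word occurrences.
    The first-symbol term is dropped and priors are assumed equal, so a
    positive score favours class 1. Symbols outside the alphabet raise KeyError.
    """
    return sum(n * (log_p1[w] - log_p2[w]) for w, n in word_counts(seq, 2).items())


# Hypothetical training data for two classes over the DNA alphabet.
model_1 = fit_markov1(["ACACACACAC", "CACACACA"], "ACGT")
model_2 = fit_markov1(["AAAATTTT", "TTTTAAAA"], "ACGT")
```

    With these toy models, `linear_score("ACACAC", model_1, model_2)` is positive (class 1) and `linear_score("AAAATT", model_1, model_2)` is negative (class 2).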

  18. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  19. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Science.gov (United States)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates a protein's half-life to the identity of its N-terminal residue. Destabilizing N-terminal residues are created by enzymatic reactions or chemical modifications. The destabilized substrate is then recognized by PROTEOLYSIS 6 (PRT6), an E3 ligase, resulting in its degradation by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but has not yet been studied in fleshy-fruited plants. This study was therefore carried out in tomato, a model for fleshy-fruited plants. BLASTX analysis identified Solyc09g010830 as encoding the tomato PRT6, based on its sequence similarity to A. thaliana PRT6. In silico gene expression analysis showed that PRT6 was highly expressed in tomato fruit at breaker +5. Co-expression analysis suggests that PRT6 may be involved not only in abiotic but also in biotic stress responses. The objective is to analyze the sequence of the tomato PRT6 gene and characterize it.

  20. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information-theory-based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4-regulated genes.
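    The abstract does not spell out the method, but a common building block of information-theory-based binding-site analysis is the per-position information content of aligned sites, R = log2(4) - H in bits for DNA. The sketch below computes that profile and is not the authors' cooperativity model; it applies no small-sample correction, and the function name and example sites are hypothetical.

```python
import math
from collections import Counter


def information_profile(sites: list[str], alphabet: str = "ACGT") -> list[float]:
    """Per-position information content (bits) of equal-length aligned sites.

    R(pos) = log2(len(alphabet)) - H(pos), where H(pos) is the Shannon entropy
    of the observed base frequencies at that position. No small-sample correction.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    max_bits = math.log2(len(alphabet))
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(max_bits - entropy)
    return profile


# Hypothetical aligned sites: three invariant positions, one variable position.
print(information_profile(["ACGT", "ACGT", "ACGA", "ACGC"]))  # [2.0, 2.0, 2.0, 0.5]
```

    Summing the profile gives the total information of the site, one quantity that can be compared across factors or spacings.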

  1. Evolution of amino acid metabolism inferred through cladistic analysis.

    Science.gov (United States)

    Cunchillos, Chomin; Lecointre, Guillaume

    2003-11-28

    Because free amino acids were most probably available in primitive abiotic environments, their metabolism is likely to have provided some of the very first metabolic pathways of life. What were the first enzymatic reactions to emerge? A cladistic analysis of metabolic pathways of the 16 aliphatic amino acids and 2 portions of the Krebs cycle was performed using four criteria of homology. The analysis is not based on sequence comparisons but, rather, on coding similarities in enzyme properties. The properties used are shared specific enzymatic activity, shared enzymatic function without substrate specificity, shared coenzymes, and shared functional family. The tree shows that the earliest pathways to emerge are not portions of the Krebs cycle but metabolisms of aspartate, asparagine, glutamate, and glutamine. The views of Horowitz (Horowitz, N. H. (1945) Proc. Natl. Acad. Sci. U. S. A. 31, 153-157) and Cordón (Cordón, F. (1990) Tratado Evolucionista de Biologia, Aguilar, Madrid, Spain), according to which the upstream reactions in the catabolic pathways and the downstream reactions in the anabolic pathways are the earliest in evolution, are broadly corroborated, with some exceptions. These exceptions are due to later opportunistic connections of pathways (actually already suggested by these authors). The earliest enzymatic functions are mostly catabolic: deaminations, transaminations, and decarboxylations. From the consensus tree we extracted four time spans for the development of amino acid metabolism. For some amino acids, catabolism and biosynthesis occurred at the same time (Asp, Glu, Lys, Leu, Ala, Val, Ile, Pro, Arg). For others, the ultimate reactions that use the amino acid as a substrate or as a product are distinct in time, with catabolism preceding anabolism for Asn, Gln, and Cys and anabolism preceding catabolism for Ser, Met, and Thr. Cladistic analysis of the structure of biochemical pathways makes hypotheses in biochemical evolution explicit and parsimonious.

  2. Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium

    Science.gov (United States)

    Cai, Yongping; Lin, Yi

    2013-01-01

    In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The full-length sequence and catalytic active sites that appear in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum are also found: PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) has 2,458 bp and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. The amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL genes of other plants, as indicated by multiple alignments. The dominant sites and catalytic active sites, which are similar to those in the PAL proteins of Arabidopsis thaliana and Nicotiana tabacum, are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
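    The ORF figures above are internally consistent: 2,142 bp is 714 codons, and one of them is the stop codon, leaving 713 amino acid residues. The sketch below finds the longest ATG-to-stop ORF on the forward strand (standard genetic code stop codons only; reverse-strand frames are not searched) and converts an ORF length into a residue count. Function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> str:
    """Longest ATG...stop open reading frame in the three forward frames.

    The returned ORF includes its stop codon; reverse-strand frames are not searched.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def residues_encoded(orf_length_bp: int) -> int:
    """Amino acids encoded by a complete ORF whose length includes the stop codon."""
    if orf_length_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_bp // 3 - 1


print(residues_encoded(2142))  # 713
print(longest_orf("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
```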

  3. Molecular cloning and sequence analysis of a phenylalanine ammonia-lyase gene from Dendrobium.

    Directory of Open Access Journals (Sweden)

    Qing Jin

    In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The full-length sequence and catalytic active sites that appear in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum are also found: PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) has 2,458 bp and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. The amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL genes of other plants, as indicated by multiple alignments. The dominant sites and catalytic active sites, which are similar to those in the PAL proteins of Arabidopsis thaliana and Nicotiana tabacum, are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum.

  4. Statistically significant dependence of the Xaa-Pro peptide bond conformation on secondary structure and amino acid sequence

    Directory of Open Access Journals (Sweden)

    Leitner Dietmar

    2005-04-01

    Background: A reliable prediction of the Xaa-Pro peptide bond conformation would be a useful tool for many protein structure calculation methods. We have analyzed the Protein Data Bank and show that the combined use of sequential and structural information has a predictive value for the assessment of the cis versus trans peptide bond conformation of Xaa-Pro within proteins. For the analysis of the data sets, different statistical methods such as the calculation of the Chou-Fasman parameters and occurrence matrices were used. Furthermore, we analyzed the relationship between the relative solvent accessibility and the relative occurrence of prolines in the cis and in the trans conformation. Results: One of the main results of the statistical investigations is the ranking of the secondary structure and sequence information with respect to the prediction of the Xaa-Pro peptide bond conformation. We observed a significant impact of secondary structure information on the occurrence of the Xaa-Pro peptide bond conformation, while the sequence information of amino acids neighboring proline is of little predictive value for the conformation of this bond. Conclusion: In this work, we present an extensive analysis of the occurrence of the cis and trans proline conformation in proteins. Based on the data set, we derived patterns and rules for a possible prediction of the proline conformation. Upon adoption of the Chou-Fasman parameters, we are able to derive statistically relevant correlations between the secondary structure of amino acid fragments and the Xaa-Pro peptide bond conformation.
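    Chou-Fasman parameters are propensity ratios: how often a state occurs in a given context, divided by how often it occurs overall. The sketch below applies that ratio to (secondary-structure context, cis/trans state) observations, so a value above 1 means cis (or trans) is over-represented in that context. It is a Chou-Fasman-style calculation under these assumptions, not the authors' exact parameterization, and the counts and function name are hypothetical.

```python
from collections import Counter


def propensities(observations):
    """Chou-Fasman-style propensity of each observed (context, state) pair.

    observations: iterable of (context, state) tuples, e.g. ("helix", "cis").
    Propensity = P(state | context) / P(state); values > 1 mean the state is
    over-represented in that context relative to the whole data set.
    """
    obs = list(observations)
    n = len(obs)
    joint = Counter(obs)
    context_totals = Counter(c for c, _ in obs)
    state_totals = Counter(s for _, s in obs)
    return {
        (c, s): (k / context_totals[c]) / (state_totals[s] / n)
        for (c, s), k in joint.items()
    }


# Hypothetical Xaa-Pro counts: 2 cis and 8 trans in coil, 10 trans in helix.
data = [("coil", "cis")] * 2 + [("coil", "trans")] * 8 + [("helix", "trans")] * 10
print(propensities(data)[("coil", "cis")])  # 2.0
```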

  5. Sequence analysis of dolphin ferritin H and L subunits and possible iron-dependent translational control of dolphin ferritin gene

    Directory of Open Access Journals (Sweden)

    Sasaki Yukako

    2008-10-01

    Background: The iron-storage protein ferritin plays a central role in iron metabolism. Ferritin has a dual function: it stores iron and sequesters it to protect against iron-catalyzed generation of reactive oxygen species. Tissue ferritin is composed of two kinds of subunits (H: heavy chain or heart-type subunit; L: light chain or liver-type subunit). Ferritin gene expression is controlled at the translational level in an iron-dependent manner or at the transcriptional level in an iron-independent manner. However, the ferritin subunits of marine mammals have not yet been fully sequenced and analyzed. The purpose of this study was to determine the cDNA-derived amino acid sequences of cetacean ferritin H and L subunits and to examine whether expression of these subunits, especially the H subunit, may be regulated by iron. Methods: Sequence analyses of cetacean ferritin H and L subunits were performed by direct sequencing of polymerase chain reaction (PCR) fragments from cDNAs generated by reverse transcription-PCR of leukocyte total RNA prepared from blood samples of six dolphin species (Pseudorca crassidens, Lagenorhynchus obliquidens, Grampus griseus, Globicephala macrorhynchus, Tursiops truncatus, and Delphinapterus leucas). The putative iron-responsive element (IRE) sequence in the 5'-untranslated region of the six species was determined by direct sequencing of PCR fragments amplified from leukocyte genomic DNA. Results: Dolphin H and L subunits consist of 182 and 174 amino acids, respectively, and the amino acid sequences of the ferritin subunits are highly conserved among these dolphins (identity, H: 99–100%; L: 98–100%). A conserved 28-bp IRE sequence is located 144 bp upstream of the initiation codon in all six species. Conclusion: These results indicate that the six dolphin species have conserved ferritin sequences and suggest that these genes are expressed in an iron-dependent manner.
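
    Identity figures such as 99–100% depend on how gapped columns are counted. A minimal sketch of one common convention, which ignores columns where either aligned sequence has a gap; the sequences must already be aligned to equal length, and the function name is hypothetical:

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Convention assumed here: columns with a gap in either sequence are skipped.
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)
```

    Other conventions divide by the full alignment length or by the shorter sequence, which gives lower values whenever gaps are present.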

  6. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Science.gov (United States)

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step linking the transcriptome and the proteome. The two major methods for investigating the translatome are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database, TranslatomeDB (http://www.translatomedb.net/), which collects and provides integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq and 10 RNC-seq datasets, plus their 1394 corresponding mRNA-seq datasets, from 13 species. The database emphasizes analysis functions in addition to dataset collection. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, at both the transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and analyze them with the identical unified pipeline. We believe that TranslatomeDB is a comprehensive platform and knowledgebase for translatome and proteome research, freeing biologists from complex searching, analysis and comparison of huge sequencing datasets without the need for local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
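
    The abstract does not give formulas for the translation indices. A common convention, assumed here rather than taken from TranslatomeDB, expresses each index as a ratio of length- and depth-normalized (RPKM) abundances for the same gene: translational efficiency as Ribo-seq footprints over mRNA-seq, and translation ratio as RNC-mRNA over mRNA. The elongation velocity index is omitted because its exact definition is not stated. All function names are hypothetical:

```python
def rpkm(reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return reads * 1e9 / (length_bp * total_mapped_reads)


def ratio_index(numerator_rpkm: float, denominator_rpkm: float) -> float:
    """Ratio of two normalized abundances for the same gene."""
    if denominator_rpkm <= 0:
        raise ValueError("denominator abundance must be positive")
    return numerator_rpkm / denominator_rpkm


def translational_efficiency(rpf_reads: int, mrna_reads: int, length_bp: int,
                             rpf_total: int, mrna_total: int) -> float:
    """Assumed definition: footprint (Ribo-seq) RPKM / mRNA-seq RPKM."""
    return ratio_index(rpkm(rpf_reads, length_bp, rpf_total),
                       rpkm(mrna_reads, length_bp, mrna_total))


def translation_ratio(rnc_reads: int, mrna_reads: int, length_bp: int,
                      rnc_total: int, mrna_total: int) -> float:
    """Assumed definition: RNC-mRNA RPKM / total mRNA RPKM."""
    return ratio_index(rpkm(rnc_reads, length_bp, rnc_total),
                       rpkm(mrna_reads, length_bp, mrna_total))
```

    Because the same transcript length appears in numerator and denominator, it cancels; only the library sizes change the ratio.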

  7. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. To elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of N. americanus was explored using next-generation sequencing and bioinformatic analyses. A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and the most abundant among them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets that had no homologues in the human host. These candidate targets were inferred to be linked to mitochondrial processes (e.g., processing proteins) or amino acid metabolism (e.g., asparagine tRNA synthetase). This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and has examined similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human
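
    The drug-target prioritization described for N. americanus amounts to a filter: keep parasite genes whose C. elegans orthologue is essential and that have no human homologue. A minimal sketch with hypothetical gene identifiers; how orthology, essentiality and human homology were actually determined is not specified here and is assumed to be precomputed:

```python
from typing import Dict, List, Set


def prioritize_targets(orthologues: Dict[str, str],
                       essential_worm_genes: Set[str],
                       has_human_homologue: Set[str]) -> List[str]:
    """Parasite genes with an essential C. elegans orthologue and no human homologue.

    orthologues maps a parasite gene ID to its C. elegans orthologue ID (hypothetical IDs).
    """
    return sorted(
        gene
        for gene, worm_gene in orthologues.items()
        if worm_gene in essential_worm_genes  # predicted essential via the orthologue
        and gene not in has_human_homologue   # avoid targets shared with the host
    )
```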

  8. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can access scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client's site to the cloud, which is a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, in which NGS data are processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both computation time and cost.
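
    The core idea, processing independent records while the rest of the data is still in transit, can be sketched with a producer thread and a bounded queue. This is a local, in-process illustration in which the transfer is simulated; it does not use or reproduce the elastream package, and all names are hypothetical:

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the simulated transfer


def reverse_complement(seq: str) -> str:
    """Example per-read task that needs no other reads."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def stream_and_process(chunks: Iterable[List[str]],
                       process: Callable[[str], str],
                       buffer_chunks: int = 4) -> List[str]:
    """Process records as chunks arrive instead of after the whole upload."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_chunks)
    results: List[str] = []

    def transfer() -> None:
        # Stand-in for the upload: each chunk is handed over as soon as it "arrives".
        try:
            for chunk in chunks:
                buf.put(chunk)
        finally:
            buf.put(_END)  # always signal completion so the consumer cannot hang

    sender = threading.Thread(target=transfer)
    sender.start()
    while True:
        item = buf.get()
        if item is _END:
            break
        # Records are independent, so each can be processed immediately.
        results.extend(process(record) for record in item)
    sender.join()
    return results
```

    The bounded queue (buffer_chunks=4, an arbitrary choice) keeps memory use flat: the sender blocks whenever processing falls behind the transfer.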

  9. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can access scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client’s site to the cloud, which is a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, in which NGS data are processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both computation time and cost.

  10. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this setting, primary xenografts have proven to be useful preclinical models. However, the presence of mouse-derived stromal cells is a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to their species of origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts are a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources or with prominent stromal cell populations.
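
    A species-of-origin strategy can be illustrated as a per-read comparison of alignment scores against the human and mouse references. The sketch below assumes each read has already been aligned to both genomes and reduced to a best score (None if unmapped); the scoring, the tie margin and the function names are assumptions, not the published method:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def assign_species(human: Optional[float], mouse: Optional[float],
                   margin: float = 0.0) -> str:
    """Label a read by comparing its best alignment scores to each genome."""
    if human is None and mouse is None:
        return "unmapped"
    if mouse is None:
        return "human"
    if human is None:
        return "mouse"
    if human > mouse + margin:
        return "human"
    if mouse > human + margin:
        return "mouse"
    return "ambiguous"  # too close to call with this margin


def tally(scores: Iterable[Tuple[Optional[float], Optional[float]]],
          margin: float = 0.0) -> Dict[str, int]:
    """Count reads per species label."""
    return dict(Counter(assign_species(h, m, margin) for h, m in scores))
```

    Raising the margin trades fewer misassigned reads for more reads left as ambiguous.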

  11. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can access scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client's site to the cloud, which is a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, in which NGS data are processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both computation time and cost. PMID:23710461

  12. The myoglobin of Emperor penguin (Aptenodytes forsteri): amino acid sequence and functional adaptation to extreme conditions.

    Science.gov (United States)

    Tamburrini, M; Romano, M; Giardina, B; di Prisco, G

    1999-02-01

    In the framework of a study on molecular adaptations of the oxygen-transport and storage systems to extreme conditions in Antarctic marine organisms, we investigated the structure/function relationship in Emperor penguin (Aptenodytes forsteri) myoglobin, looking for correlations with the bird's lifestyle. In contrast with previous reports, the revised amino acid sequence contains one additional residue and 15 differences. The oxygen-binding parameters seem well adapted to the diving behaviour of the penguin and to the environmental conditions of the Antarctic habitat. Addition of lactate has no major effect on myoglobin oxygenation over a wide temperature range. Therefore, metabolic acidosis does not impair myoglobin function under conditions of prolonged physical effort, such as diving.

  13. GAWK, a novel human pituitary polypeptide: isolation, immunocytochemical localization and complete amino acid sequence.

    Science.gov (United States)

    Benjannet, S; Leduc, R; Lazure, C; Seidah, N G; Marcinkiewicz, M; Chrétien, M

    1985-01-16

    During reverse-phase high-pressure liquid chromatography (RP-HPLC) purification of a postulated big ACTH (1) from human pituitary gland extracts, a highly purified peptide bearing no resemblance to any known polypeptide was isolated. The complete sequence of this 74-amino-acid polypeptide, called GAWK, has been determined. A computer search of a sequence data bank for possible homology to any known protein or fragment, using a mutation data matrix, failed to reveal any homology greater than 30%. An antibody produced against a synthetic fragment allowed us to detect several immunoreactive forms. The antisera also enabled us to localize the polypeptide, by immunocytochemistry, to the anterior lobe of the pituitary gland.

  14. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Directory of Open Access Journals (Sweden)

    Jin Hee-Jeong

    2006-01-01

    Microarray studies make it possible to obtain hundreds of thousands of gene expression or genotype measurements at once, and microarrays are an indispensable technology for genome research. The first step is the analysis of scanned microarray images, which is the most important procedure for obtaining biologically reliable data. Currently, most microarray image processing systems require burdensome manual block and spot indexing. Because the amount of experimental data is increasing very quickly, automated microarray image analysis software is becoming important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, that allows reliable and efficient batch processing of a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than commercial tools.
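
    The extended -regular sequence itself is not specified in this abstract, so the sketch below shows a different, widely used gridding idea instead: projecting intensities onto rows and columns and placing grid lines at the centres of the low-intensity gaps between spots. It works on a plain list-of-lists image, uses the projection mean as an assumed threshold, and is only an illustration of automatic gridding:

```python
from typing import List, Optional, Sequence, Tuple


def gap_centers(profile: Sequence[float]) -> List[int]:
    """Centres of runs where an intensity projection is at or below its mean."""
    mean = sum(profile) / len(profile)  # assumed background threshold
    centers: List[int] = []
    start: Optional[int] = None  # index where the current background run began
    for i, value in enumerate(profile):
        if value <= mean and start is None:
            start = i
        elif value > mean and start is not None:
            centers.append((start + i - 1) // 2)
            start = None
    if start is not None:  # background run reaching the image edge
        centers.append((start + len(profile) - 1) // 2)
    return centers


def estimate_grid(image: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Row and column grid lines from horizontal and vertical projections."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return gap_centers(row_profile), gap_centers(col_profile)
```

    On a toy 9 × 9 image with a 2 × 2 array of 3 × 3 spots, the grid lines fall at rows and columns 0, 4 and 8. Real scans need noise handling and block detection that this sketch omits.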

  15. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    OpenAIRE

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  16. The use of orthologous sequences to predict the impact of amino acid substitutions on protein function.

    Directory of Open Access Journals (Sweden)

    Nicholas J Marini

    2010-05-01

    Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms use patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages, so that including distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and to infer amino acid conservation among only the direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR but also in an exhaustive set of E. coli LacI mutants.
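
    The abstract names an "ancestral site preservation" measure without defining it. One simple reading, assumed here and not taken from the paper, scores a site by the fraction of reconstructed lineal ancestors that keep the human residue and flags substitutions at fully preserved sites as likely deleterious. Sequences are assumed to be pre-aligned to the human reference, and all names are hypothetical:

```python
from typing import Sequence


def ancestral_preservation(site: int, ancestors: Sequence[str], reference: str) -> float:
    """Fraction of lineal-ancestor sequences that keep the reference residue at `site`.

    Assumes every ancestor is aligned column-for-column with the reference.
    """
    if not ancestors:
        raise ValueError("need at least one reconstructed ancestor")
    ref_residue = reference[site]
    kept = sum(1 for anc in ancestors if anc[site] == ref_residue)
    return kept / len(ancestors)


def predict_deleterious(site: int, ancestors: Sequence[str], reference: str,
                        threshold: float = 1.0) -> bool:
    """Hypothetical rule: flag substitutions at sites preserved in >= threshold of ancestors."""
    return ancestral_preservation(site, ancestors, reference) >= threshold
```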

  17. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    Directory of Open Access Journals (Sweden)

    Faming Jiang

    Smear-negative pulmonary tuberculosis (PTB) is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB) staining of needle biopsy lung tissues for patients with suspected smear-negative PTB. Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR). For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used to identify nontuberculous mycobacteria (NTM). The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination. Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17-80 years; confirmed TB: 9, probable TB: 124). Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with an indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB) staining for the diagnosis of smear-negative PTB were 61.7% (82/133), 100% (48/48), 100% (82/82), 48.5% (48/99), and 71.8% (130/181), respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133), 95.8% (46/48), 98.3% (119/121), and 76.7% (46/60), respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181]) than histological acid-fast staining (71.8% [130/181], P < 0.001). Parallel testing of histological AFB staining and PCR showed the
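
    The reported percentages follow directly from a 2 × 2 table per test. A minimal sketch, with counts derived from the fractions above (HAFB: 82 true positives, 133 − 82 = 51 false negatives, 48 true negatives, 0 false positives; PCR: 119, 14, 46 and 2); the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # TB present, test positive
    fp: int  # TB absent, test positive
    fn: int  # TB present, test negative
    tn: int  # TB absent, test negative


def diagnostic_metrics(c: Confusion) -> Dict[str, float]:
    """Standard single-test metrics as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
    }


hafb = Confusion(tp=82, fp=0, fn=51, tn=48)
pcr = Confusion(tp=119, fp=2, fn=14, tn=46)
for name, counts in (("HAFB", hafb), ("PCR", pcr)):
    print(name, {k: round(100 * v, 1) for k, v in diagnostic_metrics(counts).items()})
```

    The NPV denominator is the number of test-negative patients (48 + 51 = 99 for HAFB), not the 181 analysed patients.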

  18. Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences

    Directory of Open Access Journals (Sweden)

    Jun Wang

    2017-11-01

    Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques over the past decades, the currently known PPI pairs are still far from complete. Furthermore, wet-lab experimental techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods that predict PPIs efficiently and accurately. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) with a novel local conjoint triad description (LCTD) feature representation. LCTD combines the advantages of local description and the conjoint triad, so it can account for interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves but also discover hierarchical representations of the data. On the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, precision of 93.75%, sensitivity of 93.83% and area under the receiver operating characteristic curve (AUC) of 97.92%, and it needs only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs and can be a useful supplementary tool for future proteomics studies.
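
    A minimal sketch of the standard conjoint triad encoding, assuming the seven-class amino acid grouping of Shen et al. (2007): each residue is mapped to its class, and counts of the 343 possible class triads form the feature vector, scaled here by the largest count. The local variant simply concatenates vectors from equal-length regions; that region scheme is hypothetical and is not the published LCTD definition:

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Assumed seven-class grouping (dipole and side-chain volume), after Shen et al. (2007).
CT_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_CLASS: Dict[str, int] = {aa: i for i, grp in enumerate(CT_GROUPS) for aa in grp}
TRIAD_INDEX: Dict[Tuple[int, ...], int] = {
    t: i for i, t in enumerate(product(range(len(CT_GROUPS)), repeat=3))
}


def conjoint_triad(seq: str) -> List[float]:
    """343-dimensional conjoint triad vector, scaled by its largest count."""
    classes: List[Optional[int]] = [AA_CLASS.get(aa) for aa in seq.upper()]
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(classes) - 2):
        window = classes[i:i + 3]
        if None in window:  # skip triads containing non-standard residues
            continue
        counts[TRIAD_INDEX[tuple(window)]] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * len(counts)


def local_conjoint_triad(seq: str, n_regions: int = 2) -> List[float]:
    """Concatenate triad vectors of equal-length regions (hypothetical region scheme)."""
    size = -(-len(seq) // n_regions)  # ceiling division
    features: List[float] = []
    for r in range(n_regions):
        features.extend(conjoint_triad(seq[r * size:(r + 1) * size]))
    return features
```

    For "AGVC", the class sequence is 0, 0, 0, 6, so only the triads (0, 0, 0) and (0, 0, 6) are non-zero.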

  19. Scanning mutagenesis of the amino acid sequences flanking phosphorylation site 1 of the mitochondrial pyruvate dehydrogenase complex

    Directory of Open Access Journals (Sweden)

    Nagib Ahsan

    2012-07-01

    The mitochondrial pyruvate dehydrogenase complex is regulated by reversible seryl-phosphorylation of the E1α subunit by a dedicated, intrinsic kinase. The phospho-complex is reactivated when dephosphorylated by an intrinsic PP2C-type protein phosphatase. Both the position of the phosphorylated Ser residue and the sequences of the flanking amino acids are highly conserved. We used a synthetic peptide-based kinase client assay with recombinant pyruvate dehydrogenase E1α and E1α-kinase to perform scanning mutagenesis of the residues flanking the site of phosphorylation. Consistent with the results from phylogenetic analysis of the flanking sequences, very few changes were tolerated in the direct peptide-based kinase assays. Even conservative changes, such as Leu, Ile, or Val for Met, or Glu for Asp, gave very marked reductions in phosphorylation. Overall, the results indicate that regulation of the mitochondrial pyruvate dehydrogenase complex by reversible phosphorylation is an extreme example of multiple, interdependent instances of co-evolution.
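
    Scanning mutagenesis of a flanking sequence can be laid out as a panel of single-substitution peptides that leaves the phospho-acceptor Ser unchanged. A minimal sketch; the function name and any peptide passed to it are hypothetical placeholders, not the pyruvate dehydrogenase E1α site:

```python
from typing import List, Tuple


def scanning_panel(peptide: str, site: int, replacements: str = "A") -> List[Tuple[str, str]]:
    """Single-substitution variants at every position except the Ser at `site` (0-based)."""
    if peptide[site] != "S":
        raise ValueError("expected a Ser phospho-acceptor at `site`")
    panel: List[Tuple[str, str]] = []
    for pos, wild_type in enumerate(peptide):
        if pos == site:
            continue  # the phosphorylated residue itself is kept
        for new in replacements:
            if new == wild_type:
                continue  # not a mutation
            label = f"{wild_type}{pos + 1}{new}"  # 1-based label, e.g. D4E
            panel.append((label, peptide[:pos] + new + peptide[pos + 1:]))
    return panel
```

    For example, scanning_panel("ACSDE", 2) yields the variants C2A, D4A and E5A; passing replacements="LIV" would reproduce the kind of conservative Met substitutions mentioned above.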

  20. Nonlinear analysis of sequence symmetry of beta-trefoil family proteins

    Energy Technology Data Exchange (ETDEWEB)

    Li Mingfeng; Huang Yanzhao; Xu Ruizhen; Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: yxiao@mail.hust.edu.cn

    2005-07-01

    The tertiary structures of proteins of the beta-trefoil family have three-fold quasi-symmetry, while their amino acid sequences appear almost random. In the present paper, we show that these amino acid sequences in fact have hidden symmetries and, furthermore, that the degrees of these hidden symmetries are the same as those of their tertiary structures. We present a modified recurrence plot to reveal hidden symmetries in protein sequences. Our results can explain the apparent contradiction in the sequence-structure relations of proteins of the beta-trefoil family.
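
    A basic (unmodified) recurrence plot for a sequence marks R[i][j] = 1 when residues i and j are identical; an internal repeat shows up as a diagonal line offset by the repeat length, so a three-fold symmetric sequence of length L should recur most strongly near offset L/3. The sketch uses identity matching only; the paper's modified plot is not reproduced, and the function names are hypothetical:

```python
from typing import Dict, List


def recurrence_matrix(seq: str) -> List[List[int]]:
    """Binary recurrence plot: R[i][j] = 1 when residues i and j are identical."""
    n = len(seq)
    return [[1 if seq[i] == seq[j] else 0 for j in range(n)] for i in range(n)]


def diagonal_recurrence(seq: str, offset: int) -> float:
    """Recurrence rate along the diagonal j = i + offset of the plot."""
    pairs = len(seq) - offset
    if offset < 1 or pairs < 1:
        raise ValueError("offset must be between 1 and len(seq) - 1")
    return sum(seq[i] == seq[i + offset] for i in range(pairs)) / pairs


def best_repeat_period(seq: str) -> int:
    """Offset (up to half the length) whose diagonal recurs most strongly."""
    scores: Dict[int, float] = {
        k: diagonal_recurrence(seq, k) for k in range(1, len(seq) // 2 + 1)
    }
    return max(scores, key=lambda k: scores[k])
```

    For the toy sequence "ABCDEFGH" repeated three times, best_repeat_period returns 8, one third of its length. Real trefoil repeats are degenerate, which is why a similarity-based (rather than identity-based) plot is needed in practice.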

  1. Complete genome sequence of probiotic Bacillus coagulans HM-08: A potential lactic acid producer.

    Science.gov (United States)

    Yao, Guoqiang; Gao, Pengfei; Zhang, Wenyi

    2016-06-20

    Bacillus coagulans HM-08 is a commercialized probiotic strain in China. Its genome contains a 3.62-Mb circular chromosome with an average GC content of 46.3%. In silico analysis revealed the presence of one xyl operon as well as several other genes related to xylose utilization. The genetic information provided here may help to expand the strain's biotechnological potential for lactic acid production. Copyright © 2016 Elsevier B.V. All rights reserved.
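
    GC content is the percentage of G and C among unambiguous bases. A minimal sketch, assuming ambiguity codes such as N are simply ignored; the function name is hypothetical:

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among A, C, G and T (other symbols are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt
```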

  2. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    KAUST Repository

    Chen, Peng

    2013-07-23

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature was used to train a classification model for hot spot prediction with a novel encoding schema and the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that predicts by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.
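
    The classification scheme, one nearest-neighbour classifier per feature combined by majority vote, can be sketched in plain Python. This is a stand-in for Weka's IBk rather than a call into it; the paper's encoding schema, feature selection and feature weighting are not reproduced, and labels 1 (hot spot) and 0 (non-hot spot) are an assumed convention:

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_vote(train: Sequence[Tuple[float, int]], x: float, k: int = 3) -> int:
    """k-nearest-neighbour label for one scalar feature (plain-Python stand-in for IBk)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(rows: Sequence[Sequence[float]], labels: Sequence[int],
                     query: Sequence[float], k: int = 3) -> int:
    """Train one classifier per feature column and combine them by majority vote."""
    votes: List[int] = []
    for f, value in enumerate(query):
        column = [(row[f], y) for row, y in zip(rows, labels)]
        votes.append(knn_vote(column, value, k))
    return Counter(votes).most_common(1)[0][0]
```

    An odd k and an odd number of voting classifiers avoid ties; with ties, Counter.most_common falls back to first-seen order, which is arbitrary.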

  3. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    KAUST Repository

    Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z.; Gao, Xin

    2013-01-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature was used to train a classification model for hot spot prediction with a novel encoding schema and the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that predicts by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.

  4. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

    Science.gov (United States)

    Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin

    2013-08-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature was used to train a classification model for hot spot prediction with a novel encoding schema and the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that predicts by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.

  5. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    Science.gov (United States)

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic polylactic acid (PLA), which is currently produced by pure-culture fermentation of food carbohydrates. This work presents an alternative: producing LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with an undefined mixed culture from a municipal wastewater treatment plant. A statistical design-of-experiments approach was applied to a set of 0.8-L SBRs fed gelatinized PPW at solids contents of 30 to 50 g L⁻¹ and solids retention times (SRTs) of 2-4 days to optimize yield and productivity. A maximum LA yield of 0.25 g g⁻¹ PPW and a highest productivity of 125 mg g⁻¹ d⁻¹ were achieved. In a scale-up trial in a 3-L SBR using neat gelatinized PPW (80 g L⁻¹ solids content), the highest LA yield of 0.14 g g⁻¹ PPW and a productivity of 138 mg g⁻¹ d⁻¹ were achieved with a 1-d SRT. Copyright © 2015 Elsevier Ltd. All rights reserved.
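
    The reported yields and productivities are linked through the SRT if productivity is taken as yield per day of retention, P = 1000 × Y / SRT in mg g⁻¹ d⁻¹. This relation is an assumption, not stated in the abstract. It reproduces the scale-up figures approximately (0.14 g g⁻¹ over 1 d gives 140 mg g⁻¹ d⁻¹ against the reported 138, consistent with a rounded yield); whether the 0.25 g g⁻¹ and 125 mg g⁻¹ d⁻¹ optima came from the same 2-d run is not stated. The function name is hypothetical:

```python
def productivity(yield_g_per_g: float, srt_days: float) -> float:
    """Specific LA productivity in mg per g PPW per day, assuming P = 1000 * yield / SRT."""
    if srt_days <= 0:
        raise ValueError("SRT must be positive")
    return 1000.0 * yield_g_per_g / srt_days
```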

  6. Amino acid sequence surrounding the chondroitin sulfate attachment site of thrombomodulin regulates chondroitin polymerization.

    Science.gov (United States)

    Izumikawa, Tomomi; Kitagawa, Hiroshi

    2015-05-01

    Thrombomodulin (TM) is a cell-surface glycoprotein and a critical mediator of endothelial anticoagulant function. TM exists as both a chondroitin sulfate (CS) proteoglycan (PG) form and a non-PG form lacking a CS chain (α-TM); therefore, TM can be described as a part-time PG. Previously, we reported that α-TM bears an immature, truncated linkage tetrasaccharide structure (GlcAβ1-3Galβ1-3Galβ1-4Xyl). However, the biosynthetic mechanism that generates part-time PGs remains unclear. In this study, we used several mutants to demonstrate that the amino acid sequence surrounding the CS attachment site influences the efficiency of chondroitin polymerization. In particular, the presence of acidic residues surrounding the CS attachment site was indispensable for the elongation of CS. In addition, mutants defective in CS elongation did not exhibit anticoagulant activity, as is the case for α-TM. Together, these data support a model for CS chain assembly in which specific core protein determinants are recognized by a key biosynthetic enzyme involved in chondroitin polymerization. Copyright © 2015 Elsevier Inc. All rights reserved.
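
    A rough sequence scan for candidate attachment sites can use the generic Ser-Gly glycosaminoglycan attachment motif and count acidic residues (Asp, Glu) nearby, reflecting the finding that flanking acidic residues matter. The motif, the window size and the function name are assumptions for illustration; the abstract does not give the thrombomodulin sequence or the mutants used:

```python
from typing import List, Tuple


def candidate_cs_sites(seq: str, window: int = 4) -> List[Tuple[int, int]]:
    """(1-based Ser position, acidic residues within `window` on each side) per Ser-Gly."""
    hits: List[Tuple[int, int]] = []
    for i in range(len(seq) - 1):
        if seq[i] == "S" and seq[i + 1] == "G":  # assumed Ser-Gly attachment motif
            left = seq[max(0, i - window):i]
            right = seq[i + 2:i + 2 + window]
            acidic = sum(aa in "DE" for aa in left + right)
            hits.append((i + 1, acidic))
    return hits
```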

  7. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings

    Science.gov (United States)

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  8. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced, and the data were subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to the RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, which permitted a model to be built. Outside the polymerase region, little structure could be predicted, apart from regions of strong alpha-helical content and a probable coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for the structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses.

  9. Lactobacillus kefiri shows inter-strain variations in the amino acid sequence of the S-layer proteins.

    Science.gov (United States)

    Malamud, Mariano; Carasi, Paula; Bronsoms, Sílvia; Trejo, Sebastián A; Serradell, María de Los Angeles

    2017-04-01

    The S-layer is a proteinaceous envelope of subunits that self-assemble into a two-dimensional lattice covering the surface of many species of Bacteria and Archaea; among several other functions, it may be involved in cell recognition of microbes. In this work, both proteomic and genomic approaches were used to characterize the sequences of the genes encoding the S-layer proteins (SLPs) expressed by six aggregative and sixteen non-aggregative strains of potentially probiotic Lactobacillus kefiri. Peptide mass fingerprint (PMF) analysis confirmed the identity of the SLPs extracted from L. kefiri, and, based on homology with phylogenetically related species, primers located outside and inside the SLP genes were used to amplify genomic DNA. The O-glycosylation site SASSAS was found in all L. kefiri SLPs. Ten strains were selected for sequencing of the complete genes. The total length of the mature proteins varies from 492 to 576 amino acids, and all SLPs have a calculated pI between 9.37 and 9.60. The N-terminal region is relatively conserved and has a high percentage of positively charged amino acids, whereas the major differences among strains are found in the C-terminal region. Different groups could be distinguished according to the mature SLP sequences and the similarities observed in the PMF spectra. Interestingly, the SLPs of the aggregative strains are 100% identical, although these strains were isolated from different kefir grains. This knowledge provides relevant data for a better understanding of the mechanisms involved in SLP functionality and could contribute to the development of products of biotechnological interest from potentially probiotic bacteria.
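
    A calculated pI such as the 9.37-9.60 range above is normally derived from the sequence alone with a Henderson-Hasselbalch charge model, solved for the pH at which the net charge is zero. The sketch below illustrates that idea; the pKa table is an assumed, illustrative set (tools such as EMBOSS or ExPASy use slightly different values, so their pI results differ), and it is not the tool the authors used.

```python
# Illustrative pKa values (assumed); published pI values depend on the table chosen.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH under the Henderson-Hasselbalch model."""
    # Termini: one free amino group and one free carboxyl group.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in seq.upper():
        if aa in PKA_POSITIVE:    # basic side chains (K, R, H)
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:  # acidic / weakly acidic side chains (D, E, C, Y)
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection (net charge falls monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

    For example, a lysine-rich peptide such as "KKKKAK" gives a pI above 9, in line with the basic, positively charged N-terminal region described for the L. kefiri SLPs.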

  10. Analysis of Peptides and Conjugates by Amino Acid Analysis

    DEFF Research Database (Denmark)

    Højrup, Peter

    2015-01-01

    Amino acid analysis is a highly accurate method for characterizing the composition of synthetic peptides. Together with mass spectrometry, it provides reliable control of peptide quality and quantity before conjugation and immunization. Peptides are hydrolyzed, preferably in the gas phase, with 6 M HCl at 110 °C for 20-24 h, and the resulting amino acids are analyzed by ion-exchange chromatography with post-column ninhydrin derivatization. Depending on the hydrolysis conditions, tryptophan is destroyed, as is cysteine unless it is derivatized, and the amides glutamine and asparagine are deamidated to glutamic acid and aspartic acid, respectively. Three different ways of calculating results are suggested, and, taking the above limitations into account, a quantitation better than 5% can usually be obtained.
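
    The hydrolysis limitations listed above determine which residues an analysis can report for a given synthetic peptide. A minimal sketch, assuming standard one-letter codes and reporting the deamidated amides together with their acids under the conventional labels Asx (Asn + Asp) and Glx (Gln + Glu); the function name is hypothetical.

```python
from collections import Counter


def expected_aaa_composition(seq: str, cys_derivatized: bool = False) -> dict:
    """Residue counts an acid-hydrolysis amino acid analysis would be expected to report."""
    counts = Counter(seq.upper())
    result = Counter()
    for aa, n in counts.items():
        if aa == "W":
            continue  # tryptophan is destroyed during acid hydrolysis
        if aa == "C" and not cys_derivatized:
            continue  # cysteine is lost unless derivatized before hydrolysis
        if aa in ("N", "D"):
            result["Asx"] += n  # asparagine is deamidated to aspartic acid
        elif aa in ("Q", "E"):
            result["Glx"] += n  # glutamine is deamidated to glutamic acid
        else:
            result[aa] += n
    return dict(result)
```

    Comparing these expected counts with the measured molar ratios is one way to check peptide identity before conjugation; quantitation would then use only the residues that survive hydrolysis.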

  11. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    The adaptation tipping point and adaptation pathway approaches, developed for decision-making under deep uncertainty, do not indicate which of the multiple available pathways should be chosen as the preferred one. These approaches therefore need to be extended with suitable tools that can help sequence actions and subsequently support the formulation of relevant policies. This paper presents two sequencing approaches, the “Build to Target” and “Build Up” approaches, to aid in sub-selecting a set of preferred pathways. The two approaches differ in the level of flexibility they offer. They are illustrated with two case studies in which Net Present Valuation and Real Options Analysis are used as selection criteria. The results demonstrate the benefit of these two approaches when used in conjunction with adaptation pathways, and show that pathways selected with the Build to Target approach generally have a value greater than, or at least equal to, that of pathways selected with the Build Up approach. Further, this paper demonstrates the capacity of Real Options Analysis to quantify and capture the economic value of flexibility, which traditional valuation approaches such as Net Present Valuation cannot do.
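
    The difference between a static net present value and a valuation that credits flexibility can be illustrated with a simple decision-tree valuation of an expansion option. This is a simplified stand-in for Real Options Analysis, not the method used in the paper; all cash flows, probabilities, and the 5% discount rate below are hypothetical.

```python
def npv(cash_flows, rate):
    """Net present value of yearly cash flows; index 0 is the present (undiscounted)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def expected_npv(scenarios, rate):
    """Probability-weighted NPV over (probability, cash_flows) scenarios."""
    return sum(p * npv(cfs, rate) for p, cfs in scenarios)


# Hypothetical example: costs are negative; both plans are assumed to deliver the same benefits.
RATE = 0.05
# Rigid plan: build full capacity now, whatever the future brings.
rigid = [(0.5, [-100.0, 0.0, 0.0]), (0.5, [-100.0, 0.0, 0.0])]
# Flexible plan: build smaller now, expand in year 2 only if the high scenario materializes.
flexible = [(0.5, [-60.0, 0.0, -50.0]), (0.5, [-60.0, 0.0, 0.0])]

value_rigid = expected_npv(rigid, RATE)
value_flexible = expected_npv(flexible, RATE)
value_of_flexibility = value_flexible - value_rigid  # roughly 17.3 in these units
```

    The flexible plan is cheaper in expectation because the expansion decision is deferred until the scenario is known; a single-path NPV that commits to one plan up front does not credit that option.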

  12. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Copia and Gypsy play important roles in the structural, functional, and evolutionary dynamics of plant genomes. In this study, a total of 106 Copia and 101 Gypsy reverse transcriptase (rt) sequences were amplified from the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity and were AT-rich, and Copia rt sequences were more divergent than Gypsy rt sequences. Two reasons are likely to account for this phenomenon: (a) these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination during transposition; (b) strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, phylogenetic analysis showed that mulberry rt sequences clustered with rt sequences from distantly related taxa; this did not result from horizontal transfer of transposable elements. Fluorescence in situ hybridization revealed that most hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. The results of this study support further functional studies of Copia and Gypsy in the mulberry genome.

  13. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    Science.gov (United States)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay (ISD) was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence-informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, showed less extensive fragmentation. This peptide also exhibited the only charge-localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence-informative fragmentation was observed via nISD for all peptides. Three of the peptides studied showed no product ion formation in ISD, but extensive sequence-informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  14. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    Directory of Open Access Journals (Sweden)

    W.S.I. de Silva

    2017-07-01

    The promoter region of osr40c1, a drought- and abscisic acid (ABA)-inducible gene, was isolated from the salt-tolerant indica rice variety Pokkali; the isolated region spans 670 bp upstream of the putative translation start codon. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within it, including two types of common promoter elements, the TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements that may be involved in regulating osr40c1 expression under different conditions were found in the 5′-upstream region of osr40c1. These are an ABA-responsive element, light-responsive elements (ATCT-motif, Box I, G-box, GT1-motif, Gap-box, and Sp1), a myeloblastosis oncogene response element (CCAAT-box), an auxin-responsive element (TGA-element), a gibberellin-responsive element (GARE-motif), and fungal-elicitor-responsive elements (Box E and Box-W1). A putative regulatory element required for the endosperm-specific pattern of gene expression, designated the Skn-1 motif, was also detected in the Pokkali osr40c1 promoter region. In conclusion, bioinformatic analysis of the osr40c1 promoter region isolated from indica rice variety Pokkali identified several important stress-responsive cis-acting regulatory elements; the isolated promoter sequence could therefore be employed in rice genetic transformation to mediate expression of abiotic stress-induced genes.
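
    In silico promoter analysis of this kind rests on scanning both strands of the sequence for known cis-element motifs. The sketch below shows the idea with exact-match core strings; the motif cores are simplified, illustrative assumptions (for example "TATAAA" for a TATA box), not the motif definitions of the database or tool the authors used, which typically allow degenerate positions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_motifs(promoter: str, motifs: dict) -> list:
    """Return (motif name, strand, 0-based start on the + strand) for every exact core match."""
    seq = promoter.upper()
    rc = revcomp(seq)
    hits = []
    for name, core in motifs.items():
        for strand, s in (("+", seq), ("-", rc)):
            # Lookahead so that overlapping occurrences are all reported.
            for m in re.finditer(f"(?={core})", s):
                if strand == "+":
                    start = m.start()
                else:
                    # Map a reverse-strand hit back to + strand coordinates.
                    start = len(seq) - m.start() - len(core)
                hits.append((name, strand, start))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))
```

    A call such as `scan_motifs(sequence, {"TATA-box": "TATAAA", "CAAT-box": "CAAT"})` lists each putative element with its strand and position; distances to the translation start codon can then be computed from the sequence length.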

  15. Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

    Science.gov (United States)

    Kovács, Endre R; Benkő, Mária

    2009-03-01

    Partial genome characterisation of a novel adenovirus, recently found in organ samples from dead birds of prey of multiple species, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis of the deduced amino acid sequence of the small PCR product suggested that a new siadenovirus type was present in the samples. Because virus isolation attempts were unsuccessful, further characterisation of this putative novel siadenovirus was carried out by PCR on the infected organ samples. The DNA sequence of the central part of the RAdV-1 genome, encompassing nine complete (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb in size, was determined. Phylogenetic tree reconstructions based on several genes unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest because it represents a rare adenovirus genus of as yet undetermined host origin.

  16. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    Science.gov (United States)

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detecting infectious agents, such as influenza viruses, is nucleic acid sequence variation. DNA probes 70 nt long, some including the nucleotide analog deoxyribose-inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases (e.g. synonymous mutations) in target DNA. Microsphere-linked 70-mer probes were hybridized in 3 M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. Mismatches that interrupted contiguous matching stretches of 6 nt or longer had a strong impact on hybridization: contiguous matching stretches are more important than the same number of matching nucleotides split by mismatches into several regions. dInosine substitutions at mismatching positions, but not 5-nitroindole substitutions, stabilized hybridization remarkably well, comparably to N (4-fold) wobbles at the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch-tolerant while preserving specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more accurately than previous algorithms and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443
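
    Because the finding above hinges on contiguous matching stretches rather than the total number of matches, a first design check is to list the matching runs between a probe and an ungapped target. A minimal sketch, assuming equal-length sequences and treating inosine ("I") and N wobble positions in the probe as matches, which is a simplification of their real, weaker pairing; this is not the NucZip algorithm.

```python
def matching_stretches(probe: str, target: str, wildcards: str = "IN") -> list:
    """Lengths of contiguous matching runs between an equal-length probe and target (no gaps)."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    runs, current = [], 0
    for p, t in zip(probe.upper(), target.upper()):
        if p == t or p in wildcards:
            current += 1  # extend the current matching stretch
        else:
            if current:
                runs.append(current)  # a mismatch closes the stretch
            current = 0
    if current:
        runs.append(current)
    return runs
```

    For example, a single mismatch in "ACGTACGTAC" against "ACGTTCGTAC" splits ten matching bases into stretches of 4 and 5, whereas placing an inosine at that position restores one stretch of 10; filtering the result with `[r for r in runs if r >= 6]` picks out the stretches the study found most influential.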

  17. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    International Nuclear Information System (INIS)

    Chang, Soo-Ik; Hammes, G.G.

    1989-01-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical; if conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of the chicken and rat enzymes, and of the chicken and yeast enzymes, were analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes, whereas the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and enoyl reductase sites were identified by comparison with a known consensus sequence for NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes lie primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separate and were joined to each other by different connecting nucleotide sequences in different species. An alternative explanation for the differences between rat and chicken is a common ancestry followed by mutations in the joining regions during evolution.
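
    Percent identity and percent "matched with conservative substitutions" figures like the 67% and 78% above are computed over the aligned positions of two sequences. A minimal sketch, assuming an illustrative grouping of conservative substitutions (real analyses more often count positive scores in a substitution matrix such as BLOSUM62) and excluding gap positions from the denominator; both choices change the resulting percentages.

```python
# Assumed conservative-substitution groups (illustrative only).
CONSERVATIVE_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _same_group(a: str, b: str) -> bool:
    return any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple:
    """Percent identical and percent identical-or-conservative positions in a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue  # gap positions are excluded from the denominator in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif _same_group(a, b):
            similar += 1
    return 100.0 * identical / compared, 100.0 * similar / compared
```

    For the toy alignment "MKLVD" / "MRIVE", two of five positions are identical (40%), while K/R, L/I, and D/E count as conservative, so all five positions are matched (100%).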

  18. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work conducted during this human factors review, including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance, to demonstrate the contributions of human factors data and methods to SASA analyses. A descriptive model called the Function Oriented Accident Management (FOAM) model was developed; it serves as a structure for bridging human factors, operations, and engineering expertise and is useful for identifying needs and deficiencies in accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis focused primarily on six operator actions identified in the Emergency Procedure Guidelines (EPGs) as the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitate a knowledge-based emergency response by the operators. The FOAM model provides a functionally oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses that mitigate severe accident progression and avoid or minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  19. Polyvinyl-alcohol-based magnetic beads for rapid and efficient separation of specific or unspecific nucleic acid sequences

    International Nuclear Information System (INIS)

    Oster, J.; Parker, Jeffrey; Brassard, Lothar

    2001-01-01

    The versatile application of polyvinyl-alcohol-based magnetic M-PVA beads is demonstrated in the separation of genomic DNA, sequence-specific nucleic acid purification, and binding of bacteria for subsequent DNA extraction and detection. It is shown that nucleic acids can be obtained in high yield and purity using M-PVA beads, making sample preparation efficient, fast, and highly adaptable to automation.

  20. Bile acid analysis in human disorders of bile acid biosynthesis

    NARCIS (Netherlands)

    Vaz, Frédéric M.; Ferdinandusse, Sacha

    2017-01-01

    Bile acids facilitate the absorption of lipids in the gut, but are also needed to maintain cholesterol homeostasis, induce bile flow, excrete toxic substances and regulate energy metabolism by acting as signaling molecules. Bile acid biosynthesis is a complex process distributed across many cellular

  1. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    2007-10-18

    ... sequencing of sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  2. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    Science.gov (United States)

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and to build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and that proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally had lower fluctuations than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection through which protein sequences are related to various functions via structural dynamics. This is a new attempt to delineate the functional evolution of proteins using integrated information on sequence, structure, and dynamics. © 2017 The Protein Society.
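
    Residue fluctuations of the kind compared above can be estimated from Cα coordinates alone with an elastic network model. The sketch uses the Gaussian network model (GNM), one widely used ENM; the abstract does not say which ENM variant or parameters the authors used, so the 7.3 Å contact cutoff here is an assumption. Fluctuations are proportional to the diagonal of the pseudo-inverse of the Kirchhoff (connectivity) matrix and are therefore relative, not absolute.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff: float = 7.3) -> np.ndarray:
    """Relative mean-square fluctuations of residues under a Gaussian network model."""
    coords = np.asarray(coords, dtype=float)          # shape (n_residues, 3), e.g. C-alpha atoms
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distance matrix
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -contact.astype(float)                 # -1 for each contacting pair
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))   # diagonal = number of contacts
    # The Kirchhoff matrix has one zero eigenvalue (rigid-body mode);
    # the pseudo-inverse discards it.
    inverse = np.linalg.pinv(kirchhoff)
    return np.diag(inverse)
```

    On a toy linear chain of five points spaced 3.8 Å apart (with a 4 Å cutoff so only neighbours interact), the two terminal points fluctuate more than the central one, mirroring the general pattern that well-connected, often conserved, residues move less.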

  3. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) presented in the Handbook of Human Reliability Analysis With Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP), funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics that are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples) and more detailed definitions of some of the terms. 42 refs.

  4. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) presented in the Handbook of Human Reliability Analysis With Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP), funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics that are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples) and more detailed definitions of some of the terms. 42 refs.

  5. A Quantitative Accident Sequence Analysis for a VHTR

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)]

    2016-05-15

    In Korea, the basic design features of the VHTR are currently being discussed across various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method for assessing the risks of a large and complex engineered system such as a nuclear power plant. PRA will be introduced at an early stage of the design and upgraded at various design and licensing stages as the design matures and its details are defined. Risk insights developed from the PRA are viewed as essential to developing a design that is optimized to meet safety objectives and to interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events that may occur in VHTRs were selected using the master logic diagram (MLD) method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating event frequencies and safety system failure rates were calculated using reliability data obtained from available sources and fault tree analysis. After quantification, an uncertainty analysis was conducted. The SR and LR frequencies were calculated to be 7.52E-10/RY and 7.91E-16/RY, respectively, which are lower than the core damage frequency of LWRs.

  6. Diverse Bacterial PKS Sequences Derived From Okadaic Acid-Producing Dinoflagellates

    Directory of Open Access Journals (Sweden)

    Kathleen S. Rein

    2008-05-01

    Okadaic acid (OA) and the related dinophysistoxins are isolated from dinoflagellates of the genera Prorocentrum and Dinophysis. Bacteria of the Roseobacter group have been associated with okadaic acid-producing dinoflagellates and have previously been implicated in OA production. Analysis of 16S rRNA libraries reveals that Roseobacter are the most abundant bacteria associated with OA-producing dinoflagellates of the genus Prorocentrum and are not found in association with non-toxic dinoflagellates. While some polyketide synthase (PKS) genes form a highly supported Prorocentrum clade, most appear to be bacterial, but unrelated to Roseobacter or Alpha-Proteobacterial PKSs or to those derived from the other alveolates Karenia brevis and Cryptosporidium parvum.

  7. Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors

    Directory of Open Access Journals (Sweden)

    Akiyoshi Takahashi

    2016-06-01

    This article contains structural and pharmacological characteristics of melanocortin receptors (MCRs) related to research published in “Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish” (Takahashi et al., 2016 [1]). The amino acid sequences of the stingray D. akajei MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii; the dogfish, Squalus acanthias; the goldfish, Carassius auratus; and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose-response curves reveal the order of ligand selectivity for each stingray MCR.

  8. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    The nucleotide sequence was determined for a 1.6-kilobase human cDNA putatively encoding the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759, followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present before the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment that exactly matches the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, confirming the identity of the cDNA. Analysis of the deduced protein structure of the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases.
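
    The first steps described above, locating the open reading frame in a cDNA and finding polyadenylation signal hexamers downstream of it, can be sketched as follows. This is a simplified, forward-strand-only illustration (ORFs lacking a stop codon are ignored), not the authors' procedure; the function names are hypothetical. For the 315-residue protein reported here, the ORF including its stop codon would span 3 × 315 + 3 = 948 nt.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple:
    """(start, end) of the longest ATG-to-stop ORF on the forward strand; end is exclusive
    and includes the stop codon. Returns (0, 0) if no complete ORF is found."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def find_hexamer(seq: str, hexamer: str = "ATTAAA") -> list:
    """0-based positions of every occurrence of a hexamer such as a polyadenylation signal."""
    seq = seq.upper()
    k = len(hexamer)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == hexamer]
```

    The protein length follows from the ORF coordinates as `(end - start) // 3 - 1` residues (excluding the stop codon), and the distance from the stop codon to each ATTAAA hit gives the layout of the 3'-untranslated region.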

  9. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  10. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  11. [Comparative genomics and evolutionary analysis of CRISPR loci in acetic acid bacteria].

    Science.gov (United States)

    Xia, Kai; Liang, Xin-le; Li, Yu-dong

    2015-12-01

    The clustered regularly interspaced short palindromic repeat (CRISPR) system is a widespread adaptive immunity system, present in most archaea and many bacteria, that protects against foreign DNA such as phages, viruses and plasmids. In general, a CRISPR system consists of direct repeats, a leader, spacers and CRISPR-associated sequences. Acetic acid bacteria (AAB) play an important role in the industrial fermentation of vinegar and in bioelectrochemistry. To investigate the polymorphism and evolution pattern of CRISPR loci in acetic acid bacteria, bioinformatic analyses were performed on 48 species from three main genera (Acetobacter, Gluconacetobacter and Gluconobacter) with whole genome sequences available in the NCBI database. The results showed that CRISPR systems were present in 32 of the 48 strains studied. Most CRISPR-Cas systems in AAB belonged to type I (subtypes E and C), whereas the type II CRISPR-Cas system, which contains the cas9 gene, was found only in the genera Acetobacter and Gluconacetobacter. The repeat sequences of some CRISPR loci were highly conserved among species from different genera, and the leader sequences of some CRISPR loci contained a conserved motif associated with regulated promoters. Phylogenetic analysis of cas1 showed that these genes were suitable for species classification. The conservation of cas1 genes was associated with that of repeat sequences among different strains, suggesting that they were subject to similar functional constraints. Moreover, the number of spacers was positively correlated with the numbers of prophages and insertion sequences, indicating that acetic acid bacteria were continually invaded by new foreign DNA. This comparative analysis of CRISPR loci in acetic acid bacteria provides a basis for investigating the molecular mechanisms underlying differences in acetic acid tolerance and genome stability in acetic acid bacteria.

  12. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    International Nuclear Information System (INIS)

    Feild, M.J.; Armstrong, F.B.

    1987-01-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilvGEDAY gene cluster of S. typhimurium. TmB, the product of the ilvE gene, was purified, crystallized, and subjected to Edman degradation using a gas-phase sequencer. The intact protein yielded an amino-terminal sequence of 31 residues. Both carboxymethylated apoenzyme and [3H]NaBH4-reduced holoenzyme were then digested with trypsin. The digests were fractionated by reversed-phase HPLC, and the isolated peptides were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. This peptide is 27 residues long, and comparison with known sequences of other aminotransferases revealed limited homology. Peptides accounting for 211 of the 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of the peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino-terminal 59 residues; only two differences between the sequences are noted in this region.

  13. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Gclust Server: amino acid sequences of predicted proteins and their annotation for 95 organism species. DOI 10.18908/lsdba.nbdc00464-001

  14. MultiSeq: unifying sequence and structure data for evolutionary analysis

    Directory of Open Access Journals (Sweden)

    Wright Dan

    2006-08-01

    Background: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins.
    Conclusion: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural

  15. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is a computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can generate random DNA sequences, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription and translation; sequence-specific information such as the total number of nucleotide bases and the A, T, G and C base contents with their respective percentages; and a sequence cleaner. RDNAnalyzer was developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
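RDNAnalyzer's own extensions to the Nussinov algorithm are not described here, so the following is only a sketch of the classic Nussinov recursion, which maximizes the number of nested base pairs by dynamic programming over subsequences. It assumes Watson-Crick A-T and G-C pairs only and a minimum hairpin loop of three unpaired bases; the name `nussinov` and the `min_loop` parameter are illustrative choices, not the tool's interface.

```python
# Assumed pairing rules: Watson-Crick pairs only (no G-T wobble).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for seq[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])  # split into two parts
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAATCCC"))
```

On `GGGAAATCCC` the best structure is the three G-C pairs closing an AAAT loop, so the function returns 3. Recovering the actual pairing would require a traceback over `dp`, omitted here for brevity.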

  16. Purification and partial amino-acid sequence of gibberellin 20-oxidase from Cucurbita maxima L. endosperm.

    Science.gov (United States)

    Lange, T

    1994-01-01

    Gibberellin (GA) 20-oxidase was purified to apparent homogeneity from Cucurbita maxima endosperm by fractionated ammonium-sulphate precipitation, gel-filtration chromatography and anion-exchange and hydrophobic-interaction high-performance liquid chromatography (HPLC). Average purification after the last step was 55-fold with 3.9% of the activity recovered. The purest single fraction was enriched 101-fold with 0.2% overall recovery. Apparent relative molecular mass of the enzyme was 45 kDa, as determined by gel-filtration HPLC and sodium dodecyl sulphate-polyacrylamide gel electrophoresis, indicating that GA 20-oxidase is probably a monomeric enzyme. The purified enzyme degraded on two-dimensional gel electrophoresis, giving two protein spots: a major one corresponding to a molecular mass of 30 kDa and a minor one at 45 kDa. The isoelectric point for both was 5.4. The amino-acid sequences of the amino-terminus of the purified enzyme and of two peptides from a tryptic digest were determined. The purified enzyme catalysed the sequential conversion of [14C]GA12 to [14C]GA15, [14C]GA24 and [14C]GA25, showing that carbon atom 20 was oxidised to the corresponding alcohol, aldehyde and carboxylic acid in three consecutive reactions. [14C]Gibberellin A53 was similarly converted to [14C]GA44, [14C]GA19, [14C]GA17 and small amounts of a fourth product, which was preliminarily identified as [14C]GA20, a C19-gibberellin. All GAs except [14C]GA20 were identified by combined gas chromatography-mass spectrometry. The cofactor requirements in the absence of dithiothreitol were essentially as in its presence (Lange et al., Planta 195, 98-107, 1994), except that ascorbate was essential for enzyme activity and the optimal concentration of catalase was lower.

  17. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk

    OpenAIRE

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine; Fonseca, Fernanda

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used in the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge of the stress-induced damage it sustains during production and end-use processes.

  18. N-terminal amino acid sequence of Bacillus licheniformis alpha-amylase: comparison with Bacillus amyloliquefaciens and Bacillus subtilis Enzymes.

    OpenAIRE

    Kuhn, H; Fietzek, P P; Lampen, J O

    1982-01-01

    The thermostable, liquefying alpha-amylase from Bacillus licheniformis was immunologically cross-reactive with the thermolabile, liquefying alpha-amylase from Bacillus amyloliquefaciens. Their N-terminal amino acid sequences showed extensive homology with each other, but not with the saccharifying alpha-amylases of Bacillus subtilis.

  19. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk.

    Science.gov (United States)

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine; Fonseca, Fernanda

    2016-03-03

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used in the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge of the stress-induced damage it sustains during production and end-use processes. Copyright © 2016 Meneghel et al.

  20. Acid mine drainage neutralization in a pilot sequencing batch reactor using limestone from a paper and pulp industry

    CSIR Research Space (South Africa)

    Vadapalli, VRK

    2015-10-01

    This study investigated the implications of using two grades of limestone from a paper and pulp industry for neutralization of acid mine drainage (AMD) in a pilot sequencing batch reactor (SBR). In this regard, two grades of calcium carbonate were...

  1. Frame sequences analysis technique of linear objects movement

    Science.gov (United States)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Noninvasive methods of obtaining data are often needed in many fields of science and engineering. This is achieved through video recording at various frame rates and in various light spectra, and quantitative analysis of the movement of the objects under study then becomes an important component of the research. This work discusses the analysis of the motion of linear objects on a two-dimensional plane. The problem becomes more complex when the frame contains numerous objects whose images may overlap. This study uses a sequence of 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. The goal was to determine the average velocity of object motion, which was found as the mean velocity of 8-12 objects with an error of 15%. After processing, the dependence of the average velocity on the control parameters was determined. The processing was performed in the GMimPro software environment, and the resulting data were then approximated using the Hill equation.
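The abstract says the velocity data were approximated with the Hill equation, v = Vmax·x^n / (K^n + x^n), but does not give the fitting procedure, and the processing itself was done in GMimPro. A minimal Python sketch, assuming Vmax is already known, estimates K and n by linear regression on the Hill plot log(v/(Vmax - v)) = n·log x - n·log K. The function names are hypothetical, and a nonlinear least-squares fit of all three parameters is usually preferable for noisy data.

```python
import math


def hill(x, v_max, k_half, n):
    """Hill equation: response to control parameter x (k_half gives v_max / 2)."""
    return v_max * x**n / (k_half**n + x**n)


def fit_hill_linear(xs, vs, v_max):
    """Estimate (k_half, n) from a Hill plot, assuming v_max is known.

    Uses log(v / (v_max - v)) = n*log(x) - n*log(k_half), fitted by ordinary
    least squares; every x must be > 0 and every v must satisfy 0 < v < v_max.
    """
    X = [math.log(x) for x in xs]
    Y = [math.log(v / (v_max - v)) for v in vs]
    mx, my = sum(X) / len(X), sum(Y) / len(Y)
    slope = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / sum((a - mx) ** 2 for a in X)
    intercept = my - slope * mx  # equals -n * log(k_half)
    return math.exp(-intercept / slope), slope


# Synthetic, noise-free data with hypothetical parameters K = 2, n = 1.5.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
vs = [hill(x, 10.0, 2.0, 1.5) for x in xs]
print(fit_hill_linear(xs, vs, 10.0))
```

Because the synthetic data contain no noise, the regression recovers K = 2 and n = 1.5 up to floating-point rounding.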

  2. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and the posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG, and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm strain p50/Dazao identified 5,295 orthologous genes, of which 400 may have undergone, or may be undergoing, positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
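The assembly statistics quoted above use the N50: the length L such that contigs of length L or longer contain at least half of all assembled bases. A short Python sketch of that definition follows; the function name `n50` is hypothetical, and this is not the authors' pipeline.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # reached 50% of the assembly
            return length
    return 0  # empty input


contigs = [2, 3, 4, 5, 6]
print("N50:", n50(contigs), "mean:", sum(contigs) / len(contigs))
```

For contig lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest contigs already cover 11 bases, so the N50 is 5; the mean length, by contrast, is simply the total divided by the number of contigs.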

  3. Genetic Analysis Using Partial Sequencing of Melanocortin 4 Receptor (MC4R) Gene in Bligon Goat

    Directory of Open Access Journals (Sweden)

    Latifah Latifah

    2017-08-01

    The Melanocortin 4 Receptor gene is involved in sympathetic nerve activity and in adrenal and thyroid functions, and acts as a mediator of leptin in regulating energy balance and homeostasis. The aim of this research was to perform a genetic analysis of MC4R gene sequences from Bligon goats. Forty blood samples from Bligon does were used for DNA extraction. The primers were designed after alignment of 12 MC4R gene DNA sequences from goat, sheep, and cattle, and were constructed on the Capra hircus MC4R gene sequence from GenBank (accession No. NM_001285591). Two DNA polymorphisms of MC4R were revealed in the exon region (g.998 A/G and g.1079 C/T). The SNP g.998 A/G was a non-synonymous polymorphism, changing methionine (Met) to isoleucine (Ile). The SNP g.1079 C/T was a synonymous polymorphism. Restriction enzyme mapping of the Bligon goat MC4R gene revealed three restriction enzymes, RsaI (GT’AC), Acc65I (G’GTAC_C), and KpnI (G_GTAC’C), that can recognize the SNP at g.1079 C/T. These restriction enzymes may be used for genotyping the target gene by the PCR-RFLP method in future research.
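PCR-RFLP genotyping relies on the SNP creating or destroying a recognition site, so the two alleles give different fragment patterns after digestion. The Python sketch below finds an enzyme's sites in an amplicon and reports top-strand fragment lengths, using the cut positions given in the abstract (RsaI GT'AC, Acc65I G'GTACC, KpnI GGTAC'C). The amplicon sequences are made up for illustration, since the real primers and amplicon are not given here, and staggered bottom-strand cuts are ignored. Because all three sites are palindromic, scanning one strand is enough.

```python
# Recognition site and top-strand cut offset within the site, from the abstract:
# RsaI GT'AC, Acc65I G'GTACC, KpnI GGTAC'C.
ENZYMES = {
    "RsaI": ("GTAC", 2),
    "Acc65I": ("GGTACC", 1),
    "KpnI": ("GGTACC", 5),
}


def digest(seq, enzyme):
    """Return top-strand fragment lengths after cutting seq at every site."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)  # allow overlapping sites
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical amplicons differing only at the C/T position inside GTAC.
print(digest("AAAAGTACAAAA", "RsaI"))  # C allele
print(digest("AAAAGTATAAAA", "RsaI"))  # T allele
```

Here the hypothetical C allele is cut into fragments of 6 and 6 nucleotides, whereas the T allele has no RsaI site and stays uncut at 12 nucleotides.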

  4. Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

    Science.gov (United States)

    Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

    2011-01-01

    We sequenced the genome of the filamentous fungus Aspergillus sojae NBRC4239, isolated from the koji used to prepare Japanese soy sauce. We used 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other sequenced Aspergilli. Assembly of the 454 reads generated a 39.5-Mb non-redundant sequence comprising 65 scaffolds composed of 557 contigs and containing 13 033 putative genes. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. Whereas A. oryzae possesses three copies of the α-amylase gene, A. sojae NBRC4239 possesses only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 differed, including an 18 508-bp deletion containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, which explains why A. sojae does not produce CPA. The A. sojae NBRC4239 genome data will be useful for characterizing functional features of the koji moulds used in Japanese industries. PMID:21659486

  5. [Cloning and bioinformatics analysis of abscisic acid 8'-hydroxylase from Pseudostellariae Radix].

    Science.gov (United States)

    Li, Jun; Long, Deng-Kai; Zhou, Tao; Ding, Ling; Zheng, Wei; Jiang, Wei-Ke

    2016-07-01

    Abscisic acid 8'-hydroxylase is one of the key enzymes in the metabolism of abscisic acid (ABA). Seven members of the abscisic acid 8'-hydroxylase family were identified by sequence homology from the Pseudostellaria heterophylla transcriptome sequencing results. The expression profiles of these genes were analyzed using the transcriptome data. The coding sequence of ABA8ox1 was cloned and analyzed with bioinformatics tools. The full-length cDNA of ABA8ox1 was 1 401 bp, with 480 encoded amino acids. The predicted isoelectric point (pI) and relative molecular mass (MW) were 8.55 and 53 kDa, respectively. Transmembrane structure analysis showed 21 amino acids inside and 445 amino acids outside the membrane. High transcript levels were detected in the root bark and fibrous roots. Multiple alignment and phylogenetic analysis both showed that ABA8ox1 had high similarity with CYP707As from other plants, especially AtCYP707A1 and AtCYP707A3 in Arabidopsis thaliana. These results lay a foundation for studying the molecular mechanisms of tuberous root expansion and the response to adversity stress. Copyright© by the Chinese Pharmaceutical Association.

  6. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Background: Castor seeds are a major source of ricinoleate, an important industrial raw material. Genomics studies of the castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergenic proteins in seeds. Results: Full-length cDNAs are useful resources for annotating genes and for functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm and obtained 4,720 ESTs from the 5'-ends of the cDNA clones, representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of castor FAD2 were carried out because it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion: Our results suggest that transcriptional regulation of the FAD2 and FAH12 genes may be one of the mechanisms that contribute to the high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful for annotating the castor genome, whose whole sequence is being generated by shotgun sequencing at

  7. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70 °C, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  8. Rapid and Sensitive Isothermal Detection of Nucleic-acid Sequence by Multiple Cross Displacement Amplification.

    Science.gov (United States)

    Wang, Yi; Wang, Yan; Ma, Ai-Jing; Li, Dong-Xun; Luo, Li-Juan; Liu, Dong-Xin; Jin, Dong; Liu, Kai; Ye, Chang-Yun

    2015-07-08

    We have devised a novel amplification strategy based on an isothermal strand-displacement polymerization reaction, termed multiple cross displacement amplification (MCDA). The approach employs a set of ten specially designed primers spanning ten distinct regions of the target sequence and is performed at a constant temperature (61-65 °C). At the assay temperature, double-stranded DNA is in a dynamic equilibrium with primer-template hybrids, so the high concentration of primers can anneal to the template strands and initiate synthesis without a denaturing step. In the subsequent isothermal amplification step, a series of primer binding and extension events yields several single-stranded DNAs and single-stranded single stem-loop DNA structures. These DNA products then allow the strand-displacement reaction to enter exponential amplification. Three mainstream methods were selected for monitoring the MCDA reaction: colorimetric indicators, agarose gel electrophoresis and real-time turbidity. Moreover, the practical application of the MCDA assay was successfully evaluated by detecting the target pathogen nucleic acid in pork samples; the assay offered rapid results, modest equipment requirements, ease of operation, and high specificity and sensitivity. Here we describe the basic MCDA mechanism and also provide details of an alternative to the MCDA technique, the single-MCDA assay (S-MCDA).

  9. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Science.gov (United States)

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, which partially ruptured the Main Himalayan Thrust north of Kathmandu on 25 April 2015, was the largest and most catastrophic earthquake to strike Nepal since the great M8.4 1934 earthquake. This mainshock was followed by multiple aftershocks, among them two notable events on 12 May with magnitudes of Mw 7.3 and Mw 6.3. Because of these recent events, it became essential for the authorities and the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, including the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study with a microseismic study using seismic data from the eastern part of the Nepalese Seismological Center network together with one broadband station at Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Because of the exceptional number of events detected, we performed an automatic picking/locating procedure that can be split into four steps: 1) coarse picking of the onsets using a classical STA/LTA picker; 2) phase association of picked onsets to detect and declare seismic events; 3) kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy; and 4) local magnitude calculation based on waveform amplitudes. This procedure is time-efficient (1 s/event), considerably reduces the location uncertainties (2 to 5 km errors) and increases the number of events detected compared with manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparison with the USGS catalog, we were able to derive a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the eastern part of the region and a sudden decrease in seismicity 100 km east of Kathmandu, which may reveal the presence of a tectonic
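Step 1 of the procedure uses a classical STA/LTA picker, which flags an onset when the short-term average (STA) of signal energy rises well above the long-term average (LTA). The sketch below is a simple trailing-window version in Python; the window lengths and the trigger threshold are hypothetical, and the authors' actual picker settings, as well as the association, kurtosis refinement and magnitude steps, are not reproduced. Many production pickers use recursive or non-overlapping windows instead.

```python
def sta_lta(signal, n_sta, n_lta):
    """Trailing-window STA/LTA ratio of signal energy (squared amplitude).

    Both windows end at sample i; entries stay 0.0 until a full LTA window exists.
    """
    if not 0 < n_sta <= n_lta:
        raise ValueError("need 0 < n_sta <= n_lta")
    # Prefix sums of energy make each window average O(1).
    prefix = [0.0]
    for s in signal:
        prefix.append(prefix[-1] + s * s)
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def first_trigger(ratios, threshold):
    """Index of the first sample whose STA/LTA ratio reaches threshold, or -1."""
    for i, r in enumerate(ratios):
        if r >= threshold:
            return i
    return -1


# Hypothetical trace: background noise followed by an impulsive arrival.
signal = [0.1] * 100 + [1.0] * 20
print(first_trigger(sta_lta(signal, 5, 50), 3.0))
```

With 100 samples of background at amplitude 0.1 followed by a burst at 1.0, the ratio stays near 1 during the noise and jumps to roughly 7 at the first burst sample, so the trigger fires at index 100.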

  10. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88

    DEFF Research Database (Denmark)

    Pel, Herman J.; de Winde, Johannes H.; Archer, David B.

    2007-01-01

    The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level...... clusters for fumonisin and ochratoxin A synthesis....

  11. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88

    NARCIS (Netherlands)

    Pel, Herman J.; de Winde, Johannes H.; Archer, David B.; Dyer, Paul S.; Hofmann, Gerald; Schaap, Peter J.; Turner, Geoffrey; Albang, Richard; Albermann, Kaj; Andersen, Mikael R.; Bendtsen, Jannick D.; Benen, Jacques A. E.; van den Berg, Marco; Breestraat, Stefaan; Caddick, Mark X.; Contreras, Roland; Cornell, Michael; Coutinho, Pedro M.; Danchin, Etienne G. J.; Debets, Alfons J. M.; Dekker, Peter; van Dijck, Piet W. M.; van Dijk, Alard; Dijkhuizen, Lubbert; Driessen, Arnold J. M.; d'Enfert, Christophe; Geysens, Steven; Groot, Gert S. P.; de Groot, Piet W. J.; Guillemette, Thomas; Henrissat, Bernard; Herweijer, Marga; van den Hombergh, Johannes P. T. W.; van den Hondel, Cees A. M. J. J.; van der Heijden, Rene T. J. M.; van der Kaaij, Rachel M.; Klis, Frans M.; Kools, Harrie J.; Kubicek, Christian P.; van Kuyk, Patricia A.; Lauber, Juergen; Lu, Xin; van der Maarel, Marc J. E. C.; Meulenberg, Rogier; Menke, Hildegard; Mortimer, Martin A.; Nielsen, Jens; Oliver, Stephen G.; Olsthoorn, Maurien; Pal, Karoly; van Peij, Noel N. M. E.; Ram, Arthur F. J.; Rinas, Ursula; Roubos, Johannes A.; Sagt, Cees M. J.; Schmoll, Monika; Sun, Jibin; Ussery, David; Varga, Janos; Vervecken, Wouter; de Vondervoort, Peter J. J. van; Wedler, Holger; Wosten, Han A. B.; Zeng, An-Ping; van Ooyen, Albert J. J.; Visser, Jaap; Stam, Hein; Enfert, Christophe d’; Lauber, Jürgen; Goosen, Coenie; de Vries, Ronald P.

    The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level of

  12. Using Willie's Acid-Base Box for Blood Gas Analysis

    Science.gov (United States)

    Dietz, John R.

    2011-01-01

    In this article, the author describes a method developed by Dr. William T. Lipscomb for teaching blood gas analysis of acid-base status and provides three examples using Willie's acid-base box. Willie's acid-base box is constructed using three of the parameters of standard arterial blood gas analysis: (1) pH; (2) bicarbonate; and (3) CO2…

  13. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural features of the DNA sequence, was introduced into the WT. It can adjust the weight of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that the BWT performs better than the traditional WT, presenting a greater energy distribution. This new BWT method should be useful for detecting latent structural features in future DNA sequence analysis.
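The first step described, splitting the symbolic sequence into four indicator channels, can be shown concretely. In the Python sketch below each channel is 1 where its base occurs and 0 elsewhere; the channels are combined with per-base weights and passed through one level of a Haar wavelet transform. The fixed weights and the Haar wavelet are stand-ins chosen for illustration: the paper's adaptive, structure-derived weights and its bionic wavelet are not reproduced.

```python
import math


def indicator_sequences(seq):
    """Split a DNA string into four 0/1 indicator channels, one per base."""
    seq = seq.upper()
    return {b: [1 if c == b else 0 for c in seq] for b in "ACGT"}


def weighted_signal(seq, weights):
    """Combine the four channels into one numeric signal using per-base weights."""
    channels = indicator_sequences(seq)
    return [sum(weights[b] * channels[b][i] for b in "ACGT") for i in range(len(seq))]


def haar_level(signal):
    """One level of the orthonormal Haar transform (a trailing odd sample is dropped)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


# Hypothetical fixed weights; the BWT instead derives them adaptively.
weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
approx, detail = haar_level(weighted_signal("ACGTAC", weights))
print(approx, detail)
```

Because this Haar step is orthonormal, the approximation and detail coefficients together carry the same total energy as an even-length input signal; changing the channel weights changes how that energy is distributed between them.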

  14. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein, some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes, and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant, with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding to 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% in molecular functions, and 64.6% in biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs was designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker
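Perfect SSRs such as the dominant AC/GT dinucleotide repeats can be located with a back-referencing regular expression. The Python sketch below is an illustration only: the minimum repeat count is a hypothetical threshold, compound and imperfect repeats are not handled, and complementary motifs (AC vs GT) are not merged into one class as in the counts above. The function name `find_ssrs` is hypothetical.

```python
import re


def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, copies) for perfect tandem repeats of one motif length.

    min_repeats is a hypothetical threshold (e.g. 5 copies for dinucleotides).
    """
    # ([ACGT]{k}) captures a motif; \1{m,} requires at least m further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # a mononucleotide run such as AAAA..., not a true SSR
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


print(find_ssrs("GGACACACACACTT", 2, 5))
```

On this toy sequence the function reports one AC repeat of five copies starting at position 2.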

  15. Amino acid substitution: its use in detection and analysis of genetic variants

    International Nuclear Information System (INIS)

    Popp, R.A.; Hirsch, G.P.; Bradshaw, B.S.

    1979-01-01

    Techniques of chemical analysis, amino acid sequencing and autoradiography are being used to study the frequency of incorporation of normally noncoded amino acids into hemoglobins and seminal fluid proteins. By sequencing radiolabeled proteins and then recovering [3H]isoleucine phenylthiohydantoin by high-performance liquid chromatography, we are studying the frequency at which normally noncoded isoleucine is incorporated into hemoglobin because of base-substitution mutations versus translational errors. Irradiation increases the isoleucine content of human hemoglobin and the frequency of substitution of isoleucine for specific amino acids in rabbit hemoglobin. Studies to date indicate that these techniques have been developed sufficiently for an initial analysis of the potential of drugs and environmental pollutants to induce base-substitution mutations in mammalian somatic cells.

  16. Acid rain compliance planning using decision analysis

    International Nuclear Information System (INIS)

    Norris, C.; Sweet, T.; Borison, A.

    1991-01-01

    Illinois Power Company (IP) is an investor-owned electric and natural gas utility serving portions of downstate Illinois. In addition to one nuclear unit and several small gas- and/or oil-fired units, IP has ten coal-fired units, so the potential impact of the Clean Air Act Amendments of 1990 (CAAA) on IP is easy to understand. Prior to passage of the CAAA, IP formed several teams to evaluate the specific compliance options at each of the high-sulfur coal units. Following that effort, numerous economic analyses of compliance strategies were conducted. The CAAA have introduced a new dimension to planning under uncertainty: not only are many of the familiar variables uncertain, but the specific form of regulation, and indeed the compliance goal itself, are hard to define. This led IP to use techniques not widely used within the company. This paper summarizes the analytical methods used in these analyses and the preliminary results as of July 1991. The analysis used three approaches to examine the acid rain compliance decision: (1) the 'most-likely,' or single-path, scenario approach; (2) a multi-path strategy analysis using the strategies defined in the single-scenario analysis; and (3) a less constrained multi-path option analysis that selects the least-cost compliance option for each unit.
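    The third approach, choosing the least-cost option for each unit, can be sketched as an expected-cost comparison across uncertain scenarios. Everything in the sketch (unit names, options, costs, scenario probabilities) is hypothetical, and ranking options by expected cost is an assumption; the paper's actual decision criteria are not given here.

```python
# Hypothetical scenario probabilities (e.g. future emission-allowance price levels).
SCENARIOS = {"low": 0.3, "mid": 0.5, "high": 0.2}

# Hypothetical cost ($M) of each compliance option for each unit under each scenario.
COSTS = {
    "unit_A": {
        "scrubber":    {"low": 120.0, "mid": 120.0, "high": 120.0},
        "fuel_switch": {"low": 60.0,  "mid": 90.0,  "high": 150.0},
        "allowances":  {"low": 40.0,  "mid": 100.0, "high": 220.0},
    },
    "unit_B": {
        "fuel_switch": {"low": 30.0,  "mid": 45.0,  "high": 70.0},
        "allowances":  {"low": 10.0,  "mid": 35.0,  "high": 90.0},
    },
}


def expected_cost(costs_by_scenario, probs):
    """Probability-weighted cost of one option over all scenarios."""
    return sum(probs[s] * c for s, c in costs_by_scenario.items())


def least_cost_plan(costs, probs):
    """Pick, independently for each unit, the option with the lowest expected cost."""
    plan = {}
    for unit, options in costs.items():
        best = min(options, key=lambda o: expected_cost(options[o], probs))
        plan[unit] = (best, expected_cost(options[best], probs))
    return plan


plan = least_cost_plan(COSTS, SCENARIOS)
```

With these made-up numbers, unit_A would switch fuel (expected cost 93) and unit_B would buy allowances (38.5); a real analysis would also weigh risk, not just the mean.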

  17. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, an approach that had not been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array and next-generation sequencing to perform genome-wide profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced on the GS FLX platform. Across both cell lines, 7,484 human genome sequences flanking the long terminal repeats (LTRs) of pLVTHM fragment sequences met the criteria of at least 98% identity over a minimum of 50 bp. In total, 203 unique integration sites were identified. The integrations in both cell lines were located far from CpG islands and transcription start sites and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.
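    The identity and length filter described above, followed by collapsing reads into unique integration sites, can be sketched as below. The `JunctionHit` record and its fields are hypothetical stand-ins for whatever alignment output was actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionHit:
    """One LTR-flanking read aligned to the human genome (fields are hypothetical)."""
    chrom: str
    pos: int          # genomic coordinate of the base adjacent to the LTR
    strand: str
    aln_len: int      # aligned length in bp
    identity: float   # percent identity of the alignment


def unique_sites(hits, min_identity=98.0, min_len=50):
    """Apply the >=98% identity / >=50 bp filter and collapse hits to unique sites.

    Sites are keyed on exact (chrom, pos, strand); real pipelines often also merge
    positions that differ by a few bp, which this sketch does not do.
    """
    return {
        (h.chrom, h.pos, h.strand)
        for h in hits
        if h.identity >= min_identity and h.aln_len >= min_len
    }
```

Many flanking reads can therefore map to the same site, which is why thousands of matching sequences can reduce to a few hundred unique integration sites.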

  18. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses through the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. The number and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to supporting analysis of CSR recombination junctions sequenced with an HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides being able to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
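    CSReport's actual algorithm is not described in this abstract. The following is a simplified sketch of how a junction's structure could be classified once the read has been anchored on the two switch-region references: measure how far the read's start keeps matching the upstream (e.g. Sμ) reference and how far its end keeps matching the downstream (e.g. Sα) reference. If the two matches overlap, the overlap is taken as microhomology; if they leave a gap, the gap is an insertion; if they meet exactly, the junction is blunt. The assumption that `donor_ref` starts at the read's first base and `acceptor_ref` ends at its last base, and all function names, are hypothetical.

```python
def prefix_match(a, b):
    """Length of the common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix_match(a, b):
    """Length of the common suffix of two sequences."""
    return prefix_match(a[::-1], b[::-1])


def classify_junction(read, donor_ref, acceptor_ref):
    """Classify a switch junction read as blunt, insertion or microhomology.

    donor_ref: upstream S-region reference aligned to the read's first base and
    continuing past the breakpoint; acceptor_ref: downstream S-region reference
    aligned to the read's last base and extending upstream of the breakpoint.
    """
    p = prefix_match(read, donor_ref)
    s = suffix_match(read, acceptor_ref)
    gap = len(read) - p - s
    if gap > 0:
        return ("insertion", gap)        # bases matching neither reference
    if gap < 0:
        return ("microhomology", -gap)   # bases shared by both references
    return ("blunt", 0)
```

For example, `classify_junction("AAAACGTTTT", "AAAACGGGGG", "CCCCCGTTTT")` finds the 2-bp "CG" shared by both references and reports `("microhomology", 2)`.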

  19. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    Science.gov (United States)

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTUs) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has recently been demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. For protein-coding barcodes, mPUMA processes microbial profiles both as DNA sequences and as translated amino acid sequences. By forming OTUs and calculating abundance through an assembly approach, mPUMA can generate inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
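    As an illustration of the amino-acid view of protein-coding barcodes (not mPUMA's own code), a barcode DNA sequence can be translated codon by codon with the standard genetic code. The sketch assumes the reading frame is known (`frame`, 0 to 2) and that the standard code applies; codons with ambiguous bases become `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order (first base slowest); '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA sequence in the given frame; incomplete trailing codons are dropped."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)
```

Comparing OTUs at this level groups synonymous DNA variants that encode the same protein segment.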

  20. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest because of its relative geographical isolation and its genetic impact on other populations, we extend these studies by generating 11-fold coverage of the first Irish human genome sequence.
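    The "11-fold coverage" figure refers to mean sequencing depth. Under the standard Lander-Waterman approximation, depth is C = N × L / G for N reads of length L over a genome of size G, and the expected fraction of bases left uncovered is about e^(-C). The genome size and read length used below are illustrative assumptions, not values from the study.

```python
import math


def mean_coverage(n_reads, read_len, genome_size):
    """Lander-Waterman mean depth C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_for_coverage(target, read_len, genome_size):
    """Number of reads N needed to reach mean depth `target`."""
    return math.ceil(target * genome_size / read_len)


def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model: e^(-C)."""
    return math.exp(-coverage)


# Illustrative assumptions: ~3.1 Gb genome, 100-bp reads.
n = reads_for_coverage(11, 100, 3_100_000_000)
```

Under these assumptions about 341 million reads give 11× depth, and roughly 0.0017% of bases would be expected to receive no reads at all; real data deviate from the Poisson ideal because of GC and mappability biases.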

  1. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  2. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying, from raw sequencing reads, the source of a sample containing biological sequences. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...
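    The excerpt does not describe how the invention assigns reads to a source. One common, generic way to do it, shown here as a swapped-in technique rather than the patented method, is k-mer voting: index the k-mers of known reference sequences, then let each read's k-mers vote for the single reference they occur in. Reference names, sequences and `k` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference sources containing it."""
    index = {}
    for name, seq in references.items():
        for km in kmers(seq.upper(), k):
            index.setdefault(km, set()).add(name)
    return index


def classify_reads(reads, index, k):
    """Vote for sources using only k-mers that occur in exactly one reference."""
    votes = Counter()
    for read in reads:
        for km in kmers(read.upper(), k):
            sources = index.get(km)
            if sources is not None and len(sources) == 1:
                votes[next(iter(sources))] += 1
    return votes.most_common()


refs = {"src_A": "ACGTACGTTGCA", "src_B": "TTTTGGGGCCCC"}
index = build_index(refs, k=4)
result = classify_reads(["ACGTTGCA", "GGGGCC"], index, k=4)
```

Ignoring k-mers shared by several references keeps ambiguous evidence out of the vote; practical tools use much larger k and reference databases.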

  3. Sequence analysis and overexpression of a pectin lyase gene (pel1) from Aspergillus oryzae KBN616.

    Science.gov (United States)

    Kitamoto, N; Yoshino-Yasuda, S; Ohmiya, K; Tsukagoshi, N

    2001-01-01

    A gene (pel1) encoding pectin lyase (Pel1) was isolated from a shoyu koji mold, Aspergillus oryzae KBN616, and characterized. The structural gene comprised 1,196 bp with a single intron. The ORF encoded 381 amino acids with a signal peptide of 20 amino acids. The deduced amino acid sequence showed high similarity to those of Aspergillus niger pectin lyases and Glomerella cingulata PnlA. The pel1 gene was successfully overexpressed under the control of the A. oryzae TEF1 gene promoter. The molecular mass of the recombinant pectin lyase agreed closely with that calculated from the nucleotide sequence.
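    The "calculated" molecular mass of a protein is usually obtained from its deduced sequence as the sum of average residue masses plus one water molecule. A minimal sketch, assuming standard average residue masses and ignoring glycosylation and other modifications (which can make the mass of a recombinant enzyme differ from the calculated value):

```python
# Average residue masses (Da) of the 20 standard amino acids (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O for the free N- and C-termini


def protein_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide; a trailing '*' is ignored."""
    seq = seq.upper().rstrip("*")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```

To estimate the mature, secreted protein, the signal peptide would be removed first, e.g. `protein_mass(seq[20:])` for a 20-residue signal peptide.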

  4. A Δ-9 Fatty Acid Desaturase Gene in the Microalga Myrmecia incisa Reisigl: Cloning and Functional Analysis

    Directory of Open Access Journals (Sweden)

    Wen-Bin Xue

    2016-07-01

    The green alga Myrmecia incisa is one of the richest natural sources of arachidonic acid (ArA). To better understand the regulation of ArA biosynthesis in M. incisa, a novel gene putatively encoding a Δ9 fatty acid desaturase (FAD) was cloned and characterized for the first time. Rapid amplification of cDNA ends (RACE) was employed to obtain a full-length cDNA, designated MiΔ9FAD, which is 2,442 bp long. Comparison of the cDNA open reading frame (ORF) sequence with the genomic sequence indicated that 8 introns interrupt the coding region. The deduced MiΔ9FAD protein is composed of 432 amino acids. It is predicted to be soluble and localized in the chloroplast, as indicated by the absence of transmembrane domains and the presence of a 61-amino acid chloroplast transit peptide. Multiple sequence alignment of amino acids revealed two conserved histidine-rich motifs, typical of Δ9 acyl-acyl carrier protein (ACP) desaturases. To determine the function of MiΔ9FAD, the gene was heterologously expressed in a Saccharomyces cerevisiae mutant strain with impaired desaturase activity. GC-MS analysis indicated that MiΔ9FAD was able to restore the synthesis of monounsaturated fatty acids, generating palmitoleic acid and oleic acid through the addition of a double bond at the Δ9 position of palmitic acid and stearic acid, respectively.

  5. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next-generation sequencing (NGS) data analysis is highly compute-intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are among the hardware features of modern computing architectures. To obtain the best execution time and exploit these hardware features, it is necessary to tune system-level parameters before running the application. We studied GATK HaplotypeCaller, a component of common NGS workflows that consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked, and the execution time of HaplotypeCaller was optimized through several system-level parameters: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning of the PairHMM library for vectorization, (iii) enabling Java 1.8 features by compiling the GATK source code and building a runtime environment for parallel sorting and bulk data transfer, and (iv) switching the CPU frequency scaling from the default 'on-demand' mode to 'performance' mode to accelerate Java multithreading. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7, respectively.

  6. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Background: The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic levels, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity makes selective breeding for specific traits easier and allows a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome. Results: Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion: Our results provide the opportunity to more clearly define the genomic character of the pig, which could enhance our ability to create more useful pig models.
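    Statements such as "an unusual distribution of heterozygosity" are typically based on the density of heterozygous sites along the genome. A rough sketch, assuming VCF-style genotype strings ('0/1', '1|1', ...) with 0-based positions on one chromosome and treating every base as callable (real analyses divide by callable bases only). The window size is hypothetical.

```python
def windowed_heterozygosity(calls, chrom_len, window=100_000):
    """Heterozygous sites per kb in consecutive windows along one chromosome.

    calls: iterable of (position, genotype) pairs, genotype like '0/1' or '1|1'.
    """
    n_windows = (chrom_len + window - 1) // window
    het = [0] * n_windows
    for pos, gt in calls:
        a, b = gt.replace("|", "/").split("/")
        if a != b:  # two different alleles -> heterozygous site
            het[pos // window] += 1
    # The last window may be shorter than `window`.
    return [1000 * h / min(window, chrom_len - i * window) for i, h in enumerate(het)]
```

Long runs of windows near zero would correspond to the extended homozygous tracts expected in a highly inbred genome.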

  7. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10,072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3,519 nonredundant gene groups, including 1,446 clusters and 2,073 singletons. After annotation against the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1,418 ESTs participated in photosynthesis and 1,359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found. Taken together, the evidence indicates that U. prolifera has the material and energetic basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated with Pseudendoclonium akinetum (Ulvophyceae).
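    The EST counts above fit together as clusters plus singletons (1,446 + 2,073 = 3,519 nonredundant groups), and this summary can be reproduced from any EST-to-cluster assignment. A minimal sketch, assuming such an assignment has already been produced by an assembler or clustering tool (the dictionary input format is hypothetical):

```python
from collections import Counter


def summarize_unigenes(assignments):
    """Summarize an EST -> cluster-id mapping into cluster and singleton counts."""
    sizes = Counter(assignments.values())
    clusters = sum(1 for n in sizes.values() if n > 1)     # groups of 2+ ESTs
    singletons = sum(1 for n in sizes.values() if n == 1)  # ESTs that joined nothing
    return {
        "ests": len(assignments),
        "unigenes": len(sizes),
        "clusters": clusters,
        "singletons": singletons,
    }
```

By construction, `unigenes == clusters + singletons`, which is the relationship between the three figures reported in the abstract.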

  8. Amino acids analysis during lactic acid fermentation by single strain ...

    African Journals Online (AJOL)

    L. salivarius alone showed relatively good assimilation of various amino acids that were present only in small amounts in MRS medium (Asn, Asp, Cit, Cys, Glu, His, Lys, Orn, Phe, Pro, Tyr, Arg, Ile, Leu, Met, Ser, Thr, Trp and Val), whereas Ala and Gly accumulated in L. salivarius cultures. P. acidilactici, in contrast, hydrolyzed the ...

  9. Comparison of complete genome sequences of dog rabies viruses isolated from China and Mexico reveals key amino acid changes that may be associated with virus replication and virulence.

    Science.gov (United States)

    Yu, Fulai; Zhang, Guoqing; Zhong, Xiangfu; Han, Na; Song, Yunfeng; Zhao, Ling; Cui, Min; Rayner, Simon; Fu, Zhen F

    2014-07-01

    Rabies is a global problem, but its impact and prevalence vary across different regions. In some areas, such as parts of Africa and Asia, the virus is prevalent in the domestic dog population, leading to epidemic waves and large numbers of human fatalities. In other regions, such as the Americas, the virus predominates in wildlife and bat populations, with sporadic spillover into domestic animals. In this work, we investigated whether these distinct environments led to selective pressures that result in measurable changes within the genome at the amino acid level. To this end, we collected and sequenced the full genomes of two isolates from divergent environments. The first isolate (DRV-AH08) was from China, where the virus is present in the dog population and the country is experiencing a serious epidemic. The second isolate (DRV-Mexico) was taken from Mexico, where the virus is present in both wildlife and domestic dog populations, but at low levels as a consequence of an effective vaccination program. We then combined and compared these with other full genome sequences to identify distinct amino acid changes that might be associated with environment. Phylogenetic analysis identified strain DRV-AH08 as belonging to the China-I lineage, which has emerged to become the dominant lineage in the current epidemic. The Mexico strain was placed in the D11 Mexico lineage, associated with the West USA-Mexico border clade. Amino acid sequence analysis identified only 17 amino acid differences in the N, G and L proteins. These differences may be associated with virus replication and virulence, for example, with the short incubation period observed in the current epidemic in China.
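    Counting amino acid differences such as the 17 reported for the N, G and L proteins reduces to comparing aligned columns. A minimal sketch, assuming the protein sequences of the two isolates are already aligned to equal length (gap character `-`); function names and inputs are hypothetical.

```python
def aa_differences(seq_a, seq_b):
    """List (column, residue_a, residue_b) for aligned columns that differ.

    Columns are 1-based; a gap facing a residue also counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def total_differences(proteins_a, proteins_b):
    """Number of differing columns per protein present in both isolates."""
    shared = proteins_a.keys() & proteins_b.keys()
    return {name: len(aa_differences(proteins_a[name], proteins_b[name])) for name in shared}
```

For example, `aa_differences("MKTAYIAK", "MKSAYLAK")` returns `[(3, "T", "S"), (6, "I", "L")]`, i.e. the substitutions T3S and I6L.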

  10. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

    Science.gov (United States)

    Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

    2015-03-01

    The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, while the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate reduction ability of the gene. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
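    ORF Finder-style analysis looks for stretches that start with ATG and end at the first in-frame stop codon. A simplified sketch that scans the three forward frames only (the real tool also scans the reverse strand and supports alternative start codons). It also shows the arithmetic linking the reported figures: a 567-bp ORF including its stop codon encodes 567 / 3 - 1 = 188 amino acids.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(dna):
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the span includes the stop codon.
    Returns None if no complete ORF is found.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def orf_protein_length(orf_nt_len):
    """Residues encoded by an ORF whose length includes the stop codon."""
    return orf_nt_len // 3 - 1
```

For `"CCATGAAAAAAAAATAAGG"` the sketch finds the ORF at (2, 17), 15 bp long, encoding the 4-residue peptide MKKK.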

  11. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

    Science.gov (United States)

    Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

    2015-01-01

    The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, while the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate reduction ability of the gene. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630

  12. CCK-5: sequence analysis of a small cholecystokinin from canine brain and intestine

    International Nuclear Information System (INIS)

    Shively, J.; Reeve, J.R. Jr.; Eysselein, V.E.; Ben-Avram, C.; Vigna, S.R.; Walsh, J.H.

    1987-01-01

    The purpose of this study was to purify and chemically characterize cholecystokinin (CCK)-like peptides present in brain and gut extracts that elute from gel filtration after the octapeptide. Canine small intestinal mucosa and brain were boiled in water and then extracted in cold trifluoroacetic acid, and cholecystokinin-like immunoreactivity was determined by carboxyl-terminal specific radioimmunoassay. Gel permeation chromatography on Sephadex G-50 revealed a form of CCK apparently smaller than CCK-8. Microsequence analysis showed that the amino-terminal primary sequence of this small CCK was Gly-Trp-Met-Asp. Immunochemical and chromatographic analysis indicated that the carboxyl-terminal residue was Phe-NH2, and thus the full sequence is Gly-Trp-Met-Asp-Phe-NH2. An antibody that recognizes synthetic CCK-8, CCK-5, and CCK-4 equally did not reveal the presence of significant amounts of CCK-4. These results indicate that CCK-5 is the major CCK form smaller than the octapeptide present in brain and small intestine. This finding, coupled with the demonstration by others that CCK-5 interacts with high-affinity brain CCK receptors, indicates that CCK-5 may play a physiological role in brain function.

  13. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    Science.gov (United States)

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only weakly constrained segments but also sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D. melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), show repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 is similar for these groups, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, arguing against the possibility that such repetitiveness stems mainly from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship among the various types of repetitiveness is discussed.
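    The study's own repetitiveness statistics are not given in this abstract. As a simple stand-in, local repetitiveness of a sequence can be measured as the observed frequency of identical neighbouring residues divided by the frequency expected if residues were drawn independently from the sequence's own composition; values above 1 indicate more repetition than expected. This measure is an illustrative assumption, not the authors' method.

```python
from collections import Counter


def adjacent_repeat_ratio(seq):
    """Observed / expected frequency of identical neighbouring residues.

    Expected value assumes independent draws from the sequence's own composition,
    i.e. the sum of squared residue frequencies.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    observed = sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
    counts = Counter(seq)
    expected = sum((n / len(seq)) ** 2 for n in counts.values())
    return observed / expected
```

The same function works on amino acid strings or on the string of third-codon-position bases extracted from a coding sequence.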

  14. Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

    Science.gov (United States)

    Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

    1991-05-01

    Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions, we have isolated DNA clones encoding the desmosomal glycoprotein desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular amino-terminal portion of 561 amino acid residues, compared with a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from the cDNAs for Northern blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa, but not in two epithelial cell culture lines containing desmosomes and desmoplakins. This difference may reflect large differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.
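    A candidate transmembrane segment like the 24-residue stretch described above is classically found with a Kyte-Doolittle hydropathy scan. A minimal sketch using the published Kyte-Doolittle scale with the commonly used 19-residue window and 1.6 threshold; the abstract does not state which method the authors used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Spans (0-based, end-exclusive) covered by windows whose mean hydropathy
    exceeds the threshold; overlapping windows are merged into one span."""
    scores = [KD[aa] for aa in seq.upper()]
    spans = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans
```

Because whole windows are merged, a reported span is usually somewhat wider than the true hydrophobic core; for a 24-residue leucine stretch flanked by charged residues the sketch reports a 34-residue span around it.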

  15. In Silico Phylogenetic Analysis and Molecular Modelling Study of 2-Haloalkanoic Acid Dehalogenase Enzymes from Bacterial and Fungal Origin

    Directory of Open Access Journals (Sweden)

    Raghunath Satpathy

    2016-01-01

    2-Haloalkanoic acid dehalogenase enzymes, which are widely distributed in fungi and bacteria, have a broad range of applications, from bioremediation to the chemical synthesis of useful compounds. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from the NCBI database. Sequence analyses, including multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition, and phylogenetic tree construction, were performed on these primary sequences. The MSA showed that the sequences share conserved lysine (K) and aspartate (D) residues. The phylogenetic tree also indicated a subcluster comprising both fungal and bacterial species. Because no experimental 3D structure of a fungal 2-haloalkanoic acid dehalogenase is available in the PDB, molecular modelling was performed for both the fungal and bacterial enzymes present in the subcluster. Further structural analysis revealed a common evolutionary topology shared between the fungal and bacterial enzymes. Studies of the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface-exposed tryptophan was conserved in all of the selected models.
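    The conserved-residue and composition steps described above can be sketched as follows, assuming the alignment is given as equal-length strings with `-` for gaps; a column counts as conserved only if every sequence has the same residue there. Inputs and function names are hypothetical.

```python
from collections import Counter


def conserved_columns(alignment):
    """Return (1-based column, residue) for columns identical in every sequence."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col, residues in enumerate(zip(*alignment), start=1):
        if residues[0] != "-" and all(r == residues[0] for r in residues):
            result.append((col, residues[0]))
    return result


def composition(seq):
    """Percentage of each amino acid in a sequence, ignoring gap characters."""
    seq = seq.replace("-", "").upper()
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in sorted(counts.items())}
```

With 81 aligned sequences, the conserved lysine and aspartate residues mentioned above would appear as `K` and `D` entries in the output of `conserved_columns`.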

  16. Prevalence of Plasmodium spp. in malaria asymptomatic African migrants assessed by nucleic acid sequence based amplification

    Directory of Open Access Journals (Sweden)

    Schallig Henk DFH

    2009-01-01

    Background: Malaria is one of the most important infectious diseases in the world. Although most cases occur in the tropical regions of Africa, Asia, and Central and South America, Europe has seen a significant increase in the number of cases imported into non-endemic countries, in particular due to the higher mobility of today's society. Methods: The prevalence of possible asymptomatic Plasmodium infection was assessed using Nucleic Acid Sequence Based Amplification (NASBA) assays on clinical samples collected from 195 study cases with no clinical signs related to malaria who had come from sub-Saharan African regions to Southern Italy. In addition, baseline demographic, clinical and socio-economic information was collected from study participants, who also underwent a full clinical examination. Results: Sixty-two study subjects (31.8%) were found positive for Plasmodium using a pan-Plasmodium NASBA based on the small subunit 18S rRNA gene (18S NASBA), which can detect all four Plasmodium species causing human disease. Twenty-four samples (38% of the 62 18S NASBA-positive study cases) were found positive with a Pfs25 mRNA NASBA, which is specific for the detection of gametocytes of Plasmodium falciparum. A statistically significant association was observed between 18S NASBA positivity and splenomegaly, hepatomegaly, leukopaenia and country of origin. Conclusion: This study showed that a substantial proportion of people originating from malaria-endemic countries harbor malaria parasites in their blood. If transmission conditions are present, they could potentially act as a reservoir. Therefore, health authorities should pay special attention to the health of this potential risk group and aim to improve their health conditions.
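    The reported percentages are simple proportions (62/195 and 24/62). The abstract does not report a confidence interval; the Wilson score interval below is added only to illustrate how the uncertainty of such a prevalence estimate could be expressed.

```python
import math


def prevalence(positives, total):
    """Crude prevalence as a proportion."""
    return positives / total


def wilson_interval(positives, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


lo, hi = wilson_interval(62, 195)
```

For 62 of 195 subjects this gives roughly 26% to 39%, so the 31.8% point estimate carries considerable sampling uncertainty at this sample size.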

  17. Genomic sequencing of uric acid metabolizing and clearing genes in relationship to xanthine oxidase inhibitor dose.

    Science.gov (United States)

    Carroll, Matthew B; Smith, Derek M; Shaak, Thomas L

    2017-03-01

    It remains unclear why the dose of the xanthine oxidase inhibitors (XOI) allopurinol or febuxostat varies among patients even though they reach a similar serum uric acid (SUA) goal. We pursued genomic sequencing of XOI metabolism and clearance genes to identify single-nucleotide polymorphisms (SNPs) related to differences in XOI dose. Subjects with a diagnosis of gout based on the 1977 American College of Rheumatology Classification Criteria for the disorder, who were on stable doses of an XOI, and who were at their goal SUA level, were enrolled. The primary outcome was the relationship between SNPs in any of these genes and XOI dose. The secondary outcome was the relationship between SNPs and the change between pre- and post-treatment SUA. We enrolled 100 subjects. The average patient age was 68.6 ± 10.6 years. Over 80% were men and 77% were Caucasian. One SNP was associated with a higher XOI dose: rs75995567 (p = 0.031). Two SNPs were associated with 300 mg daily of allopurinol: rs11678615 (p = 0.022) and rs3731722 on Aldehyde Oxidase (AO) (His1297Arg) (p = 0.001). Two SNPs were associated with a lower dose of allopurinol: rs1884725 (p = 0.033) and rs34650714 (p = 0.006). For the secondary outcome, rs13415401 was the only SNP related to a smaller mean SUA change. Ten SNPs were associated with a larger change in SUA. Although multiple SNPs were identified in the primary and secondary outcomes of this study, rs3731722 is known to alter catalytic function for some aldehyde oxidase substrates.

  18. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, a methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  19. Amino-acid sequences of trypsin inhibitors from watermelon (Citrullus vulgaris) and red bryony (Bryonia dioica) seeds.

    Science.gov (United States)

    Otlewski, J; Whatley, H; Polanowski, A; Wilusz, T

    1987-11-01

    The amino-acid sequences of two trypsin inhibitors isolated from red bryony (Bryonia dioica) and watermelon (Citrullus vulgaris) seeds are reported. Both species represent different genera of the Cucurbitaceae family, which have not been previously investigated as a source of proteinase inhibitors. The sequences are unique but are very similar to those of other proteinase inhibitors which have been isolated from squash seeds. Based on structural homology we assume that the Arg5-Ile6 peptide bond represents the reactive site bond of both inhibitors.

  20. Alternative splicing of human elastin mRNA indicated by sequence analysis of cloned genomic and complementary DNA

    International Nuclear Information System (INIS)

    Indik, Z.; Yeh, H.; Ornstein-Goldstein, N.; Sheppard, P.; Anderson, N.; Rosenbloom, J.C.; Peltonen, L.; Rosenbloom, J.

    1987-01-01

    Poly(A)+ RNA, isolated from a single 7-month fetal human aorta, was used to synthesize cDNA by the RNase H method, and the cDNA was inserted into λgt10. Recombinant phage containing elastin sequences were identified by hybridization with cloned, exon-containing fragments of the human elastin gene. Three clones containing inserts of 3.3, 2.7, and 2.3 kilobases were selected for further analysis. Three overlapping clones containing 17.8 kilobases of the human elastin gene were also isolated from genomic libraries. Complete sequence analysis of the six clones demonstrated that: (i) the cDNA encompassed the entire translated portion of the mRNA encoding 786 amino acids, including several unusual hydrophilic amino acid sequences not previously identified in porcine tropoelastin; (ii) exons encoding either hydrophobic or crosslinking domains in the protein alternated in the gene; and (iii) Alu repetitive sequences were highly abundant throughout the introns. The data also indicated substantial alternative splicing of the mRNA. These results suggest the potential for significant variation in the precise molecular structure of the elastic fiber in the human population.

  1. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.
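    Mean nucleotide diversity within a species and sequence divergence between species are both averages of pairwise differences per aligned site. The sketch below shows that arithmetic on hypothetical toy alignments; it uses uncorrected p-distances with no gap handling and no substitution-model correction, which may differ from the estimator used in the study. All function names are illustrative.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise p-distance within one sample (Nei's pi, uncorrected)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_divergence(group1: list[str], group2: list[str]) -> float:
    """Mean p-distance over all between-group pairs (uncorrected d_xy)."""
    dists = [p_distance(a, b) for a in group1 for b in group2]
    return sum(dists) / len(dists)


# Hypothetical toy alignments, not data from the study.
mulatta = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
fascicularis = ["ACGTTCGT", "ACCTTCGT"]
print(nucleotide_diversity(mulatta))
print(mean_divergence(mulatta, fascicularis))
```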

  2. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. ... been widely used for phylogenetic studies and sequence differences in ... In order to fill up the internal gap, a new set.

  3. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross- species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  4. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  5. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of the huge volumes of data generated. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a lossless fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general-purpose compression algorithms, namely gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32- and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
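    The comparison above is against gzip, bzip2 and lzma. The sketch below only reproduces that kind of baseline measurement with Python's standard-library codecs; it does not implement DELIMINATE, whose algorithm the abstract does not describe. The random input sequence is a hypothetical test case, and `compression_ratios` is an illustrative helper.

```python
import bz2
import gzip
import lzma
import random


def compression_ratios(data: bytes) -> dict[str, float]:
    """Original size divided by compressed size for three general-purpose codecs."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    ratios = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        if decompress(packed) != data:  # confirm the round trip is lossless
            raise RuntimeError(f"{name} round trip failed")
        ratios[name] = len(data) / len(packed)
    return ratios


# Hypothetical input: a random 100 kb DNA string, seeded for reproducibility.
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(100_000)).encode("ascii")
for name, ratio in compression_ratios(seq).items():
    print(f"{name}: {ratio:.2f}x")
```

    For uniformly random bases, no lossless codec can average much better than 4x on ASCII input, because each 8-bit character carries at most 2 bits of information. Real genomes have structure that specialized tools can exploit beyond that.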

  6. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on sequencing throughput and on the outcome of microbial community analyses. The aim of this study was to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison of these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more usable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method, as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  7. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequences of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending on whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...
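    A consensus sequence summarizes a set of aligned sequences by the most frequent base at each position. The snippet is a plain majority-rule sketch on hypothetical hexamers; the compilation itself worked from homology-aligned -50 to +10 regions and may have used a different rule for weak or tied positions. The function name is illustrative.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most frequent base in each column of equal-length aligned sequences.

    Ties go to whichever base appears first in that column, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Hypothetical -10-like hexamers for illustration only (not the compiled promoters).
print(consensus(["TATAAT", "TATACT", "TAGAAT", "GATAAT"]))
```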

  8. Detection and sequence analysis of accessory gene regulator genes of Staphylococcus pseudintermedius isolates

    Directory of Open Access Journals (Sweden)

    M. Ananda Chitra

    2015-07-01

    Background: Staphylococcus pseudintermedius (SP) is the major pathogenic species of dogs, involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr) locus of Staphylococcus aureus has been extensively studied and influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objective of this study was to detect agrA, B, and D in SP isolated from canine skin infections and to analyze their sequences. Materials and Methods: In this study, we isolated and identified SP from canine pyoderma and otitis cases by polymerase chain reaction (PCR) and confirmed the identification by PCR-restriction fragment length polymorphism. Primers for the SP agrA and agrBD genes were designed using online primer design software and checked for specificity with BLAST. The agr genes were amplified by PCR from 53 SP isolates, and agrA, B, and D were sequenced for five isolates and analyzed using DNAstar and Mega5.2 software. Results: A total of 53 (59%) SP isolates were obtained from 90 samples. Fifteen isolates (28%) were confirmed to be methicillin-resistant SP (MRSP) by detection of the mecA gene. Accessory gene regulator A, B, and D genes were detected in all the SP isolates. Complete nucleotide sequences of these three genes for five isolates were submitted to GenBank under accession numbers KJ133557 to KJ133571. AgrA amino acid sequence analysis showed that it is mainly made of alpha-helices and is hydrophilic in nature. AgrB is a transmembrane protein, and AgrD encodes the precursor of the autoinducing peptide (AIP). Sequencing of the agrD gene revealed that the 5 canine SP strains tested could be divided into three Agr specificity groups (RIPTSTGFF, KIPTSTGFF, and RIPISTGFF) based on the putative AIP produced by each strain.
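    The three putative AIP sequences reported above are close variants of one another. As a small illustrative check, the snippet counts the positions at which each pair differs (Hamming distance) and lists the substitutions. It only compares the reported strings and says nothing about how the groups behave biologically. The function names are illustrative.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))


def differing_positions(a: str, b: str) -> list[tuple[int, str, str]]:
    """1-based positions where the residues differ, with the residue from each peptide."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# The three putative AIP sequences given in the abstract.
aips = ["RIPTSTGFF", "KIPTSTGFF", "RIPISTGFF"]
for a, b in combinations(aips, 2):
    print(a, b, hamming(a, b), differing_positions(a, b))
```

    Each of KIPTSTGFF and RIPISTGFF differs from RIPTSTGFF by a single substitution, at position 1 (R to K) and position 4 (T to I) respectively.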

  9. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    Science.gov (United States)

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest to have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and determine how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. The amino acid substitutions in the 2014 EBOV outbreak were examined by sequence analysis. The alignment demonstrated the presence of previously identified conserved motifs in the 2014 EBOV isolates, as well as novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant; they were pathogenic, with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  10. Analysis of Comparative Sequence and Genomic Data to Verify Phylogenetic Relationship and Explore a New Subfamily of Bacterial Lipases.

    Directory of Open Access Journals (Sweden)

    Malihe Masomian

    Thermostable and organic solvent-tolerant enzymes have significant potential in a wide range of synthetic reactions in industry due to their inherent stability at high temperatures and their ability to endure harsh organic solvents. In this study, a novel gene encoding a true lipase was isolated by constructing a genomic DNA library of the thermophilic Aneurinibacillus thermoaerophilus strain HZ in an Escherichia coli plasmid vector. Sequence analysis revealed that HZ lipase had 62% identity to a putative lipase from Bacillus pseudomycoides. The most closely related characterized lipases are thermostable Bacillus and Geobacillus lipases belonging to subfamily I.5, with ≤ 57% identity. Amino acid sequence analysis of HZ lipase identified a conserved pentapeptide containing the active serine, GHSMG, and a Ca2+-binding motif, GCYGSD, in the enzyme. Protein structure modeling showed that HZ lipase consists of an α/β hydrolase fold and a lid domain. Protein sequence alignment, conserved region analysis, the Clustal distance matrix and amino acid composition illustrated differences between HZ lipase and other thermostable lipases. Phylogenetic analysis revealed that this lipase represents a new subfamily of family I of bacterial true lipases, classified as family I.9. The HZ lipase was expressed under the Plac promoter using IPTG and was characterized. The recombinant enzyme showed optimal activity at 65 °C and retained ≥ 97% activity after incubation at 50 °C for 1 h. The HZ lipase was stable in various polar and non-polar organic solvents.
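    The active-serine pentapeptide GHSMG fits the general G-X-S-X-G pattern (the "nucleophile elbow") found in many lipases and esterases. The sketch scans a protein string for that generic pattern with a regular expression, using a lookahead so overlapping matches are also reported. The input fragment is hypothetical apart from the GHSMG pentapeptide, and a pattern match alone does not establish lipase activity.

```python
import re

# Generic G-X-S-X-G pattern, where X is any residue; the lookahead allows overlaps.
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, pentapeptide) for every G-X-S-X-G match."""
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(protein.upper())]


# Hypothetical fragment; only the GHSMG pentapeptide comes from the abstract.
fragment = "LVGHSMGGAT"
print(find_gxsxg(fragment))
```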

  11. GENETIC ANALYSIS OF ABSCISIC ACID BIOSYNTHESIS

    Energy Technology Data Exchange (ETDEWEB)

    MCCARTY D R

    2012-01-10

    The carotenoid cleavage dioxygenases (CCD) catalyze synthesis of a variety of apo-carotenoid secondary metabolites in plants, animals and bacteria. In plants, the reaction catalyzed by the 11, 12, 9-cis-epoxy carotenoid dioxygenase (NCED) is the first committed and key regulated step in synthesis of the plant hormone abscisic acid (ABA). ABA is a key regulator of plant stress responses and has critical functions in normal root and seed development. The molecular mechanisms responsible for developmental control of ABA synthesis in plant tissues are poorly understood. Five of the nine CCD genes present in the Arabidopsis genome encode NCEDs involved in control of ABA synthesis in the plant. This project is focused on functional analysis of these five AtNCED genes as a key to understanding developmental regulation of ABA synthesis and dissecting the role of ABA in plant development. For this purpose, the project developed a comprehensive set of gene knockouts in the AtNCED genes that facilitate genetic dissection of ABA synthesis. These mutants were used in combination with key molecular tools to address the following specific objectives: (1) the role of ABA synthesis in root development; (2) developmental control of ABA synthesis in seeds; (3) analysis of AtNCED over-expressers; (4) preliminary crystallography of the maize VP14 protein.

  12. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ready availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ‘tsunami’ of data that emerges from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is fewer than in the related species Fusarium oxysporum, but more than in Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumonisin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  13. Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

    Science.gov (United States)

    Hatakeyama, T; Hatakeyama, T

    1990-07-06

    The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.

  14. Isolation and amino acid sequence of a short-chain neurotoxin from an Australian elapid snake, Pseudechis australis.

    OpenAIRE

    Takasaki, C; Tamiya, N

    1985-01-01

    A short-chain neurotoxin Pseudechis australis a (toxin Pa a) was isolated from the venom of an Australian elapid snake Pseudechis australis (king brown snake) by sequential chromatography on CM-cellulose, Sephadex G-50 and CM-cellulose columns. Toxin Pa a has an LD50 (intravenous) value of 76 micrograms/kg body wt. in mice and consists of 62 amino acid residues. The amino acid sequence of Pa a shows considerable homology with those of short-chain neurotoxins of elapid snakes, especially of tr...

  15. Molecular cloning and expression analysis of jasmonic acid dependent but salicylic acid independent LeWRKY1.

    Science.gov (United States)

    Lu, M; Wang, L F; Du, X H; Yu, Y K; Pan, J B; Nan, Z J; Han, J; Wang, W X; Zhang, Q Z; Sun, Q P

    2015-11-30

    Various plant genes can be activated or inhibited by phytohormones under conditions of biotic and abiotic stress, especially in response to jasmonic acid (JA) and salicylic acid (SA). Interactions between JA and SA may be synergistic or antagonistic, depending on the stress condition. In this study, we cloned a full-length cDNA (LeWRKY1, GenBank accession No. FJ654265) from Lycopersicon esculentum by rapid amplification of cDNA ends. Sequence analysis showed that this gene is a group II WRKY transcription factor. Analysis of LeWRKY1 mRNA expression in various tissues by qRT-PCR showed that the highest and lowest expression occurred in the leaves and stems, respectively. In addition, LeWRKY1 expression was induced by JA and Botrytis cinerea Pers., but not by SA.

  16. Design of Tail-Clamp Peptide Nucleic Acid Tethered with Azobenzene Linker for Sequence-Specific Detection of Homopurine DNA

    Directory of Open Access Journals (Sweden)

    Shinjiro Sawada

    2017-10-01

    DNA carries genetic information in its sequence of bases. Synthetic oligonucleotides that can sequence-specifically recognize a target gene sequence are a useful tool for regulating gene expression or detecting target genes. Among the many synthetic oligonucleotides, tail-clamp peptide nucleic acid (TC-PNA) offers advantages because it has two homopyrimidine PNA strands connected via a flexible ethylene glycol-type linker; these strands can recognize complementary homopurine sequences via Watson-Crick and Hoogsteen base pairing and form thermally stable PNA/PNA/DNA triplex structures. Here, we synthesized a series of TC-PNAs with azobenzene-containing linkers of different lengths and studied their binding behaviours to homopurine single-stranded DNA. Introduction of azobenzene at the N-terminal amine of PNA increased the thermal stability of PNA-DNA duplexes. Further extension of the homopyrimidine PNA strand at the N-terminus of PNA-AZO further increased the binding stability of the PNA/DNA/PNA triplex to the target homopurine sequence; however, it induced TC-PNA/DNA/TC-PNA complex formation. Among these TC-PNAs, 9W5H-C4-AZO, consisting of nine Watson-Crick bases and five Hoogsteen bases tethered with a beta-alanine-conjugated azobenzene linker, gave a stable 1:1 TC-PNA/ssDNA complex and exhibited good mismatch recognition. Our design for TC-PNA-AZO can be utilized for detecting homopurine sequences in various genes.

  17. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. In particular, studies using the amphioxus model help our understanding of vertebrate evolution and development. Interest in the amphioxus model thus led to the characterization of both the transcriptome and the complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing daily induction of spawning in the laboratory during the breeding season with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model, even though no genomic or transcriptomic data have been available. To fill this need, we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on a Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall, 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high-throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger-based sequencing. Therefore, given the high degree of sequence conservation

  18. Characterization of the HLA-DRβ1 third hypervariable region amino acid sequence according to charge and parental inheritance in systemic sclerosis.

    Science.gov (United States)

    Gentil, Coline A; Gammill, Hilary S; Luu, Christine T; Mayes, Maureen D; Furst, Dan E; Nelson, J Lee

    2017-03-07

    Specific HLA class II alleles are associated with systemic sclerosis (SSc) risk, clinical characteristics, and autoantibodies. HLA nomenclature initially developed with antibodies as typing reagents defining DRB1 allele groups. However, alleles from different DRB1 allele groups encode the same third hypervariable region (3rd HVR) sequence, the primary T-cell recognition site, and 3rd HVR charge differences can affect interactions with T cells. We considered 3rd HVR sequences (amino acids 67-74) irrespective of the allele group and analyzed parental inheritance according to the 3rd HVR charge, comparing SSc patients with controls. In total, 306 families (121 SSc and 185 controls) were HLA genotyped and parental HLA-haplotype origin was determined. Analysis was conducted according to DRβ1 3rd HVR sequence, charge, and parental inheritance. The distribution of 3rd HVR sequences differed in SSc patients versus controls (p = 0.007), primarily due to an increase of specific DRB1*11 alleles, in accord with previous observations. The 3rd HVR sequences were next analyzed according to charge and parental inheritance. Paternal transmission of DRB1 alleles encoding a +2 charge 3rd HVR was significantly reduced in SSc patients compared with maternal transmission (p = 0.0003, corrected for analysis of four charge categories p = 0.001). To a lesser extent, paternal transmission was increased when charge was 0 (p = 0.021, corrected for multiple comparisons p = 0.084). In contrast, paternal versus maternal inheritance was similar in controls. SSc patients differed from controls when DRB1 alleles were categorized according to 3rd HVR sequences. Skewed parental inheritance was observed in SSc patients but not in controls when the DRβ1 3rd HVR was considered according to charge. These observations suggest that epigenetic modulation of HLA merits investigation in SSc.
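    Two quantitative steps in this abstract can be sketched. First, a crude net charge for a short peptide such as the 67-74 stretch, here assumed to count K and R as +1 and D and E as -1 (histidine neutral, termini ignored); the study's exact charge convention is not stated. Second, a Bonferroni adjustment: the corrected values reported (0.001 and 0.084) are consistent with multiplying the raw p-values by four, since 0.0003 × 4 = 0.0012 and 0.021 × 4 = 0.084. The peptide below is hypothetical.

```python
def net_charge(peptide: str) -> int:
    """Crude net charge near neutral pH: K/R count +1, D/E count -1, others 0.

    Simplifying assumption: histidine is treated as neutral and termini are ignored.
    """
    charges = {"K": 1, "R": 1, "D": -1, "E": -1}
    return sum(charges.get(aa, 0) for aa in peptide.upper())


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


# Hypothetical 8-residue stretch standing in for DRβ1 positions 67-74.
print(net_charge("ILEDRRAA"))
# Raw p-values from the abstract, adjusted for four charge categories.
for p in (0.0003, 0.021):
    print(p, round(bonferroni(p, 4), 4))
```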

  19. Sequence stratigraphic analysis of Cenomanian greenhouse palaeosols: A case study from southern Patagonia, Argentina

    Science.gov (United States)

    Varela, Augusto N.; Veiga, Gonzalo D.; Poiré, Daniel G.

    2012-10-01

    The aim of this contribution is to analyse extrinsic (i.e., tectonics, climate and eustasy) and intrinsic (i.e., palaeotopography, palaeodrainage and relative sedimentation rates) factors that controlled palaeosol development in the Cenomanian Mata Amarilla Formation (Austral foreland basin, southwestern Patagonia, Argentina). Detailed sedimentological logs, facies analysis, pedofeatures and palaeosol horizon identification led to the definition of six pedotypes, which represent Histosols, acid sulphate Histosols, Vertisols, hydromorphic Vertisols, Inceptisols and vertic Alfisols. Small- and large-scale changes in palaeosol development were recognised throughout the units. Small-scale or high-frequency variations, identified within the middle section, are represented by the lateral and vertical superimposition of Inceptisols, Vertisols and hydromorphic Vertisols. Lateral changes are interpreted as the result of factors intrinsic to the depositional systems, such as the relative position within the floodplain and the distance from the main channels, which condition the nature of the parent material, the sedimentation rate and eventually the palaeotopographic position. Vertical stacking of different soil types is linked to avulsion processes and the relatively abrupt change in the distance to main channels as the system aggraded. The large-scale or low-frequency vertical variations in palaeosol type occurring in the Mata Amarilla Formation are related to long-term changes in depositional environments. The lower and upper sections of the studied logs are characterised by Histosols and acid sulphate Histosols, and a few hydromorphic Vertisols associated with low-gradient coastal environments (i.e., lagoons, estuaries and distal fluvial systems). At the lower boundary of the middle section, a thick palaeosol succession composed of vertic Alfisols occurs. The rest of the middle section is characterised by Vertisols, hydromorphic Vertisols and Inceptisols occurring on distal and

  20. ROC analysis of acid demineralized artificial caries

    International Nuclear Information System (INIS)

    Kang, Byung Cheol

    1997-01-01

    This study was designed to determine the detectability of artificial incipient proximal caries lesions by dentists on Ektaspeed Plus film using ROC analysis. Sixteen premolars and 30 molars, bearing 52 proximal caries-like lesions demineralized using an acid-gel technique, were combined with 20 sound premolars and 30 sound molars to make 24 plaster blocks. Each block, with 4 teeth and 6 contacting proximal surfaces, was placed in an optical bench to take 12 bitewing radiographs with Ektaspeed Plus film. Thirty-six dentists acted as observers and evaluated the proximal lesions using a five-point rating scale for ROC analysis. They were also asked to determine the presence or absence of proximal caries. The true status of the proximal caries was established by the consensus of three oral and maxillofacial radiologists. To evaluate intra-observer agreement, 9 dentists reread the radiographs after an interval of 1 month. The Pearson correlation coefficient for intra-observer agreement was 0.746 (good agreement). Ten observers' data sets were degenerate. The mean area under the ROC curve for the remaining 26 observers was 0.806, with a standard deviation of 0.061. The sensitivity and specificity of the binary response were 0.17 (SD = 0.11) and 0.78 (SD = 0.17), respectively. The binary response reveals only a single value each for sensitivity and specificity. ROC analysis for assessing diagnostic accuracy in caries detection, which produces estimates of sensitivity across all specificities, yields more comprehensive measures of diagnostic performance than single values of sensitivity and specificity.
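    The contrast drawn above, one (sensitivity, specificity) pair versus a whole ROC curve, can be made concrete. The sketch below builds an empirical ROC curve from five-point confidence ratings by sweeping the decision threshold, then computes its area with the trapezoidal rule. The ratings are hypothetical, and the study may have fitted a smooth (for example binormal) curve rather than using this nonparametric estimate.

```python
def empirical_roc(diseased, healthy, levels=(1, 2, 3, 4, 5)):
    """Empirical ROC points from ordinal ratings (higher = more confident a lesion is present).

    At each threshold t a surface is called positive when its rating is >= t.
    Returns (false-positive rate, true-positive rate) pairs starting at (0, 0).
    """
    points = [(0.0, 0.0)]  # stricter than any rating: nothing called positive
    for t in sorted(levels, reverse=True):
        tpr = sum(r >= t for r in diseased) / len(diseased)  # sensitivity
        fpr = sum(r >= t for r in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ratings on a 1-5 scale for one observer (not the study's data).
lesions = [5, 4, 4, 3, 2]
sound = [1, 1, 2, 3, 2]
roc = empirical_roc(lesions, sound)
print(roc)
print(round(trapezoid_auc(roc), 3))
```

    This nonparametric area equals the probability that a randomly chosen lesion receives a higher rating than a randomly chosen sound surface, with ties counted as one half. A binary present/absent call, by contrast, yields only one point on this curve.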

  1. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. Domain-swapping proteins have attracted much attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of the proteins in the human genome that tend to domain-swap could support many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences for their domain distribution, disease association and functional importance based on Gene Ontology (GO), to gain a better understanding of the functional importance of these sequences. Finally, we developed a method to predict the hinge region in a given putative domain-swapped sequence, using important physicochemical properties of amino acids.
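    Sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) quoted above are standard summaries of a binary confusion matrix. The following is a minimal Python sketch of how they are computed from true/false positive and negative counts; the function name is illustrative and the counts in the example are hypothetical, not taken from the study.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)          # fraction of true positives recovered
    specificity = tn / (tn + fp)          # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical confusion counts for illustration only.
    print(classification_metrics(tp=40, fp=5, tn=45, fn=10))
```

    Because accuracy is the prevalence-weighted average of sensitivity and specificity, it always lies between those two values when all three are computed on the same confusion matrix; MCC, by contrast, uses all four cells and stays informative when classes are imbalanced.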

  2. Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods.

    Directory of Open Access Journals (Sweden)

    Mayumi Kamada

    Full Text Available Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of the natto (fermented soybean food) starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168, which is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, all of which are related to the fermentation process. Comparing genome sequences, we found that the γPGA-producing strains have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophy of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, whereas the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand has followed a different evolutionary path from the other strains.

  3. The isolation, purification and amino-acid sequence of insulin from the teleost fish Cottus scorpius (daddy sculpin).

    Science.gov (United States)

    Cutfield, J F; Cutfield, S M; Carne, A; Emdin, S O; Falkmer, S

    1986-07-01

    Insulin from the principal islets of the teleost fish, Cottus scorpius (daddy sculpin), has been isolated and sequenced. Purification involved acid/alcohol extraction, gel filtration, and reverse-phase high-performance liquid chromatography to yield nearly 1 mg pure insulin/g wet weight islet tissue. Biological potency was estimated as 40% compared to porcine insulin. The sculpin insulin crystallised in the absence of zinc ions although zinc is known to be present in the islets in significant amounts. Two other hormones, glucagon and pancreatic polypeptide, were copurified with the insulin, and an N-terminal sequence for pancreatic polypeptide was determined. The primary structure of sculpin insulin shows a number of sequence changes unique so far amongst teleost fish. These changes occur at A14 (Arg), A15 (Val), and B2 (Asp). The B chain contains 29 amino acids and there is no N-terminal extension as seen with several other fish. Presumably as a result of the amino acid substitutions, sculpin insulin does not readily form crystals containing zinc-insulin hexamers, despite the presence of the coordinating B10 His.

  4. Confirmation and Sequence analysis of N gene of PPRV in South Xinjiang, China

    Directory of Open Access Journals (Sweden)

    YongHong Liu

    Full Text Available ABSTRACT In China, Peste des petits ruminants (PPR) was first officially reported in 2007. From 2010 until the outbreak of 2013, PPRV infection was not reported. In November 2013, PPRV re-emerged in Xinjiang and rapidly spread to 22 P/A/M (provinces, autonomous regions and municipalities) of China. In this study, suspected PPRV-infected sheep on a breeding farm in South Xinjiang in 2014 were diagnosed, and the characteristics of the complete N protein gene sequence of PPRV were analyzed. The sheep showed signs of PPRV infection, such as fever, increased oronasal secretions, dyspnea and diarrhea, with a morbidity of 60% and a fatality rate of 21.1%. Macroscopic lesions at autopsy and histopathological changes under the light microscope included stomatitis, broncho-interstitial pneumonia, catarrhal hemorrhagic enteritis and intracytoplasmic eosinophilic inclusions in multinucleated giant cells in the lung. Formalin-fixed mixed tissue samples were positive by nucleic acid extraction and RT-PCR detection. The N protein gene nucleotide sequence of strain China/XJNJ/2014 showed extremely high homology with strain China/XJYL/2013 and, among foreign strains, the highest homology with strain PRADESH_95 from India. Phylogenetic analysis based on the complete N protein gene sequence of PPRV showed that strain China/XJNJ/2014, the other 2013-2014 strains in this study and the Tibetan strains all belonged to lineage Ⅳ, but the 2013-2014 strains in this study and the Tibetan strains fell in different sub-branches.

  5. [Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].

    Science.gov (United States)

    Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong

    2008-05-01

    A rabies virus strain (DRV) was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR, together with 3'-RACE and 5'-RACE, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp long. The DRV genome organization was similar to that of other rabies viruses, consisting of five genes, and the gene initiation and termination sites were highly conserved. Amino acid mutations were found in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of the N, P, M, G and L genes were compared with those of strains whose complete genomes have been sequenced. A phylogenetic tree was constructed by comparing the N gene sequence with those of other typical rabies viruses. These results indicated that DRV belongs to genotype 1. Homology was highest (94%) with the Chinese vaccine strain 3aG and lowest (71%) with WCBV. These findings provide a theoretical reference for further research on rabies virus.

  6. [Trace analysis of aristolochic acid A].

    Science.gov (United States)

    Liu, Yalin; Gao, Huimin; Wang, Zhimin; Zhang, Qiwei

    2010-12-01

    An HPLC method was established for the limit detection of aristolochic acid A in Chinese herbs that contain, or are suspected to contain, aristolochic acid, and in their preparations. Samples were analyzed on an Alltima C18 column eluted with methanol-water-acetic acid (68:32:1.5) as the mobile phase. The flow rate was 1.0 mL·min⁻¹ and the detection wavelength was 390 nm. The calibration curve was linear over the range 0.016 to 0.51 g (r = 0.9993), and the LOD was 4 ng. The average recovery was 101.2% with an RSD of 2.01%. The sample preparation procedures were systematically investigated. The content of aristolochic acid A in Radix et Rhizoma Asari bought from markets or drugstores ranged from 3.1 to 26.6 µg·g⁻¹, and 3 of 11 samples met the quality requirement of the current Chinese Pharmacopoeia. Among 15 batches of Chinese medicaments, only one sample was found to contain aristolochic acid A. The method is sensitive and repeatable and could be used for the limit detection of aristolochic acid A in Chinese herbal medicines, and their preparations, that contain or are suspected to contain trace amounts of aristolochic acid A.

  7. Sequencing and analysis of the gene-rich space of cowpea

    Directory of Open Access Journals (Sweden)

    Cheung Foo

    2008-02-01

    Full Text Available Abstract Background Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor-quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. 
Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO), with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A

  8. Analysis of long-range correlation in sequences data of proteins

    OpenAIRE

    ADRIANA ISVORAN; LAURA UNIPAN; DANA CRACIUN; VASILE MORARIU

    2007-01-01

    The results presented here suggest the existence of correlations in the sequence data of proteins. Thirty-two proteins, globular and fibrous, monomeric and polymeric, were analyzed. The primary structures of these proteins were treated as time series: three spatial data series were generated for each protein sequence from numerical correspondences between each amino acid and an associated physical property, i.e., its electric charge, its polar character and its dipole moment....
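    The mapping step described above, turning a primary structure into a numeric series, can be sketched in Python. The charge scale, the treatment of histidine as neutral and the use of a simple lag autocorrelation are assumptions for illustration only; long-range correlation studies often use methods such as detrended fluctuation analysis, and the abstract does not state which method the authors applied.

```python
from typing import Dict, List

# Approximate side-chain charge at neutral pH (assumption): Asp/Glu negative,
# Lys/Arg positive; His and all other residues treated as neutral.
CHARGE: Dict[str, float] = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def to_series(sequence: str, scale: Dict[str, float] = CHARGE) -> List[float]:
    """Replace each residue by its property value, giving a numeric 'time series'."""
    return [scale.get(residue, 0.0) for residue in sequence.upper()]


def autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given positive lag."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return covariance / variance


if __name__ == "__main__":
    series = to_series("KDKD")
    print(series)                      # [1.0, -1.0, 1.0, -1.0]
    print(autocorrelation(series, 1))  # -0.75
```

    Swapping `CHARGE` for a polarity or dipole-moment table yields the other two series described in the abstract; how slowly the correlation decays with lag is what distinguishes long-range from short-range structure.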

  9. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed sequence tag (EST) analysis has been a cost-effective tool in molecular biology, and ESTs represent an abundant, valuable resource for genome annotation, gene expression studies and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tags (ESTs). The ESTs were assembled into 4,473 unigenes, composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047); 1,294 of these unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci that could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test transferability to peach and plum. Nearly 89% of the primer pairs produced target PCR bands in the two species. Marker polymorphism was high in plum (65%) and lower in peach (46%), and clustering analysis of the three species indicated that these SSR markers are useful for evaluating genetic relationships and diversity between and within Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further studies such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, and evolution and comparative genomics among Prunus species.
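    Finding putative SSRs in a unigene set amounts to scanning for short motifs repeated in tandem. A minimal Python sketch follows; the abstract does not name the search tool or its thresholds, so the motif lengths (2-6 bp), the minimum of five copies, the restriction to perfect repeats and the function names are all assumptions for illustration.

```python
from typing import List, Tuple


def is_compound(unit: str) -> bool:
    """True if the unit is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    return unit in (unit + unit)[1:-1]


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for perfect tandem repeats."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + k <= n:
            unit = seq[i:i + k]
            # Skip ambiguous bases and units reducible to a shorter motif.
            if set(unit) <= set("ACGT") and not is_compound(unit):
                j = i + k
                while seq[j:j + k] == unit:  # extend over adjacent copies
                    j += k
                copies = (j - i) // k
                if copies >= min_repeats:
                    hits.append((i, unit, copies))
                    i = j  # resume scanning after the repeat
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GGATATATATATCC"))  # [(2, 'AT', 5)]
```

    Primer pairs would then be designed in the flanking sequence on either side of each hit, which is the step whose success rate (14 of 42 pairs) the abstract reports.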

  10. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    Science.gov (United States)

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped up in a user-friendly web interface. Using SMS it is now possible to analyze sequence-to-structure relationships, the quality of the structure, the nature and volume of intra- and inter-chain atomic contacts, the relative conservation of amino acids at specific sequence positions based on multiple sequence alignment, indications of folding essential residues (FER) based on the relationship of residue conservation to intra-chain contacts, and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR), the amino acids that define the interactive portion of protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  11. cDNA cloning, sequence analysis, and chromosomal localization of the gene for human carnitine palmitoyltransferase

    International Nuclear Information System (INIS)

    Finocchiaro, G.; Taroni, F.; Martin, A.L.; Colombo, I.; Tarelli, G.T.; DiDonato, S.; Rocchi, M.

    1991-01-01

    The authors have cloned and sequenced a cDNA encoding human liver carnitine palmitoyltransferase (CPTase), an inner mitochondrial membrane enzyme that plays a major role in the fatty acid oxidation pathway. Mixed oligonucleotide primers, whose sequences were deduced from one tryptic peptide obtained from purified CPTase, were used in a polymerase chain reaction, allowing amplification of a 0.12-kilobase fragment of human genomic DNA encoding that peptide. A 60-base-pair (bp) oligonucleotide synthesized on the basis of the sequence of this fragment was used to screen a human liver cDNA library and hybridized to a cDNA insert of 2255 bp. This cDNA contains an open reading frame of 1974 bp that encodes a protein of 658 amino acid residues, including a 25-residue NH2-terminal leader peptide. The assignment of this open reading frame to human liver CPTase is confirmed by matches to seven different amino acid sequences of tryptic peptides derived from pure human CPTase and by 82.2% homology with the amino acid sequence of rat CPTase. The NH2-terminal region of CPTase contains a leucine-proline motif that is shared by carnitine acetyl- and octanoyltransferases and by choline acetyltransferase. The gene encoding CPTase was assigned to human chromosome 1, region 1q12-1pter, by hybridization of CPTase cDNA with a DNA panel of 19 human-hamster somatic cell hybrids.

  12. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    International Nuclear Information System (INIS)

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study proposes a sequence tree model to analyze accident sequences systematically. Using the sequence tree model, all possible scenarios that require a specific safety action to prevent core damage can be identified, and the success conditions of a safety action under complicated situations, such as a combined accident, can also be identified. A sequence tree is a branching model that divides plant conditions while taking plant dynamics into account. Because the sequence tree model can reflect plant dynamics, arising from the interaction of different accident timings and plant conditions and from the interaction among operator actions, mitigation systems and the indicators used for operation, it can readily be used to develop a dynamic event tree model. The target safety action for this study is a feed-and-bleed (F and B) operation. An F and B operation directly cools the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, the timings of the second accident were categorized according to plant condition. Indicators were selected as branch points using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution of the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  13. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    To determine the genetic relationship of the HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, a 1100 bp DNA fragment corresponding to gG was amplified by PCR from six HSV-2 strains isolated from sera of infected patients in Iran, and was sequenced for determining ...

  14. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economic value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities du...

  15. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... nucleotide alignment of both native buffalo and cattle CSRP3 cDNAs sequences ..... Exon III, Identities = 71/75 (94%), Gaps = 1/75 (1%) Strand=Plus/Plus ... Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J,.

  16. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus, suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  17. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; et al. (as part of the Chinese Human Genome Sequencing Consortium)

    2006-01-01

    as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  18. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  19. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  20. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    EVELYN SILVA

    2006-01-01

    Full Text Available Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pairwise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lowest percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  1. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  2. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  3. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease- and protein-related research were the leading research focuses, and that comparative genomics and evolution-related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  4. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  5. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  6. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    Energy Technology Data Exchange (ETDEWEB)

    Dash, Paban Kumar, E-mail: pabandash@rediffmail.com; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-07-05

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed the presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of South East Asia, including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information on Indian DENV-2 is not available. In this study, the full-length genomes of five DENV-2 isolates (four from 2001 to 2011 and one from 1960) from different parts of India were determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long, with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) showed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and with closely related Sri Lankan and Chinese DENV-2, respectively. Unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins were observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection at a few amino acid sites of the genes encoding structural and non-structural proteins. Molecular phylogenetic analysis based on comparison of both the complete coding region and the envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within the Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in the 1970s characterized the evolution of DENV-2 in India. The present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  7. Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2

    International Nuclear Information System (INIS)

    Dash, Paban Kumar; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Parida, Manmohan; Rao, P.V.Lakshmana

    2013-01-01

    Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed the presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of South East Asia, including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information on Indian DENV-2 is not available. In this study, the full-length genomes of five DENV-2 isolates (four from 2001 to 2011 and one from 1960) from different parts of India were determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long, with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) showed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and with closely related Sri Lankan and Chinese DENV-2, respectively. Unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins were observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection at a few amino acid sites of the genes encoding structural and non-structural proteins. Molecular phylogenetic analysis based on comparison of both the complete coding region and the envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within the Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in the 1970s characterized the evolution of DENV-2 in India. The present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a

  8. Characterization of shark complement factor I gene(s): genomic analysis of a novel shark-specific sequence.

    Science.gov (United States)

    Shin, Dong-Ho; Webb, Barbara M; Nakao, Miki; Smith, Sylvia L

    2009-07-01

    Complement factor I is a crucial regulator of mammalian complement activity. Very little is known of complement regulators in non-mammalian species. We isolated and sequenced four highly similar complement factor I cDNAs from the liver of the nurse shark (Ginglymostoma cirratum), designated as GcIf-1, GcIf-2, GcIf-3 and GcIf-4 (previously referred to as nsFI-a, -b, -c and -d) which encode 689, 673, 673 and 657 amino acid residues, respectively. They share 95% (shark-specific sequence between the leader peptide (LP) and the factor I membrane attack complex (FIMAC) domain. The cDNA sequences differ only in the size and composition of the shark-specific region (SSR). Sequence analysis of each SSR has identified within the region two novel short sequences (SS1 and SS2) and three repeat sequences (RS1-3). Genomic analysis has revealed the existence of three introns between the leader peptide and the FIMAC domain, tentatively designated intron 1, intron 2, and intron 3, which span 4067, 2293 and 2082 bp, respectively. Southern blot analysis suggests the presence of a single gene copy for each cDNA type. Phylogenetic analysis suggests that complement factor I of cartilaginous fish diverged prior to the emergence of mammals. All four GcIf cDNA species are expressed in four different tissues and the liver is the main tissue in which expression level of all four is high. This suggests that the expression of GcIf isotypes is tissue-dependent.

  9. Cloning, sequence analysis, expression of Cyathus bulleri laccase in Pichia pastoris and characterization of recombinant laccase

    Directory of Open Access Journals (Sweden)

    Garg Neha

    2012-10-01

    Background: Laccases are blue multi-copper oxidases and catalyze the oxidation of phenolic and non-phenolic compounds. There is considerable interest in using these enzymes for dye degradation as well as for synthesis of aromatic compounds. Laccases are produced at relatively low levels and, sometimes, as isozymes in the native fungi. The investigation of properties of individual enzymes therefore becomes difficult. The goal of this study was to over-produce a previously reported laccase from Cyathus bulleri using the well-established expression system of Pichia pastoris and examine and compare the properties of the recombinant enzyme with that of the native laccase. Results: In this study, complete cDNA encoding laccase (Lac) from white rot fungus Cyathus bulleri was amplified by RACE-PCR, cloned and expressed in the culture supernatant of Pichia pastoris under the control of the alcohol oxidase (AOX1) promoter. The coding region consisted of 1,542 bp and encodes a protein of 513 amino acids with a signal peptide of 16 amino acids. The deduced amino acid sequence of the mature protein displayed high homology with laccases from Trametes versicolor and Coprinus cinereus. The sequence analysis indicated the presence of Glu 460 and Ser 113 and an LEL tripeptide at the position known to influence the redox potential of laccases, classifying this enzyme as a high-redox-potential enzyme. Addition of copper sulfate to the production medium enhanced the level of laccase by about 12-fold to a final activity of 7200 U L⁻¹. The recombinant laccase (rLac) was purified ~4-fold to a specific activity of ~85 U mg⁻¹ protein. A detailed study of thermostability, chloride and solvent tolerance of the rLac indicated improvement in the first two properties when compared to the native laccase (nLac). An altered glycosylation pattern, identified by peptide mass fingerprinting, was proposed to contribute to the altered properties of the rLac. Conclusion: Laccase of C. bulleri was

  10. Cloning, sequence analysis, expression of Cyathus bulleri laccase in Pichia pastoris and characterization of recombinant laccase.

    Science.gov (United States)

    Garg, Neha; Bieler, Nora; Kenzom, Tenzin; Chhabra, Meenu; Ansorge-Schumacher, Marion; Mishra, Saroj

    2012-10-23

    Laccases are blue multi-copper oxidases and catalyze the oxidation of phenolic and non-phenolic compounds. There is considerable interest in using these enzymes for dye degradation as well as for synthesis of aromatic compounds. Laccases are produced at relatively low levels and, sometimes, as isozymes in the native fungi. The investigation of properties of individual enzymes therefore becomes difficult. The goal of this study was to over-produce a previously reported laccase from Cyathus bulleri using the well-established expression system of Pichia pastoris and examine and compare the properties of the recombinant enzyme with that of the native laccase. In this study, complete cDNA encoding laccase (Lac) from white rot fungus Cyathus bulleri was amplified by RACE-PCR, cloned and expressed in the culture supernatant of Pichia pastoris under the control of the alcohol oxidase (AOX1) promoter. The coding region consisted of 1,542 bp and encodes a protein of 513 amino acids with a signal peptide of 16 amino acids. The deduced amino acid sequence of the mature protein displayed high homology with laccases from Trametes versicolor and Coprinus cinereus. The sequence analysis indicated the presence of Glu 460 and Ser 113 and an LEL tripeptide at the position known to influence the redox potential of laccases, classifying this enzyme as a high-redox-potential enzyme. Addition of copper sulfate to the production medium enhanced the level of laccase by about 12-fold to a final activity of 7200 U L⁻¹. The recombinant laccase (rLac) was purified ~4-fold to a specific activity of ~85 U mg⁻¹ protein. A detailed study of thermostability, chloride and solvent tolerance of the rLac indicated improvement in the first two properties when compared to the native laccase (nLac). An altered glycosylation pattern, identified by peptide mass fingerprinting, was proposed to contribute to the altered properties of the rLac. Laccase of C. bulleri was successfully produced extracellularly to a high level of 7200
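
    The length figures above can be cross-checked: 1,542 bp is 514 codons, which is consistent with 513 amino acids if the coding region includes the stop codon (an assumption, since the abstract does not say). The sketch also shows how a purification fold is defined; the starting specific activity is back-calculated from the rounded "~4-fold" and "~85 U mg⁻¹" values and is not a number reported by the authors.

```python
def residues_from_coding_length(coding_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding region of coding_bp nucleotides."""
    if coding_bp % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    codons = coding_bp // 3
    # A stop codon is counted in coding_bp but does not encode a residue.
    return codons - 1 if includes_stop else codons


def purification_fold(specific_activity_final: float, specific_activity_initial: float) -> float:
    """Purification fold = final specific activity / starting specific activity (U/mg)."""
    return specific_activity_final / specific_activity_initial


precursor = residues_from_coding_length(1542)  # 514 codons minus the stop codon
mature = precursor - 16                        # remove the 16-residue signal peptide
initial_sa = 85 / 4                            # back-calculated, hypothetical U/mg
print(precursor, mature, initial_sa, purification_fold(85, initial_sa))
```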

  11. Isolation, sequencing and expression of RED, a novel human gene encoding an acidic-basic dipeptide repeat.

    Science.gov (United States)

    Assier, E; Bouzinba-Segard, H; Stolzenberg, M C; Stephens, R; Bardos, J; Freemont, P; Charron, D; Trowsdale, J; Rich, T

    1999-04-16

    A novel human gene RED, and the murine homologue, MuRED, were cloned. These genes were named after the extensive stretch of alternating arginine (R) and glutamic acid (E) or aspartic acid (D) residues that they contain. We term this the 'RED' repeat. The genes of both species were expressed in a wide range of tissues and we have mapped the human gene to chromosome 5q22-24. MuRED and RED shared 98% sequence identity at the amino acid level. The open reading frame of both genes encodes a 557 amino acid protein. RED fused to a fluorescent tag was expressed in nuclei of transfected cells and localised to nuclear dots. Co-localisation studies showed that these nuclear dots did not contain either PML or Coilin, which are commonly found in the POD or coiled body nuclear compartments. Deletion of the amino-terminal 265 amino acids resulted in a failure to sort efficiently to the nucleus, though nuclear dots were formed. Deletion of a further 50 amino acids from the amino terminus generated a protein that could sort to the nucleus but was unable to generate nuclear dots. Neither construct localised to the nucleolus. The characteristics of RED and its nuclear localisation implicate it as a regulatory protein, possibly involved in transcription.
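
    Because the 'RED' repeat is an alternation of arginine with glutamate or aspartate, a regular expression makes a natural first-pass detector. The sketch below assumes a minimum run of five R-E/D dipeptides; that threshold and the toy sequence are illustrative choices, not values from the paper.

```python
import re


def find_red_repeats(protein: str, min_pairs: int = 5):
    """Return (start, end, match) for runs of R followed by E or D, repeated >= min_pairs times.

    Coordinates are 0-based and end-exclusive. A run that begins with E/D
    rather than R is picked up from its first R.
    """
    pattern = re.compile(r"(?:R[ED]){%d,}" % min_pairs)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(protein.upper())]


# Hypothetical toy sequence, not the RED protein.
print(find_red_repeats("MKTRERDRERERDGSS"))
```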

  12. A Combined Linkage and Exome Sequencing Analysis for Electrocardiogram Parameters in the Erasmus Rucphen Family Study.

    Science.gov (United States)

    Silva, Claudia T; Zorkoltseva, Irina V; Amin, Najaf; Demirkan, Ayşe; van Leeuwen, Elisabeth M; Kors, Jan A; van den Berg, Marten; Stricker, Bruno H; Uitterlinden, André G; Kirichenko, Anatoly V; Witteman, Jacqueline C M; Willemsen, Rob; Oostra, Ben A; Axenovich, Tatiana I; van Duijn, Cornelia M; Isaacs, Aaron

    2016-01-01

    Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C>G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10⁻⁴, minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait's genetic variability in ERF (P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca²⁺ levels (P = 3.3 × 10⁻³) and AMPK stimulated fatty acid oxidation in muscle (P = 4.1 × 10⁻³). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.

  13. A combined linkage and exome sequencing analysis for electrocardiogram parameters in the Erasmus Rucphen Family study

    Directory of Open Access Journals (Sweden)

    Claudia Tamar Silva

    2016-11-01

    Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies (GWAS) of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C>G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10⁻⁴, minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait's genetic variability in ERF (P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca²⁺ levels (P = 3.3 × 10⁻³) and AMPK stimulated fatty acid oxidation in muscle (P = 4.1 × 10⁻³). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.
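
    The LOD scores above are base-10 log likelihood ratios. Under the usual asymptotic assumption for a single variance-component test (a 50:50 mixture of χ²₀ and χ²₁ under the null), a LOD score can be converted to an approximate pointwise p-value. This is a generic convention, not necessarily what the authors used, and it ignores genome-wide multiple testing.

```python
import math


def lod_to_chisq(lod: float) -> float:
    """Likelihood-ratio statistic corresponding to a LOD score: 2 * ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod


def lod_to_pvalue(lod: float) -> float:
    """Asymptotic p-value for a variance-component linkage LOD score.

    Assumes the statistic follows a 50:50 mixture of chi-square(0) and
    chi-square(1) under the null, because the linked variance is tested on
    the boundary of its parameter space. For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    if lod <= 0:
        return 1.0
    x = lod_to_chisq(lod)
    return 0.5 * math.erfc(math.sqrt(x / 2.0))


peaks = {"1q24 (QT)": 2.63, "2q34 (QT)": 2.05, "1p35 (QRS)": 2.52,
         "9p22 (PR)": 2.20, "14q11 (PR)": 2.29}
for region, lod in peaks.items():
    print(f"{region}: LOD {lod:.2f} -> p = {lod_to_pvalue(lod):.1e}")
```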

  14. Amino acid analysis in biological fluids by GC-MS

    OpenAIRE

    Kaspar, Hannelore

    2009-01-01

    Amino acids are intermediates in cellular metabolism and their quantitative analysis plays an important role in disease diagnostics. A gas chromatography-mass spectrometry (GC-MS) based method was developed for the quantitative analysis of free amino acids as their propyl chloroformate derivatives in biological fluids. Derivatization with propyl chloroformate could be carried out directly in the biological samples without prior protein precipitation or solid-phase extraction of the amino acid...
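
    Quantitative GC-MS methods of this kind are commonly calibrated against an internal standard: the analyte peak area is divided by the internal-standard area and converted to a concentration with a calibration line. The sketch below assumes a single internal standard and a linear response; the calibration levels and peak areas are made-up numbers, and the abstract does not specify the calibration model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_calibration(concentrations, analyte_areas, istd_areas):
    """Fit the area ratio (analyte / internal standard) against known concentration."""
    ratios = [a / s for a, s in zip(analyte_areas, istd_areas)]
    return fit_line(concentrations, ratios)


def quantify(analyte_area, istd_area, slope, intercept):
    """Invert the calibration line to estimate the concentration of an unknown."""
    return (analyte_area / istd_area - intercept) / slope


# Hypothetical calibration for one derivatized amino acid (concentrations in µM).
conc = [10, 50, 100, 200]
slope, intercept = build_calibration(conc, [1020, 5020, 10020, 20020], [1000] * 4)
print(round(quantify(7520, 1000, slope, intercept), 3))
```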

  15. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    Science.gov (United States)

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
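
    The relative sequence capacity described above divides the number of foldable sequences by the 20^L possible sequences of length L, which is easiest to handle in log space. The sketch below uses a deliberately simpler model than the paper's coevolution (pairwise-coupling) model: it sums per-column Shannon entropies of an alignment, an independent-site estimate that upper-bounds the entropy of any model sharing those column frequencies. The toy alignment is hypothetical and assumed gap-free.

```python
import math
from collections import Counter


def independent_site_log_capacity(msa):
    """Sum of per-column Shannon entropies (nats) of a gap-free alignment.

    exp() of this value is an independent-site estimate of how many sequences
    are compatible with the observed column frequencies.
    """
    total = 0.0
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa)
        n = sum(counts.values())
        total -= sum((c / n) * math.log(c / n) for c in counts.values())
    return total


def log10_capacities(msa):
    """Return (log10 absolute capacity, log10 capacity relative to 20**L)."""
    length = len(msa[0])
    log10_abs = independent_site_log_capacity(msa) / math.log(10)
    log10_rel = log10_abs - length * math.log10(20)
    return log10_abs, log10_rel


toy_msa = ["ACDE", "ACDQ", "SCDE", "SCNQ"]  # hypothetical alignment
abs_c, rel_c = log10_capacities(toy_msa)
print(f"log10 capacity = {abs_c:.2f}, log10 relative = {rel_c:.2f}")
print("exceeds Avogadro constant:", abs_c > math.log10(6.02214076e23))
```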

  16. Improvement of gas chromatographic analysis for organic acids and ...

    African Journals Online (AJOL)

    Yomi

    2010-08-27

    Aug 27, 2010 ... and ethanol fermentation by using the anaerobic bacterium Clostridium ... GC analysis. Standard solution for GC analysis consisted of acetic acid (Sigma-Aldrich ... Microorganism and inoculum preparation. C. beijerinckii ...

  17. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines, to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and was over 1000 times faster at calculating genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
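
    The bit-block idea can be illustrated with a generic bit-plane layout (this is not KGGSeq's actual format): each biallelic, unphased genotype coded 0/1/2 is split across two bitsets, so the per-variant sums and cross-products needed for a genotypic correlation reduce to AND operations and population counts. Python integers stand in for machine words here, and the sketch assumes no missing genotypes.

```python
import math


def encode(genotypes):
    """Pack 0/1/2 genotypes into two bit-planes held in Python ints.

    plane1 bit i is set if sample i carries at least one alternate allele;
    plane2 bit i is set if sample i is homozygous alternate,
    so the genotype of sample i is plane1_i + plane2_i.
    """
    plane1 = plane2 = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError(f"unsupported genotype {g!r} at sample {i}")
        if g >= 1:
            plane1 |= 1 << i
        if g == 2:
            plane2 |= 1 << i
    return plane1, plane2


def popcount(x):
    return bin(x).count("1")


def dot(a, b):
    """Sum over samples of g_a * g_b, expanded as (a1 + a2) * (b1 + b2)."""
    a1, a2 = a
    b1, b2 = b
    return popcount(a1 & b1) + popcount(a1 & b2) + popcount(a2 & b1) + popcount(a2 & b2)


def allele_total(a):
    return popcount(a[0]) + popcount(a[1])


def genotype_correlation(a, b, n_samples):
    """Pearson correlation of two encoded genotype vectors."""
    sa, sb = allele_total(a), allele_total(b)
    cov = n_samples * dot(a, b) - sa * sb
    var_a = n_samples * dot(a, a) - sa * sa
    var_b = n_samples * dot(b, b) - sb * sb
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic variant: correlation undefined")
    return cov / math.sqrt(var_a * var_b)


x = [0, 1, 2, 1, 0, 2]
y = [0, 1, 2, 2, 0, 1]
print(round(genotype_correlation(encode(x), encode(y), len(x)), 4))
```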

  18. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.
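
    The abstract mentions estimating biodiversity indices from Operational Taxonomic Units without naming them. As an illustration only, here are two widely used ones computed from OTU abundance counts: the Shannon index H' = −Σ pᵢ ln pᵢ and the bias-corrected Chao1 richness estimator S_obs + F₁(F₁ − 1) / (2(F₂ + 1)), where F₁ and F₂ are the numbers of singleton and doubleton OTUs. Whether XplorSeq uses these exact formulas is an assumption.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' (natural log) from per-OTU read or clone counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


otu_counts = [10, 5, 2, 1, 1]  # hypothetical clone counts per OTU
print(round(shannon_index(otu_counts), 3), chao1(otu_counts))
```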

  19. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    Science.gov (United States)

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by a few nucleotide substitutions. PCR-based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can efficiently detect nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched those from sequencing completely. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by the mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.
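
    Mini-sequencing typing amounts to reading one extended base at each diagnostic site and looking the combination up in a signature table; a mixed infection shows up as more than one base at a site. The sketch below uses placeholder site and type labels with made-up bases, not the published CPV signatures, and it expands mixed calls into every base combination, which can over-call types when several sites are mixed at once.

```python
from itertools import product

# Hypothetical signature table: bases at (site_1, site_2) -> type label.
SIGNATURES = {
    ("A", "T"): "type_1",
    ("G", "T"): "type_2",
    ("G", "A"): "type_3",
}


def call_types(site_calls, signatures=SIGNATURES):
    """Return the sorted types consistent with the bases called at each site.

    site_calls holds one set of observed bases per typing site, in the same
    order as the signature keys.
    """
    matches = set()
    for combo in product(*(sorted(calls) for calls in site_calls)):
        if combo in signatures:
            matches.add(signatures[combo])
    return sorted(matches)


print(call_types([{"A"}, {"T"}]))        # single infection
print(call_types([{"A", "G"}, {"T"}]))   # mixed signal at site_1
```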

  20. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not a high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize
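
    Two numbers recur when judging assemblies like those described above: fold coverage (total sequenced bases divided by genome size, as in the ~30-fold coverage quoted) and contig N50, the largest length L such that contigs of length ≥ L contain at least half of the assembled bases. A minimal sketch with hypothetical read totals and contig sizes:

```python
def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases / genome length."""
    return total_read_bases / genome_size


def n50(contig_lengths):
    """Contig N50: largest length L such that contigs >= L hold >= 50% of bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(fold_coverage(90_000_000, 3_000_000))
print(n50([100, 80, 60, 40, 20]))
```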

  1. Purification and amino acid sequence of a bacteriocin produced by Lactobacillus salivarius K7 isolated from chicken intestine

    Directory of Open Access Journals (Sweden)

    Kenji Sonomoto

    2006-03-01

    A bacteriocin-producing strain, Lactobacillus K7, was isolated from a chicken intestine. The inhibitory activity was determined by the spot-on-lawn technique. The strain was identified on a morphological, biochemical (API 50 CH kit) and molecular genetic (16S rDNA) basis. Bacteriocin purification was carried out by amberlite adsorption, cation exchange and reverse-phase high performance liquid chromatography. N-terminal amino acid sequencing was performed by Edman degradation. Molecular mass was determined by electrospray-ionization (ESI) mass spectrometry (MS). Lactobacillus K7 showed inhibitory activity against Lactobacillus sakei subsp. sakei JCM 1157T, Leuconostoc mesenteroides subsp. mesenteroides JCM 6124T and Bacillus coagulans JCM 2257T. This strain was identified as Lb. salivarius. The antimicrobial substance was destroyed by proteolytic enzymes, indicating a proteinaceous structure, and it was therefore designated a bacteriocin. The purification of the bacteriocin by amberlite adsorption, cation exchange, and reverse-phase chromatography resulted in a single active peak, which was designated FK22. The molecular weight of this fraction was 4331.70 Da. By amino acid sequence, this peptide was homologous to Abp 118 beta produced by Lb. salivarius UCC118. In addition, Lb. salivarius UCC118 produces a 2-peptide bacteriocin, consisting of Abp 118 alpha and beta. Based on the partial amino acid sequences of Abp 118 beta, specific primers were designed from nucleotide sequences according to data from GenBank. The result showed that the deduced peptide was highly homologous to the 2-peptide bacteriocin Abp 118 alpha and beta.
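
    ESI spectra of a peptide of this size typically show a series of multiply protonated ions [M + zH]^z+, and the molecular mass is recovered by deconvolving them. A minimal sketch of that arithmetic, assuming protonated ions only (no salt adducts); the m/z values are simulated from the 4331.70 Da figure rather than taken from the paper's spectra.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]^z+ ion in positive-mode ESI."""
    return (neutral_mass + z * PROTON_MASS) / z


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M recovered from an [M + zH]^z+ peak."""
    return z * (mz - PROTON_MASS)


def charge_from_adjacent(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of peak mz_z, given the neighbouring peak carrying one more proton."""
    return round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))


# Simulate the 4+ and 5+ ions of a 4331.70 Da peptide, then recover z and M.
peak4, peak5 = mz_for_charge(4331.70, 4), mz_for_charge(4331.70, 5)
z = charge_from_adjacent(peak4, peak5)
print(round(peak4, 4), round(peak5, 4), z, round(neutral_mass(peak4, z), 2))
```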

  2. Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data.

    Science.gov (United States)

    Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R

    2017-11-02

    Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
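
    One ingredient described above, using external RNA spike-ins to characterize each cell's dropout behaviour, can be illustrated by fitting a logistic curve of detection probability against the log of the known spike-in input. This is only a simplified stand-in: TASC itself embeds such technical parameters in an empirical Bayes hierarchical mixture model, which the sketch does not attempt. The spike-in inputs and detection calls are hypothetical, and the fit uses plain Newton-Raphson.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_dropout_curve(log_input, detected, iterations=100, tol=1e-10):
    """Fit P(detected) = sigmoid(a + b * log_input) by Newton-Raphson.

    detected holds 0/1 calls for each spike-in in one cell. The maximum
    likelihood estimate does not exist if detections are perfectly separated
    by input level; the iteration cap guards against that case.
    """
    a = b = 0.0
    for _ in range(iterations):
        grad_a = grad_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(log_input, detected):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            grad_a += y - p
            grad_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0:
            break
        # Newton step: solve (information matrix) * step = gradient.
        step_a = (h_bb * grad_a - h_ab * grad_b) / det
        step_b = (h_aa * grad_b - h_ab * grad_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


# Hypothetical spike-ins: log10 input molecules and whether each was detected.
log_molecules = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
detected = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_dropout_curve(log_molecules, detected)
print(f"intercept={a:.3f} slope={b:.3f}")
print("P(detect) at 10 molecules:", round(sigmoid(a + b * 1.0), 3))
```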

  3. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...
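
    Locating open reading frames such as those for GroEL and GroES in a sequenced fragment starts with an ORF scan. The sketch below scans only the forward strand, treats only ATG as a start codon (bacterial genes also use GTG and TTG), and uses an arbitrary minimum length; it is a generic illustration, not the analysis used in the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100):
    """Forward-strand ORFs as (start, end, frame): 0-based, end-exclusive, stop included.

    min_codons counts codons from the ATG up to, but not including, the stop.
    The first ATG after a stop opens the ORF, so nested ATGs are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```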

  4. Sequence analysis-based characterization and identification of neurovirulence-associated variants of 36 EV71 strains from China.

    Science.gov (United States)

    Xu, Jun; Wang, Fang; Zhao, Desheng; Liu, Jiang; Su, Hong; Wang, Baolong

    2018-03-30

    Enterovirus 71 (EV71) is the main pathogen of hand-foot-mouth disease (HFMD) and causes several neurological complications. As new strains of EV71 are constantly discovered, it is important to understand the genomic characteristics of the viruses and the mechanism of virulence. Herein, we isolated five strains of EV71 from HFMD patients with or without neurovirulence and sequenced their whole genomes. We then performed whole genome sequence analysis of a total of 36 EV71 strains. The phylogenetic analysis of the VP1 region revealed that all five isolated strains clustered into C4a within subgenotype C4. In addition, by comparing the complete genome sequences of 36 strains, 253 variable amino acid positions were found, 14 of which were identified as associated with neurovirulence (P < 0.05). Moreover, a similar pattern of amino acid variant combinations was identified in four strains without neurovirulence, indicating that this type of variant pattern might be associated with avirulence. The strains with neurovirulence appeared to be distinguished from those without neurovirulence by the variants in the VP1 and P2 regions, implying that VP1 and P2 are important regions associated with neurovirulence. Indeed, 3-D modeling of the VP1 and P2 regions of non-neurovirulent and neurovirulent strains revealed that the different variants resulted in different protein structures and amino acid composition of the ligand binding site, which might account for their difference in neurovirulence. In summary, our study reveals that 14 variable amino acid positions in the VP1, P2 and P3 regions are related to virulence and that mutations in the capsid proteins of EV71 might contribute to neurovirulence. © 2018 Wiley Periodicals, Inc.
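
    The abstract does not name the per-position association test. As one plausible, hedged illustration, each variable alignment column can be reduced to a 2×2 table (carries a non-majority residue or not, versus neurovirulent or not) and tested with Fisher's exact test. The sequences and phenotypes below are hypothetical, and with 253 variable positions some correction for multiple testing would be needed, which this sketch omits.

```python
from collections import Counter
from math import comb


def variable_positions(aligned):
    """0-based alignment columns where not all sequences share the same residue."""
    return [i for i in range(len(aligned[0])) if len({s[i] for s in aligned}) > 1]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def position_association(aligned, neurovirulent):
    """For each variable column, test 'differs from majority residue' vs phenotype."""
    results = {}
    for i in variable_positions(aligned):
        majority = Counter(s[i] for s in aligned).most_common(1)[0][0]
        table = [[0, 0], [0, 0]]  # rows: phenotype yes/no; columns: variant/majority
        for seq, pheno in zip(aligned, neurovirulent):
            table[0 if pheno else 1][0 if seq[i] != majority else 1] += 1
        results[i] = fisher_exact_two_sided(table[0][0], table[0][1], table[1][0], table[1][1])
    return results


# Hypothetical aligned protein fragments and phenotypes.
seqs = ["MKVAE", "MKVAE", "MKIAE", "MRIAD", "MRIAD", "MRIAE"]
neuro = [True, True, True, False, False, False]
print(position_association(seqs, neuro))
```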

  5. The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences

    Directory of Open Access Journals (Sweden)

    Ivan V. Stepanyan

    2017-01-01

    The article is devoted to a matrix method of comparative analysis of long nucleotide sequences by means of presenting each sequence in the form of three digital binary sequences. This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.
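
    The three binary sequences are built from pairs of opposing nucleotide attributes. A minimal sketch, assuming the commonly used pairs purine (A, G) vs pyrimidine (C, T), amino (A, C) vs keto (G, T), and three hydrogen bonds (C, G) vs two (A, T); which member of each pair is coded as 1 is an arbitrary choice here and may differ from the article. Any two of the three attributes already identify the nucleotide uniquely.

```python
# 1/0 coding for each attribute; which side is 1 is an assumption of this sketch.
ATTRIBUTES = {
    "purine": {"A": 1, "G": 1, "C": 0, "T": 0},
    "amino": {"A": 1, "C": 1, "G": 0, "T": 0},
    "three_h_bonds": {"C": 1, "G": 1, "A": 0, "T": 0},
}


def to_binary_sequences(seq):
    """Map a DNA/RNA sequence to one binary list per attribute (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    return {name: [table[base] for base in seq] for name, table in ATTRIBUTES.items()}


for name, bits in to_binary_sequences("ACGTTGCA").items():
    print(f"{name:>14}: {''.join(map(str, bits))}")
```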

  6. Simple sequence repeat (SSR) markers analysis of genetic diversity ...

    African Journals Online (AJOL)

    hope&shola

    2012-04-24

    Apr 24, 2012 ... erucic acid in the oil and low glucosinolate content in the meal has made rapeseed a valuable source of high quality oil for people and nutritional protein for livestock. (Qiu et al., 2006). Previous studies have demonstrated that yellow seeds have a thinner seed coat than black seeds in the same genetic ...

  7. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    ajl yemi

    2011-11-28

    Nov 28, 2011 ... prediction of CCAAT/enhancer-binding protein beta ... CCAAT/enhancer-binding protein beta (C/EBPβ), as an essential transcriptional factor, regulates the ... acid area from 274 to 337 was found, concurring with the main ...

  8. Molecular cloning, sequence analysis and tissue expression of ...

    African Journals Online (AJOL)

    Proofreader

    2017-10-01

    Oct 1, 2017 ... p-distance model for amino acid substitutions. A bootstrap .... These were a thymine/cytosine (T/C) SNP and a thymine/adenine (T/A) SNP. ..... Two rat homologues of Drosophila achaete–scute specifically expressed in ...

  9. Extraterrestrial material analysis: loss of amino acids during liquid-phase acid hydrolysis

    Science.gov (United States)

    Buch, Arnaud; Brault, Amaury; Szopa, Cyril; Freissinet, Caroline

    2015-04-01

    Searching for building blocks of life in extraterrestrial material is a way to learn more about how life could have appeared on Earth. With this aim, liquid-phase acid hydrolysis has been used, since at least 1970 , in order to extract amino acids and other organic molecules from extraterrestrial materials (e.g. meteorites, lunar fines) or Earth analogues (e.g. Atacama desert soil). This procedure involves drastic conditions such as heating samples in 6N HCl for 24 h, either under inert atmosphere/vacuum, or air. Analysis of the hydrolyzed part of the sample should give its total (free plus bound) amino acid content. The present work deals with the influence of the 6N HCl hydrolysis on amino acid degradation. Our experiments have been performed on a standard solution of 17 amino acids. After liquid-phase acid hydrolysis (6N HCl) under argon atmosphere (24 h at 100°C), the liquid phase was evaporated and the dry residue was derivatized with N-Methyl-N-(t-butyldimethylsilyl)trifluoroacetamide (MTBSTFA) and dimethylformamide (DMF), followed by gas chromatography-mass spectrometry analysis. After comparison with derivatized amino acids from the standard solution, a significant reduction of the chromatographic peak areas was observed for most of the amino acids after liquid-phase acid hydrolysis. Furthermore, the same loss pattern was observed when the amino acids were exposed to cold 6N HCl for a short amount of time. The least affected amino acid, i.e. glycine, was found to be 73,93% percent less abundant compared to the non-hydrolyzed standard, while the most affected, i.e. histidine, was not found in the chromatograms after hydrolysis. Our experiments thereby indicate that liquid-phase acid hydrolysis, even under inert atmosphere, leads to a partial or total loss of all of the 17 amino acids present in the standard solution, and that a quick cold contact with 6N HCl is sufficient to lead to a loss of amino acids. 
Therefore, in the literature, the reported increase
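The losses above are expressed relative to the non-hydrolyzed standard. A minimal sketch of that comparison, assuming GC-MS peak area is proportional to amount (identical derivatization and injection conditions for both runs); the peak areas are hypothetical, with the glycine pair chosen only so that the ratio matches the reported 73.93% loss:

```python
def percent_loss(area_standard: float, area_hydrolyzed: float) -> float:
    """Relative loss (%) of an amino acid after hydrolysis, from GC-MS peak areas.

    Assumes both peak areas were measured under identical derivatization and
    injection conditions, so that area is proportional to amount.
    """
    if area_standard <= 0:
        raise ValueError("standard peak area must be positive")
    return (1.0 - area_hydrolyzed / area_standard) * 100.0


# Hypothetical peak areas (arbitrary units): (non-hydrolyzed standard, after hydrolysis).
areas = {
    "glycine": (1.0e6, 2.607e5),
    "histidine": (8.0e5, 0.0),  # no peak detected after hydrolysis
}

for name, (std, hyd) in areas.items():
    print(f"{name}: {percent_loss(std, hyd):.2f}% lost")
```

A loss of 100% corresponds to an amino acid that is absent from the chromatogram, as reported for histidine.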

  10. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern, as the amount of data generated at several locations is growing exponentially. There is therefore a need for techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large numbers of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. OPTSDNA was used to store DNA sequences of different lengths, ranging from very small to very large, in a database, and both the resulting storage size and the response time were calculated. The efficiency and performance of the algorithm (in terms of storage size, expressed as a percentage) are high compared with other known sequential approaches.
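The abstract does not describe how OPTSDNA encodes sequences, so the sketch below is not OPTSDNA. It illustrates 2-bit packing, a common baseline for compact DNA storage, and the percentage size reduction it gives relative to one byte per base. It assumes the input contains only A, C, G and T; ambiguity codes such as N would need separate handling. The function names are hypothetical.

```python
# 2-bit code for each nucleotide (A=00, C=01, G=10, T=11).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original sequence; the length must be stored separately."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def size_reduction_percent(seq: str) -> float:
    """Storage saved relative to one byte per base, as a percentage."""
    return (1 - len(pack(seq)) / len(seq)) * 100


s = "ACGTACGTAC"
print(len(s), "bases ->", len(pack(s)), "bytes")  # 10 bases -> 3 bytes
```

For long sequences the saving approaches 75%, because four bases share each byte.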

  11. PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

    Science.gov (United States)

    Kuznetsov, Igor B; McDuffie, Michael

    2015-05-07

    Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities. PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use 20x20 fixed amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal-distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better-quality pairwise alignments than the conventional matrix-based approach. PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of an amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.
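The scoring idea described above can be sketched as a minimizing Needleman-Wunsch recursion in which the cost of pairing two residues is a weighted sum of absolute differences between their property values. This is an illustrative reconstruction, not PR2ALIGN's code: the two-property table, the linear gap cost, and the function names are assumptions, and the property values are hypothetical numbers for six residues only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical normalized property profiles (hydrophobicity, size) for six
# residues; illustrative numbers only, not a published scale.
PROPS: Dict[str, Tuple[float, float]] = {
    "A": (0.7, 0.2), "G": (0.5, 0.0), "L": (0.9, 0.6),
    "I": (1.0, 0.6), "K": (0.1, 0.7), "R": (0.0, 0.8),
}


def residue_distance(a: str, b: str, weights: Sequence[float]) -> float:
    """Weighted sum of absolute property differences between two residues."""
    return sum(w * abs(pa - pb) for w, pa, pb in zip(weights, PROPS[a], PROPS[b]))


def align(x: str, y: str, weights: Sequence[float], gap: float = 1.0) -> Tuple[float, str, str]:
    """Global alignment that minimizes total distance (Needleman-Wunsch recursion)."""
    n, m = len(x), len(y)
    D: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + residue_distance(x[i - 1], y[j - 1], weights),  # pair residues
                D[i - 1][j] + gap,  # gap in y
                D[i][j - 1] + gap,  # gap in x
            )
    # Trace back one optimal path from the bottom-right cell.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + residue_distance(x[i - 1], y[j - 1], weights):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(align("AGL", "AL", weights=[1.0, 1.0]))
```

Raising one weight makes differences in that property dominate the alignment, in line with the abstract's point that a higher weight increases a property's influence.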

  12. Analysis of xylem formation in pine by cDNA sequencing

    Science.gov (United States)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.
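The abstract does not say which clustering method produced the 107 groups. One simple way to separate clustered reads from singletons is single-linkage grouping, sketched here under the assumption that two reads belong together if they share any exact k-mer (real EST pipelines typically confirm overlaps by alignment). The reads, k, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Single-linkage clusters: two reads are joined if they share any k-mer."""
    parent = {name: name for name in seqs}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: Dict[str, str] = {}  # k-mer -> first read containing it
    for name, s in seqs.items():
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if kmer in first_seen:
                union(name, first_seen[kmer])
            else:
                first_seen[kmer] = name

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(groups.values(), key=lambda g: (-len(g), g))


reads = {"r1": "ACGTACGTTT", "r2": "GTACGTTTGG", "r3": "CCCCCCCCCC"}
clusters = cluster_by_shared_kmers(reads, k=6)
print(clusters)  # [['r1', 'r2'], ['r3']]
```

Clustered reads and singletons partition the whole read set, which is why the reported counts add up: 361 + 736 = 1,097.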

  13. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
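The maximum output figures quoted above are mutually consistent: 25 M clusters read in 2 × 300 bp paired-end mode give 50 M reads and about 15 Gb. A quick check, assuming yield is simply clusters × reads per cluster × read length (index reads and quality filtering ignored; the function name is hypothetical):

```python
def run_yield_gb(clusters: float, read_length: int, paired: bool) -> float:
    """Approximate sequencing output in gigabases (index reads ignored)."""
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_length / 1e9


print(run_yield_gb(25e6, 300, paired=True))   # 15.0
print(run_yield_gb(25e6, 36, paired=False))   # shortest single-end mode
```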

  14. cDNA, genomic sequence cloning and analysis of the ribosomal ...

    African Journals Online (AJOL)

    Ribosomal protein L37A (RPL37A) is a component of 60S large ribosomal subunit encoded by the RPL37A gene, which belongs to the family of ribosomal L37AE proteins, located in the cytoplasm. The complementary deoxyribonucleic acid (cDNA) and the genomic sequence of RPL37A were cloned successfully from giant ...

  15. The shikimate pathway: review of amino acid sequence, function and three-dimensional structures of the enzymes.

    Science.gov (United States)

    Mir, Rafia; Jallu, Shais; Singh, T P

    2015-06-01

    Aromatic compounds such as aromatic amino acids, vitamin K and ubiquinone are important prerequisites for the metabolism of an organism. All organisms can synthesize these aromatic metabolites through the shikimate pathway, except mammals, which depend on their diet for these compounds. The pathway converts phosphoenolpyruvate and erythrose 4-phosphate to chorismate through seven enzymatically catalyzed steps, and chorismate serves as a precursor for the synthesis of a variety of aromatic compounds. These enzymes have been shown to play a vital role in the viability of microorganisms and have thus been suggested as attractive molecular targets for the design of novel antimicrobial drugs. This review focuses on the seven enzymes of the shikimate pathway, highlighting their primary sequences, functions and three-dimensional structures. An understanding of their active-site amino acid maps, functions and three-dimensional structures will provide a framework on which the rational design of antimicrobial drugs can be based. Comparing the full-length amino acid sequences and the X-ray crystal structures of these enzymes from bacterial, fungal and plant sources would contribute to designing a specific drug and/or developing broad-spectrum compounds with efficacy against a variety of pathogens.

  16. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina Next Generation Sequencing (NGS). The CASAVA and SeqMan NGen programs were used, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions. 2014 by the authors; licensee MDPI, Basel, Switzerland.
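CASAVA's quality-control steps are not detailed in the abstract. As a generic illustration only, the sketch below decodes FASTQ quality strings (assuming the Phred+33 encoding used from CASAVA 1.8 onward) and keeps reads whose mean Phred score meets a threshold; the threshold of 30, the example reads, and the function names are assumptions.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def passes_mean_quality(quality: str, min_mean_q: float = 30.0) -> bool:
    """Keep a read if its mean Phred score is at least min_mean_q."""
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


reads = [("read1", "IIIIIIIIII"), ("read2", "##########")]
kept = [name for name, qual in reads if passes_mean_quality(qual)]
print(kept)  # ['read1']
```

A Phred score of 30 corresponds to an expected base-call error rate of 1 in 1,000.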

  17. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina Next Generation Sequencing (NGS). The CASAVA and SeqMan NGen programs were used, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions.

  18. Characterization of fatty acid-producing wastewater microbial communities using next generation sequencing technologies

    Science.gov (United States)

    While wastewater represents a viable source for bacterial biodiesel production, very little is known about the composition of these microbial communities. We studied the taxonomic diversity and succession of microbial communities in bioreactors accumulating fatty acids using 454-pyro...

  19. Molecular cloning, sequence analysis and phylogeny of first caudata g-type lysozyme in axolotl (Ambystoma mexicanum).

    Science.gov (United States)

    Yu, Haining; Gao, Jiuxiang; Lu, Yiling; Guang, Huijuan; Cai, Shasha; Zhang, Songyan; Wang, Yipeng

    2013-11-01

    Lysozymes are key proteins that play important roles in innate immune defense in many animal phyla by breaking down bacterial cell walls. In this study, we report the molecular cloning, sequence analysis and phylogeny of the first caudate amphibian g-lysozyme, obtained from a full-length spleen cDNA library of the axolotl (Ambystoma mexicanum). A goose-type (g-lysozyme) EST was identified and the full-length cDNA was obtained using RACE-PCR. The axolotl g-lysozyme sequence contains an open reading frame encoding a putative signal peptide and a mature protein of 184 amino acids. The calculated molecular mass and the theoretical isoelectric point (pI) of this mature protein are 21523.0 Da and 4.37, respectively. Expression of g-lysozyme mRNA is predominantly found in skin, with lower levels in spleen, liver, muscle, and lung. Phylogenetic analysis revealed that caudate amphibian g-lysozyme has a distinct evolutionary pattern, being juxtaposed not only with anuran amphibians but also with fish, birds and mammals. Although the first complete cDNA sequence for a caudate amphibian g-lysozyme is reported in the present study, clones encoding other functional immune molecules in the axolotl full-length cDNA library will have to be sequenced to gain insight into the fundamental aspects of antibacterial mechanisms in caudates.
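A calculated molecular mass such as the 21,523.0 Da quoted above is normally the sum of average residue masses plus one water molecule for the free termini. The sketch below assumes standard average residue masses (as tabulated, for example, by ExPASy); because the axolotl sequence is not given here it cannot reproduce that value, and the theoretical pI, which needs a pKa model, is not computed. The function name is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(f"{average_mass('GG'):.2f} Da")  # 132.12 Da (glycylglycine)
```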

  20. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the young (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study was to investigate both novel and known mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children diagnosed with MODY by clinical criteria. Index cases were studied with MiSeq (Illumina), and family screening and confirmation of mutations were done by Sanger sequencing. We identified point mutations in 28 (65%) of the 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, and one has PDX1 gene variations, and one patient has heterozygous mutations in both HNF1A and HNF4A. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK-MODY is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.
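The per-gene counts can be checked against the headline figure of 28 of 43 patients (65%). A small tally using only the numbers in the abstract, with the patient carrying both HNF1A and HNF4A variants counted once (the label "HNF1A+HNF4A" is a hypothetical naming choice):

```python
from collections import Counter

# Patients with a molecular diagnosis, by gene, as listed in the abstract.
findings = (
    ["GCK"] * 18 + ["HNF1A"] * 4 + ["HNF4A"] + ["HNF1B"]
    + ["NEUROD1"] * 2 + ["PDX1"] + ["HNF1A+HNF4A"]
)
cohort_size = 43

counts = Counter(findings)
solved = sum(counts.values())
print(solved, f"{100 * solved / cohort_size:.0f}%")  # 28 65%
print(counts.most_common(1))                          # [('GCK', 18)]
```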

  1. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroups of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the mitochondrial NADH dehydrogenase gene ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with increased levels of cholesterol and triglycerides, probably related to obesity among Egyptians. Comparison of these genomes with African and Western Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and the functional classification of variants, as well as human migration and evolution across Africa and Western Asia. Copyright © 2017. Published by Elsevier B.V.
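Two quantities in this abstract are simple ratios. Mean sequencing depth is commonly estimated as reads × read length / genome size, and the share of novel variants is novel / total. The read count and read length below are hypothetical (the abstract reports only ~30×), and the totals of 4.7 and 4.8 million are rounded in the source, so the novel fractions are approximate.

```python
def mean_depth(n_reads: int, read_length: int, genome_size_bp: float) -> float:
    """Expected average coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size_bp


def novel_fraction(novel: int, total: float) -> float:
    """Share of called variants not present in reference databases."""
    return novel / total


# Hypothetical run: 600 million 150-bp reads against a ~3.1 Gb human genome.
print(f"{mean_depth(600_000_000, 150, 3.1e9):.1f}x")  # 29.0x

# Counts from the abstract (totals are rounded in the source).
print(f"EGP1 novel: {novel_fraction(198_877, 4.7e6):.1%}")  # EGP1 novel: 4.2%
print(f"EGP2 novel: {novel_fraction(209_109, 4.8e6):.1%}")  # EGP2 novel: 4.4%
```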

  2. Accident sequence precursor analysis level 2/3 model development

    International Nuclear Information System (INIS)

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-01-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPPs). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the process of linking them to the existing Level 1 models.
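The linkage described above follows the general Level 1/2/3 structure of probabilistic risk assessment: a conditional core damage probability is split across plant damage states, each state maps to release categories with conditional probabilities, and each release category carries a consequence measure. A minimal sketch of that chaining, not the NRC's ASP models; the state names, probabilities, and consequence values are hypothetical.

```python
from typing import Dict


def expected_consequence(
    p_damage_state: Dict[str, float],
    p_release_given_state: Dict[str, Dict[str, float]],
    consequence_of_release: Dict[str, float],
) -> float:
    """Sum of P(state) * P(release | state) * consequence over all paths."""
    total = 0.0
    for state, p_state in p_damage_state.items():
        for release, p_release in p_release_given_state[state].items():
            total += p_state * p_release * consequence_of_release[release]
    return total


# Hypothetical inputs for one operational event.
p_state = {"PDS-1": 1e-5, "PDS-2": 2e-6}        # Level 1: conditional damage-state probabilities
p_release = {                                   # Level 2: containment response
    "PDS-1": {"small": 0.9, "large": 0.1},
    "PDS-2": {"small": 0.5, "large": 0.5},
}
consequence = {"small": 1.0, "large": 100.0}    # Level 3: consequence measure (arbitrary units)

print(expected_consequence(p_state, p_release, consequence))
```

Keeping each level as a separate table mirrors the idea of building Level 2/3 models that plug into existing Level 1 results.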

  3. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.

  4. Sequence-based analysis of the bacterial and fungal compositions of multiple kombucha (tea fungus) samples.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2014-04-01

    Kombucha is a sweetened tea beverage that, as a consequence of fermentation, contains ethanol, carbon dioxide, a high concentration of acid (gluconic, acetic and lactic) as well as a number of other metabolites and is thought to contain a number of health-promoting components. The sucrose-tea solution is fermented by a symbiosis of bacteria and yeast embedded within a cellulosic pellicle, which forms a floating mat in the tea, and generates a new layer with each successful fermentation. The specific identity of the microbial populations present has been the focus of attention but, to date, the majority of studies have relied on culture-based analyses. To gain a more comprehensive insight into the kombucha microbiota we have carried out the first culture-independent, high-throughput sequencing analysis of the bacterial and fungal populations of 5 distinct pellicles as well as the resultant fermented kombucha at two time points. Following the analysis it was established that the major bacterial genus present was Gluconacetobacter, present at >85% in most samples, with only trace populations of Acetobacter detected (kombucha, also being revealed. The yeast populations were found to be dominated by Zygosaccharomyces at >95% in the fermented beverage, with a greater fungal diversity present in the cellulosic pellicle, including numerous species not identified in kombucha previously. Ultimately, this study represents the most accurate description of the microbiology of kombucha to date. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Effects of the amino acid sequence on thermal conduction through β-sheet crystals of natural silk protein.

    Science.gov (United States)

    Zhang, Lin; Bai, Zhitong; Ban, Heng; Liu, Ling

    2015-11-21

    Recent experiments have discovered very different thermal conductivities between the spider silk and the silkworm silk. Decoding the molecular mechanisms underpinning the distinct thermal properties may guide the rational design of synthetic silk materials and other biomaterials for multifunctionality and tunable properties. However, such an understanding is lacking, mainly due to the complex structure and phonon physics associated with the silk materials. Here, using non-equilibrium molecular dynamics, we demonstrate that the amino acid sequence plays a key role in the thermal conduction process through β-sheets, essential building blocks of natural silks and a variety of other biomaterials. Three representative β-sheet types, i.e. poly-A, poly-(GA), and poly-G, are shown to have distinct structural features and phonon dynamics leading to different thermal conductivities. A fundamental understanding of the sequence effects may stimulate the design and engineering of polymers and biopolymers for desired thermal properties.
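In non-equilibrium molecular dynamics, thermal conductivity is usually extracted with Fourier's law, k = -J / (dT/dx), where J is the imposed heat flux and dT/dx is the slope of the steady-state temperature profile. The sketch below shows only that post-processing step, assuming heat flows in one direction (+x) through a known cross-section; the power, area, temperature profile, and function names are hypothetical, not values from the study.

```python
from typing import Sequence


def least_squares_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def thermal_conductivity(heat_power_w: float, area_m2: float,
                         positions_m: Sequence[float], temps_k: Sequence[float]) -> float:
    """Fourier's law k = -J / (dT/dx), with J = power / area flowing toward +x."""
    flux = heat_power_w / area_m2
    gradient = least_squares_slope(positions_m, temps_k)
    return -flux / gradient


# Hypothetical steady-state profile: 320 K to 300 K over 10 nm, 1 nW through 1 nm^2.
k = thermal_conductivity(1e-9, 1e-18, [0.0, 5e-9, 1e-8], [320.0, 310.0, 300.0])
print(f"k = {k:.2f} W/(m K)")  # k = 0.50 W/(m K)
```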

  6. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8%, and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3
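The abstract does not name the microsatellite search tool. A minimal sketch of perfect tandem-repeat detection with regular expressions, assuming motif lengths of 1 to 6 bp and minimum copy numbers of the kind used by tools such as MISA (the thresholds, example sequence, and function names are illustrative assumptions):

```python
import re
from typing import Dict, List, Tuple

# Minimum number of tandem copies per motif length (assumed thresholds).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "CA" * 8 + "TTT"))  # [(2, 'CA', 8)]
```

The primitivity check stops a (CA)n run from also being reported as a (CACA)n tetranucleotide repeat.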

  7. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8%, and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of
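The BES summary statistics are internally consistent, and a short calculation shows how they relate: total BES length divided by the number of clean BES gives the mean read length, and total length divided by the 2.5% genome fraction gives the implied genome size (approximate, because 2.5% is a rounded figure).

```python
total_bp = 42_522_168    # total length of clean BES, from the abstract
clean_bes = 65_720       # number of clean BES, from the abstract
genome_fraction = 0.025  # share of the genome represented, as reported

mean_read_length = total_bp / clean_bes
implied_genome_size = total_bp / genome_fraction

print(f"mean BES length: {mean_read_length:.0f} bp")                # mean BES length: 647 bp
print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")   # implied genome size: 1.70 Gb
```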

  8. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By treating a chaotic pseudo-random sequence as a symbolic sequence, the authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the logistic map and a one-way coupled map lattice to demonstrate how it works, and it is compared with the approximate entropy method. The results show that this method can distinguish the complexities of different chaotic pseudo-random sequences and that it is superior to the approximate entropy method.
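The abstract does not give the exact complexity measure, so the following is a generic symbolic-dynamics sketch rather than the author's method. It iterates the logistic map, converts each value to a binary symbol using the critical point x = 0.5 as the partition, and estimates complexity as the Shannon entropy of length-L words per symbol. The parameter values (r = 3.5 and r = 3.99, L = 4, x0 = 0.2) and function names are illustrative choices.

```python
import math
from collections import Counter
from typing import List


def logistic_orbit(x0: float, r: float, n: int, burn_in: int = 100) -> List[float]:
    """Iterate x -> r*x*(1-x), discarding the first burn_in values as transients."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit


def symbolize(orbit: List[float], threshold: float = 0.5) -> str:
    """Binary partition at the critical point: '1' if x >= threshold else '0'."""
    return "".join("1" if x >= threshold else "0" for x in orbit)


def block_entropy_rate(symbols: str, word_len: int) -> float:
    """Shannon entropy of overlapping length-L words, in bits per symbol."""
    words = Counter(symbols[i:i + word_len] for i in range(len(symbols) - word_len + 1))
    total = sum(words.values())
    h = -sum((c / total) * math.log2(c / total) for c in words.values())
    return h / word_len


periodic = block_entropy_rate(symbolize(logistic_orbit(0.2, 3.5, 2000)), 4)
chaotic = block_entropy_rate(symbolize(logistic_orbit(0.2, 3.99, 2000)), 4)
print(f"r = 3.5: {periodic:.3f} bits/symbol, r = 3.99: {chaotic:.3f} bits/symbol")
```

At r = 3.5 the map settles on a period-4 orbit that symbolizes as the repeating word 0111, so the estimate is close to 0.5 bit per symbol, while the chaotic r = 3.99 sequence gives a much higher value.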

  9. A stochastic context free grammar based framework for analysis of protein sequences

    Directory of Open Access Journals (Sweden)

    Nebel Jean-Christophe

    2009-10-01

    Abstract Background In the last decade, there have been many applications of formal language theory in bioinformatics, such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of relationships between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than that of stochastic regular grammars. These grammars, like other state-of-the-art methods, cannot capture higher-order dependencies such as the nested and crossing relationships that are common in proteins. In order to overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences in which grammars are induced using a genetic algorithm. Results This framework was implemented in a system aimed at producing binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight into their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show that the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to dealing with patterns shared by non-homologous proteins. Conclusion A new Stochastic Context Free
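The framework above induces grammars with a genetic algorithm; that induction step is not reproduced here. The sketch shows only how a stochastic context-free grammar assigns a probability to a sequence, using the standard inside (CYK-style) algorithm. The grammar is a hypothetical toy in Chomsky normal form over a reduced two-letter alphabet ('h' for hydrophobic, 'p' for polar), loosely echoing the use of amino acid properties to shrink the alphabet.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical toy grammar in Chomsky normal form; probabilities for each
# left-hand side sum to 1.
BINARY: List[Tuple[str, str, str, float]] = [
    ("S", "S", "S", 0.4),
    ("S", "H", "P", 0.6),
]
UNARY: List[Tuple[str, str, float]] = [
    ("H", "h", 1.0),
    ("P", "p", 1.0),
]


def inside_probability(seq: str, start: str = "S") -> float:
    """P(seq | grammar), summed over all parse trees (inside algorithm)."""
    n = len(seq)
    if n == 0:
        return 0.0
    # chart[i][j][A] = probability that nonterminal A derives seq[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, sym in enumerate(seq):
        for lhs, terminal, p in UNARY:
            if terminal == sym:
                chart[i][i + 1][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for lhs, left, right, p in BINARY:
                    pl = chart[i][k][left]
                    pr = chart[k][j][right]
                    if pl and pr:
                        chart[i][j][lhs] += p * pl * pr
    return chart[0][n][start]


print(inside_probability("hphp"))  # one parse: 0.4 * 0.6 * 0.6
```

A grammar induced from real binding sites would have many more rules, but scoring a candidate region would follow the same dynamic-programming pattern.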

  10. Sequence-selective targeting of duplex DNA by peptide nucleic acids

    DEFF Research Database (Denmark)

    Nielsen, Peter E

    2010-01-01

    Sequence-selective gene targeting constitutes an attractive drug-discovery approach for genetic therapy, with the aim of reducing or enhancing the activity of specific genes at the transcriptional level, or as part of a methodology for targeted gene repair. The pseudopeptide DNA mimic peptide...

  11. Comparative Sequence Analysis of Plasmids from Lactobacillus delbrueckii and Construction of a Shuttle Cloning Vector

    Science.gov (United States)

    Lee, Ju-Hoon; Halgerson, Jamie S.; Kim, Jeong-Hwan; O'Sullivan, Daniel J.

    2007-01-01

    While plasmids are very commonly associated with the majority of the lactic acid bacteria, they are only very rarely associated with Lactobacillus delbrueckii, with only four characterized to date. In this study, the complete sequence of a native plasmid, pDOJ1, from a strain of Lactobacillus delbrueckii subsp. bulgaricus was determined. It consisted of a circular DNA molecule of 6,220 bp with a G+C content of 44.6% and a characteristic ori and encoded six open reading frames (ORFs), of which functions could be predicted for three: a mobilization (Mob) protein, a transposase, and a fused primase-helicase replication protein. Comparative analysis of pDOJ1 and the other available L. delbrueckii plasmids (pLBB1, pJBL2, pN42, and pLL1212) revealed a very similar organization and amino acid identities between 85 and 98% for the putative proteins of all six predicted ORFs from pDOJ1, reflecting a common origin for L. delbrueckii plasmids. Analysis of the fused primase-helicase replication gene found a similar fused organization only in the theta-replicating group B plasmids from Streptococcus thermophilus. This observation and the ability of the replicon to function in S. thermophilus support the idea that the origin of plasmids in L. delbrueckii was likely from S. thermophilus. This may reflect the close association of these two species in dairy fermentations, particularly yogurt production. As no vector based on plasmid replicons from L. delbrueckii has previously been constructed, an Escherichia coli-L. delbrueckii shuttle cloning vector, pDOJ4, was constructed from pDOJ1, the p15A ori, the chloramphenicol resistance gene of pCI372, and the lacZ polylinker from pUC18. This cloning vector was successfully introduced into E. coli, L. delbrueckii subsp. bulgaricus, S. thermophilus, and Lactococcus lactis. This shuttle cloning vector provides a new tool for molecular analysis of Lactobacillus delbrueckii and other lactic acid bacteria. PMID:17526779
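Plasmid statistics such as length, G+C content, and predicted ORFs come from routine sequence annotation. The sketch below does not reproduce the authors' pipeline: it computes G+C content and lists ATG-to-stop ORFs in the three forward frames only, ignoring the reverse strand, alternative start codons, and ORFs that span the origin of a circular molecule such as pDOJ1. The minimum length of 50 codons, the example sequence, and the function names are assumptions.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def forward_orfs(seq: str, min_codons: int = 50) -> List[Tuple[int, int]]:
    """(start, end) of ATG...stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon; the length test counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTTAGCC"
print(f"{gc_content(example):.1f}% G+C", forward_orfs(example, min_codons=2))
```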

  12. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include the metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene-poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.

  13. Analysis of decision procedures for a sequence of inventory periods

    International Nuclear Information System (INIS)

    Avenhaus, R.

    1982-07-01

    Optimal test procedures for a sequence of inventory periods will be discussed. Starting with a game-theoretical description of the conflict situation between the plant operator and the inspector, the objectives of the inspector as well as the general decision-theoretical problem will be formulated. The first part emphasizes the objective of 'secure' detection, which means that the inspector takes a decision only at the end of the reference time. The second part emphasizes the objective of 'timely' detection, which leads to sequential test procedures. At the end of the paper, all procedures will be summarized, and in view of the multitude of procedures currently available, some comments about future work will be given.

  14. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include the metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene-poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.

  15. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  16. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of divergent protein sequences are tedious to analyze for understanding their phylogenetic or structure-function relationships. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task, but the previous version only allowed a limited number of sequences as input. I implemented Peptide Pattern Recognition as multithreaded software designed to handle large numbers of sequences and perform the analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  17. Information-Theoretical Analysis of EEG Microstate Sequences in Python

    Directory of Open Access Journals (Sweden)

    Frederic von Wegner

    2018-06-01

    We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG) measures the electrical potential generated by the cerebral cortex, and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A–D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting-state EEG recordings, we found new characteristics of microstate sequences, such as periodicities related to EEG frequency bands. The algorithms used are provided here as an open-source package, and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series, and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting-state EEG recordings.
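    The time-lagged mutual information mentioned above can be illustrated for a symbolic sequence of microstate labels (e.g. "A" to "D"). The sketch below is a generic plug-in (frequency-count) estimator written for illustration; the function names are hypothetical and this is not the package's own API. It also omits the Markov-chain surrogate confidence intervals described in the abstract.

```python
import math
from collections import Counter


def entropy(symbols):
    """Plug-in Shannon entropy, in bits, of a sequence of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lagged_mutual_information(seq, lag):
    """I(X_t; X_{t+lag}) in bits from relative frequencies.

    Uses I = H(X_t) + H(X_{t+lag}) - H(X_t, X_{t+lag}).
    """
    if not 0 < lag < len(seq):
        raise ValueError("lag must be between 1 and len(seq) - 1")
    x = list(seq[:-lag])
    y = list(seq[lag:])
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)


def ami_function(seq, max_lag):
    """Mutual information for lags 1..max_lag, a symbolic analogue of an ACF."""
    return [lagged_mutual_information(seq, k) for k in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Toy label sequence with period 2; MI at lag 2 is exactly 1 bit.
    print(ami_function("AB" * 100, 4))
```

    Peaks in this function at particular lags play the role that peaks in an autocorrelation function play for metric signals, which is how periodicities in a label sequence can show up.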

  18. Acid Rain Analysis by Standard Addition Titration.

    Science.gov (United States)

    Ophardt, Charles E.

    1985-01-01

    The standard addition titration is a precise and rapid method for the determination of the acidity in rain or snow samples. The method requires use of a standard buret, a pH meter, and Gran's plot to determine the equivalence point. Experimental procedures used and typical results obtained are presented.
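    As a rough illustration of how a Gran plot locates an equivalence point, the sketch below assumes the simplest textbook case: a strong acid titrated with a strong base, ignoring water autoionization and activity effects. Before equivalence, the Gran function G = (V0 + V)·10^(-pH) is then linear in the added titrant volume V and reaches zero at the equivalence volume, so a straight-line fit extrapolated to G = 0 gives that volume. This is a generic technique chosen for illustration, not necessarily the exact standard addition procedure of the paper; the function name and synthetic data are hypothetical.

```python
import math


def gran_equivalence_volume(v0_ml, titrant_ml, ph_values):
    """Estimate the equivalence volume (mL) from pre-equivalence data.

    Assumes a strong acid titrated with a strong base, for which
    G = (V0 + V) * 10**(-pH) falls linearly with V and is zero at V_e.
    """
    if len(titrant_ml) != len(ph_values) or len(titrant_ml) < 2:
        raise ValueError("need at least two paired (V, pH) points")
    g = [(v0_ml + v) * 10 ** (-ph) for v, ph in zip(titrant_ml, ph_values)]
    n = len(g)
    mean_v = sum(titrant_ml) / n
    mean_g = sum(g) / n
    sxx = sum((v - mean_v) ** 2 for v in titrant_ml)
    if sxx == 0:
        raise ValueError("titrant volumes must not all be equal")
    sxy = sum((v - mean_v) * (gi - mean_g) for v, gi in zip(titrant_ml, g))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_v
    # x-intercept of the least-squares line, where G = 0.
    return -intercept / slope


if __name__ == "__main__":
    # Synthetic data: 50 mL of 1e-4 M acid titrated with 1e-3 M base,
    # so the true equivalence volume is 5.0 mL.
    v0 = 50.0
    vols = [0.0, 1.0, 2.0, 3.0, 4.0]
    phs = [-math.log10((0.005 - 0.001 * v) / (v0 + v)) for v in vols]
    print(gran_equivalence_volume(v0, vols, phs))
```

    Because only the zero crossing matters, G can be left in arbitrary units; in practice only points well before the equivalence point should be used for the fit.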

  19. Physicochemical properties and analysis of Malaysian palm fatty acid distilled

    Science.gov (United States)

    Jumaah, Majd Ahmed; Yusoff, Mohamad Firdaus Mohamad; Salimon, Jumat

    2018-04-01

    Palm fatty acid distillate (PFAD) is a cheap and valuable byproduct of the edible oil processing industry. This study was carried out to determine the physicochemical properties of Malaysian palm fatty acid distillate (PFAD). The free fatty acid content (FFA %), acid value, iodine value, saponification value, unsaponifiable matter, hydroxyl value, specific gravity at 28 °C, moisture content, viscosity at 40 °C, and colour at 28 °C were 87.04 ± 0.1%, 190.6 ± 1 mg/g, 53.3 ± 0.2 mg/g, 210.37 ± 0.8 mg/g, 1.5 ± 0.1%, 47 ± 0.2 mg/g, 0.87 g/ml, 0.63%, 30 cSt, and yellowish, respectively. Gas chromatography (GC) was used to determine the fatty acid (FA) composition of PFAD. The fatty acids consisted mostly of palmitic acid (C16:0, 48.9%), oleic acid (C18:1, 37.4%), linoleic acid (C18:2, 9.7%), stearic acid (C18:0, 2.7%), and myristic acid (C14:0, 1.1%). High-performance liquid chromatography (HPLC) analysis showed 99.2% FFA, with diacylglycerol and monoacylglycerol at 0.69% and 0.062%, respectively.

  20. Isolation of fucoxanthin and fatty acids analysis of Padina australis ...

    African Journals Online (AJOL)

    Fucoxanthin has been successfully isolated from a Malaysian brown seaweed species, Padina australis. The purity of the fucoxanthin was >98%, as indicated by high-performance liquid chromatography analysis. This seaweed also contains a considerable amount of unsaturated fatty acids. Thirteen fatty acids were ...

  1. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    Energy Technology Data Exchange (ETDEWEB)

    Rhee, Mun Su [University of Florida, Gainesville; Moritz, Brelan E. [University of Florida, Gainesville; Xie, Gary [Los Alamos National Laboratory (LANL); Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Dalin, Eileen [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chertkov, Olga [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Patel, Milind [University of Florida, Gainesville; Ou, Mark [University of Florida, Gainesville; Harbrucker, Roberta [University of Florida, Gainesville; Ingram, Lonnie O. [University of Florida; Shanmugam, Keelnathan T. [University of Florida

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  2. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    Energy Technology Data Exchange (ETDEWEB)

    Xie, Gary [Los Alamos National Laboratory (LANL); Dalin, Eileen [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Land, Miriam L [ORNL

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  3. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    Directory of Open Access Journals (Sweden)

    Hironobu Yanagisawa

    2016-03-01

    The presence of high-molecular-weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses, as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that combining DECS analysis with next-generation sequencing (NGS) would improve the detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which had not previously been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate for the BSSV genomic sequence. Reverse transcription PCR (RT-PCR) primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.

  4. CodonTest: modeling amino acid substitution preferences in coding sequences.

    Directory of Open Access Journals (Sweden)

    Wayne Delport

    2010-08-01

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models which allow amino acid pairs to have independent rates of substitution offer improved fit over single-rate models. However, these approaches have been limited by the need for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, the number of which is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies, and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus that gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.

  5. Cloning, Expression, Sequence Analysis and Homology Modeling of the Prolyl Endoprotease from Eurygaster integriceps Puton

    Directory of Open Access Journals (Sweden)

    Ravi Chandra Yandamuri

    2014-10-01

    Eurygaster integriceps Puton, commonly known as the sunn pest, is a major pest of wheat in Northern Africa, the Middle East, and Eastern Europe. This insect injects a prolyl endoprotease into the wheat, destroying the gluten. The purpose of this study was to clone the full-length cDNA of the sunn pest prolyl endoprotease (spPEP) for expression in E. coli and to compare the amino acid sequence of the enzyme with other known PEPs in terms of both phylogeny and potential tertiary structure. Sequence analysis shows that the 5′ UTR contains several putative binding sites for transcription factors known to be expressed in Drosophila, which might be useful targets for inhibition of the enzyme. The spPEP was first identified as a prolyl endoprotease by Darkoh et al., 2010. The enzyme is a unique serine protease of the S9A family by way of its substrate recognition of the gluten proteins, which are greater than 30 kDa in size. For homology modeling with SWISS-MODEL, the porcine brain PEP (PDB: 2XWD) was selected from the database of known PEP structures (51% maximum identity to known PEPs), resulting in a predicted tertiary structure 99% identical to the porcine brain PEP structure. The Km of the recombinant spPEP was determined to be 210 ± 53 µM for the Z-Gly-Pro-pNA substrate in 0.025 M ethanolamine, pH 8.5, containing 0.1 M NaCl at 37 °C, with a turnover rate of 172 ± 47 µM Gly-Pro-pNA/s/µM of enzyme.
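    To show how the two kinetic constants above combine, the sketch below assumes standard Michaelis-Menten kinetics and reads the reported turnover rate (µM product per second per µM enzyme) as a catalytic constant kcat of about 172 s⁻¹. It uses only the central values and ignores the reported uncertainties; the function name and default arguments are illustrative assumptions, not from the paper.

```python
def michaelis_menten_rate(substrate_um, enzyme_um, kcat_per_s=172.0, km_um=210.0):
    """Initial rate v = kcat * [E] * [S] / (Km + [S]), returned in µM/s.

    Defaults are the central values quoted for recombinant spPEP; treating
    the turnover rate as kcat is an assumption of this sketch.
    """
    if substrate_um < 0 or enzyme_um < 0:
        raise ValueError("concentrations must be non-negative")
    return kcat_per_s * enzyme_um * substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # At [S] = Km the rate is half of kcat * [E].
    print(michaelis_menten_rate(210.0, 1.0))
```

    At substrate concentrations well above Km, the rate approaches kcat·[E], which is why the two constants are usually reported together.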

  6. Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

    Science.gov (United States)

    A training opportunity is open to a highly microbial-research-motivated student to conduct sequence analysis, explore novel genes and metabolic pathways, validate resultant findings using qPCR/RT-qPCR and summarize the findings

  7. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers, Libyan Journal of Medicine ... For molecular modeling of NAT2 protein, visualized ... cal clustering ... cular dynamics simulation.

  8. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    Indian Academy of Sciences (India)

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants ... based on a Gibbs sampling strategy were done using .... SHOX (short stature homeobox) are an important cause of growth.

  9. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  10. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background: Studies on genomic sequences for classification and taxonomic identification play a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they can overcome the drawbacks of sequence alignment techniques. In this paper, a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on a k-mer representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find, in a document corpus, the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method achieves results very similar to those of the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions, the proposed method outperforms RDP and SVM with ultra-short sequences, and it exhibits a smooth decrease in performance at every taxonomic level as the sequence length is decreased.
PMID:25916734
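    The "sequences as documents, k-mers as words" idea described above can be sketched as follows. This minimal example only builds the bag-of-k-mers counts and a document-term matrix of the kind a topic model consumes; it does not fit LDA itself (a library such as scikit-learn's `LatentDirichletAllocation` could be applied to such a matrix). The function names, the use of overlapping k-mers, and the skipping of windows containing non-ACGT characters are illustrative assumptions, not details from the paper.

```python
from collections import Counter


def kmer_words(seq, k):
    """Overlapping k-mers of a DNA sequence, treated as the 'words' of a document.

    Assumption: windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    words = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            words.append(word)
    return words


def kmer_document(seq, k):
    """Bag-of-k-mers representation: k-mer -> count."""
    return Counter(kmer_words(seq, k))


def count_matrix(docs):
    """Document-term matrix (rows = sequences) over a sorted k-mer vocabulary."""
    vocab = sorted(set().union(*docs)) if docs else []
    rows = [[doc[word] for word in vocab] for doc in docs]
    return vocab, rows


if __name__ == "__main__":
    docs = [kmer_document(s, 2) for s in ["ACGTAC", "TTACGA"]]
    print(count_matrix(docs))
```

    Short k values keep the vocabulary small (4^k possible words), which matters when, as in the study, very short sequence snippets must still yield usable word counts.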

  11. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites*

    Science.gov (United States)

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-01-01

    To our knowledge, few molecular-level studies of Demodex are currently available. In this study, our group cloned, sequenced, and analyzed for the first time the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, using specific primers designed from the only partial CHS gene sequence available in GenBank, that of D. canis from Japan. Amplification was successful in only four of the nine Demodex isolates: three D. canis isolates and one D. brevis isolate. The sequenced fragments were 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and the Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between the four D. canis isolates and the D. brevis isolate were 99.1%–99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, consistent with the traditional classification. Two open reading frames (ORFs) were identified in each sequenced CHS gene fragment, and their corresponding amino acid sequences were located in the catalytic domain. The relatively conserved sequences suggest a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites. PMID:23024043
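    Pairwise similarity figures such as the 99.7% to 100.0% above are commonly computed as percent identity over aligned positions. The sketch below assumes the two sequences are already aligned to equal length and, by default, excludes gap columns from the denominator; this convention and the function name are assumptions, since the abstract does not state how similarity was computed. For scale, a single mismatch over a 339-bp alignment gives 338/339, about 99.7%.

```python
def percent_identity(a, b, count_gaps=False):
    """Percent identity between two pre-aligned sequences of equal length.

    By default, columns where either sequence has a gap ('-') are excluded
    from the denominator; with count_gaps=True they count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if not count_gaps and (x == "-" or y == "-"):
            continue
        compared += 1
        if x == y and x != "-":
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # One mismatch in a 339-column alignment.
    print(round(percent_identity("A" * 339, "A" * 338 + "C"), 1))
```

    Different tools handle gaps and terminal overhangs differently, so identity values from separate studies are only directly comparable when the same convention is used.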

  12. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites.

    Science.gov (United States)

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-10-01

    To our knowledge, few molecular-level studies of Demodex are currently available. In this study, our group cloned, sequenced, and analyzed for the first time the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, using specific primers designed from the only partial CHS gene sequence available in GenBank, that of D. canis from Japan. Amplification was successful in only four of the nine Demodex isolates: three D. canis isolates and one D. brevis isolate. The sequenced fragments were 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and the Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between the four D. canis isolates and the D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, consistent with the traditional classification. Two open reading frames (ORFs) were identified in each sequenced CHS gene fragment, and their corresponding amino acid sequences were located in the catalytic domain. The relatively conserved sequences suggest a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.

  13. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0080. TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR ... 5b. GRANT NUMBER: GRANT11489 ... institutional, NIH-funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer, with an ...

  14. Rapid and Sensitive Isothermal Detection of Nucleic-acid Sequence by Multiple Cross Displacement Amplification

    OpenAIRE

    Yi Wang; Yan Wang; Ai-Jing Ma; Dong-Xun Li; Li-Juan Luo; Dong-Xin Liu; Dong Jin; Kai Liu; Chang-Yun Ye

    2015-01-01

    We have devised a novel amplification strategy based on an isothermal strand-displacement polymerization reaction, termed multiple cross displacement amplification (MCDA). The approach employed a set of ten specially designed primers spanning ten distinct regions of the target sequence and proceeded at a constant temperature (61-65 °C). At the assay temperature, the double-stranded DNAs were in a dynamic reaction environment of primer-template hybridization; thus, the high concentration of primer...

  15. Seismically induced accident sequence analysis of the advanced test reactor

    International Nuclear Information System (INIS)

    Khericha, S.T.; Henry, D.M.; Ravindra, M.K.; Hashimoto, P.S.; Griffin, M.J.; Tong, W.H.; Nafday, A.M.

    1991-01-01

    A seismic probabilistic risk assessment (PRA) was performed for the Department of Energy (DOE) Advanced Test Reactor (ATR) as part of the external events analysis. The risk from seismic events to the fuel in the core and in the fuel storage canal was evaluated. The key elements of this paper are the integration of seismically induced internal flood and internal fire, and the modeling of human error rates as a function of earthquake magnitude. The systems analysis was performed by EG&G Idaho, Inc., and the fragility analysis and quantification were performed by EQE International, Inc. (EQE).

  16. Nutritional and amino acid analysis of raw, partially fermented and ...

    African Journals Online (AJOL)

    African Journal of Food, Agriculture, Nutrition and Development ... The nutritional and amino acid analysis of raw and fermented seeds of Parkia ... between 4.27 and 8.33 % for the fully fermented and the partially fermented seeds, respectively.

  17. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The complexity and interdependence of operations on the airport surface motivate the need for a comprehensive and detailed, yet flexible and validated analysis and...

  18. BioMatriX: Sequence analysis, structure visualization, phylogenetics ...

    African Journals Online (AJOL)

    bmx-biomatrix.blogspot.com) developed for biological science community to augment scientific research regarding genomics, proteomics, phylogenetics and linkage analysis in one platform. BioMatriX offers multi-functional services to perform ...

  19. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of the tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequence between the human and elephant shark genomes is higher than that between the human and teleost fish genomes. The elephant shark contains four putative Hox clusters, indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.
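    Why a 1.4x survey recovers only fragments of genes can be estimated with the Lander-Waterman (Poisson) approximation, under which the expected fraction of a genome touched by at least one read at mean coverage c is 1 - e^(-c). This is a rough sketch that ignores repeats and assembly, not a figure reported in the abstract.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the genome fraction hit by
    at least one read at the given mean coverage."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


print(f"{expected_fraction_covered(1.4):.1%}")  # -> 75.3%
```

    Under this model roughly a quarter of the genome is expected to be missed at 1.4x, consistent with a survey that yields gene fragments rather than complete genes.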

  20. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    Science.gov (United States)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes of some species. The analysis of this information requires computer programs, which have been renewed through new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units able to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  1. Genome sequence of the thermophilic strain Bacillus coagulans 2-6, an efficient producer of high-optical-purity L-lactic acid.

    Science.gov (United States)

    Su, Fei; Yu, Bo; Sun, Jibin; Ou, Hong-Yu; Zhao, Bo; Wang, Limin; Qin, Jiayang; Tang, Hongzhi; Tao, Fei; Jarek, Michael; Scharfe, Maren; Ma, Cuiqing; Ma, Yanhe; Xu, Ping

    2011-09-01

    Bacillus coagulans 2-6 is an efficient producer of lactic acid. B. coagulans 2-6 has the smallest genome among the members of the genus Bacillus known to date. The frameshift mutation at the start of the D-lactate dehydrogenase sequence might be responsible for the production of high-optical-purity L-lactic acid.

  2. A protein with amino acid sequence homology to bovine insulin is present in the legume Vigna unguiculata (cowpea)

    Directory of Open Access Journals (Sweden)

    Venâncio T.M.

    2003-01-01

    Since the discovery of bovine insulin in plants, much effort has been devoted to characterizing these proteins and elucidating their functions. We report here the isolation, from developing fruits of cowpea (Vigna unguiculata) genotype Epace 10, of a protein with a molecular mass similar to, and the same amino acid sequence as, bovine insulin. Insulin was measured by ELISA using an anti-human insulin antibody and was detected both in empty pods and in seed coats but not in the embryo. The highest concentrations of the protein (about 0.5 ng/µg protein) were detected in seed coats at 16 and 18 days after pollination, and these values were 1.6 to 4.0 times higher than those found for isolated pods tested on any day. N-terminal amino acid sequencing was performed on the insulin purified by C4-HPLC. The significance of the presence of insulin in these plant tissues is not fully understood, but we speculate that it may be involved in the transport of carbohydrate to the fruit.

  3. Cloning and Sequence Analysis of Vibrio halioticoli Genes Encoding Three Types of Polyguluronate Lyase.

    Science.gov (United States)

    Sugimura; Sawabe; Ezura

    2000-01-01

    The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using the plasmid vector pUC18 and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed enzyme activity specific for polyguluronate. Three genes encoding polyguluronate lyase, alyVG1, alyVG2, and alyVG3, were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of the nucleotide and deduced amino acid sequences of AlyVG1, AlyVG2, and AlyVG3 revealed low homology. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases, including AlyVG1, AlyVG2, and AlyVG3, also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region of nine amino acids (YFKAGXYXQ) in the carboxy-terminal region, previously reported by Mallisard and colleagues, was observed only in AlyVG1 and AlyVG2.
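    A consensus such as YFKAGXYXQ, where X stands for any residue, can be searched for by turning it into a regular expression. The minimal sketch below assumes that reading of X; the helper names and the test fragment are hypothetical and are not taken from the AlyVG sequences.

```python
import re


def motif_to_regex(motif: str) -> str:
    """'X' in the consensus matches any residue; other letters match literally."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in motif.upper())


def find_motif(protein: str, motif: str) -> list:
    """0-based start positions of every match, overlapping hits included."""
    lookahead = f"(?={motif_to_regex(motif)})"  # zero-width, so overlaps are reported
    return [m.start() for m in re.finditer(lookahead, protein.upper())]


# Hypothetical fragment, not a real polyguluronate lyase sequence.
print(find_motif("MSTYFKAGAYLQNN", "YFKAGXYXQ"))  # -> [3]
```

    Wrapping the pattern in a lookahead keeps the search from skipping overlapping hits, which matters for short or degenerate motifs.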

  4. RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

    Science.gov (United States)

    Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

    2016-02-01

    During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as determined by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the full-length F gene sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of the Argentinean CDV strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and it contributes to knowledge of the molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Analysis of long-range correlation in sequences data of proteins

    Directory of Open Access Journals (Sweden)

    ADRIANA ISVORAN

    2007-04-01

    The results presented here suggest the existence of correlations in the sequence data of proteins. Thirty-two proteins, both globular and fibrous, both monomeric and polymeric, were analyzed. The primary structures of these proteins were treated as time series. Three spatial data series were generated for each protein sequence from numerical correspondences between each amino acid and an associated physical property: its electric charge, its polar character and its dipole moment. For each series, the spectral coefficient, the scaling exponent and the Hurst coefficient were determined. The values obtained for these coefficients revealed non-randomness in the data series.
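    One common way to estimate a Hurst coefficient is rescaled-range (R/S) analysis: split the series into windows of size n, compute the range of the cumulative mean-adjusted sum divided by the standard deviation, and fit the slope of log(mean R/S) against log(n). The sketch below applies it to a charge series; the residue-to-charge mapping, the doubling window sizes and the example sequence are assumptions for illustration, and the abstract does not say which estimator the author used.

```python
import math
import statistics

# Assumed residue-to-charge mapping near neutral pH (illustrative only):
# Asp/Glu = -1, Lys/Arg = +1, every other residue = 0.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def to_charge_series(protein: str) -> list:
    """Turn a primary structure into a numeric series, one value per residue."""
    return [CHARGE.get(aa, 0.0) for aa in protein.upper()]


def rescaled_range(chunk: list):
    """R/S of one window: range of the cumulative mean-adjusted sum over the std."""
    mean = statistics.fmean(chunk)
    cumulative, low, high = 0.0, 0.0, 0.0
    for x in chunk:
        cumulative += x - mean
        low = min(low, cumulative)
        high = max(high, cumulative)
    spread = statistics.pstdev(chunk)
    return (high - low) / spread if spread > 0 else None  # flat windows are skipped


def hurst_rs(series: list, min_window: int = 8) -> float:
    """Slope of log(mean R/S) against log(window size), windows doubling in size."""
    log_n, log_rs = [], []
    n = min_window
    while n <= len(series) // 2:
        ratios = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping windows
            r = rescaled_range(series[start:start + n])
            if r is not None:
                ratios.append(r)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(statistics.fmean(ratios)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short or too flat for an R/S fit")
    mx, my = statistics.fmean(log_n), statistics.fmean(log_rs)  # least-squares slope
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den


# Hypothetical sequence, not one of the 32 proteins in the study.
seq = "MKDEKRLLEEAKKLAEEGKRDEELLKKAEELLKRAEEL" * 2
print(round(hurst_rs(to_charge_series(seq)), 3))
```

    Values near 0.5 suggest an uncorrelated series; a strictly alternating series gives 0 and a steady trend gives values near 1, which is a quick sanity check on the estimator.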

  6. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  7. Stratigraphical analysis of the neoproterozoic sedimentary sequences of the Sao Francisco Basin

    International Nuclear Information System (INIS)

    Martins, Mariela; Lemos, Valesca Brasil

    2007-01-01

    A stratigraphic analysis was performed according to the principles of sequence stratigraphy on the Neoproterozoic sedimentary sequences of the Sao Francisco Basin (Central Brazil). Three periods of deposition separated by unconformities were recognized in the Sao Francisco Megasequence: (1) Sequences 1 and 2, a Cryogenian glaciogenic sequence followed by a distal scarp carbonate ramp, developed under stable conditions; (2) Sequence 3, an Upper Cryogenian stack of homoclinal ramps with mixed carbonate-siliciclastic sedimentation, deposited under the progressive influence of compressional stresses of the Brasiliano Cycle; and (3) Sequence 4, a Lower Ediacaran shallow platform dominated by siliciclastic sedimentation of molassic nature, the erosion product of the nearby uplifted thrust sheets. Each of the carbonate-bearing sequences presents a distinct δ13C isotopic signature. Superposition onto the global carbon isotope curve allowed the recognition of a major depositional hiatus between the Paranoa and Sao Francisco Megasequences, and suggested that the glacial diamictite deposition (Jequitai Formation) most probably took place around 800 Ma. This constrains the Sao Francisco Megasequence deposition to the interval between 800 and 600 Ma (the known ages of the Brasiliano Orogeny define the upper limit). A minor depositional hiatus (700-680 Ma) was also identified separating Sequences 2 and 3. Isotopic analyses suggest that from then on, more restricted environmental conditions were established in the basin, probably associated with a first-order global event, which prevailed throughout the deposition of Sequence 3. (author)

  8. Oasis: online analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan

    2015-07-01

    Oasis is a web application that allows fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best-practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis's unique selling points are a differential expression module that allows multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis's modular workflow enables rapid (re-)analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. Contact: stefan.bonn@dzne.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  9. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick method for detecting radiation-induced mutations using molecular-biological techniques, and an effective method for analyzing the resulting molecular changes in base sequence. In this year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10 and 50 Gy, and the transgene was recovered from spleen genomic DNA by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. The mutants were sequenced using an ALFred DNA sequencer and the SequiTherm™ Long-Read Cycle Sequencing Kit. Sequence analysis was carried out for 41 of the 50 independent Spi mutants obtained. The deletions were classified into 4 groups. Group 1 included 15 mutants characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants with a large deletion having no homologous sequence at the junction. Group 3 included 11 mutants with a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of the gam gene and was possibly caused by oxidative breakage of DNA or recombination of DNA fragments produced by the breakage. Group 4 included 4 mutants with deletions of 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between codons 13 and 14 of the redB gene, leading to inactivation of the gam and redBA genes. These results indicated that most Spi mutants had a deletion in the red/gam region, and that in more than half of the mutants the deletions occurred at homologous sequences as short as 8 bp. (M.N.)
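    The Group 4 result rests on a simple rule: a deletion whose length is not a multiple of 3 shifts the downstream reading frame, so a stop codon such as TGA can appear much earlier in the new frame. The toy sketch below illustrates that logic on a made-up five-codon ORF; the sequence and helper names are hypothetical and unrelated to the actual gam or redB sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


def shifts_frame(deletion_bp: int) -> bool:
    """A deletion changes the downstream frame unless its length is a multiple of 3."""
    return deletion_bp % 3 != 0


def first_stop_codon(orf: str) -> Optional[int]:
    """1-based number of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


toy = "ATGAAACTGACCTAA"      # made-up ORF: stop codon at codon 5
mutant = delete(toy, 6, 1)    # 1-bp deletion -> "ATGAAATGACCTAA"
print(shifts_frame(1), first_stop_codon(toy), first_stop_codon(mutant))  # -> True 5 3
```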

  10. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the premise that sequence reads and contigs often exhibit the same kinds of base usage as the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs), which are permutations of bacterial restriction sites, and a base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to determine the relatedness between these reads or contigs by comparing base composition. In particular, we show that two types of organismal sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By applying one of the four possible spectrum sets, encompassing all known restriction sites, we provide evidence that each set has a different ability to differentiate sequence data. Furthermore, we show that selecting a spectrum set relevant to one organism, but not to the others in the data set, greatly improves the performance of sequence differentiation, even when the read, contig or sequence fragment is short. We demonstrate proof of concept by applying our method to ten trials of two or three freshly selected sequence fragments (reads and contigs) per experiment across the six organisms of our set. Here we describe a novel and computationally efficient pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny, where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often share much conserved sequence.
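    The coverage idea can be sketched as follows: build a motif set from a restriction site, measure the fraction of positions in a read covered by any motif occurrence, and compare reads by their coverage profiles across several sets. This is a minimal sketch under stated assumptions: here a spectrum set is taken to be all distinct permutations of one site (EcoRI GAATTC and BamHI GGATCC are used as examples), and the Euclidean profile distance is a hypothetical choice; the abstract does not give the authors' exact construction or comparison.

```python
import math
from itertools import permutations


def spectrum_set(site: str) -> set:
    """All distinct permutations of a restriction-site string (assumed construction)."""
    return {"".join(p) for p in permutations(site.upper())}


def motif_coverage(seq: str, motifs) -> float:
    """Fraction of positions in `seq` covered by at least one motif occurrence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:  # overlapping occurrences are allowed
            for i in range(start, start + len(motif)):
                covered[i] = True
            start = seq.find(motif, start + 1)
    return sum(covered) / len(seq)


def coverage_profile(seq: str, spectrum_sets) -> list:
    """One coverage value per spectrum set."""
    return [motif_coverage(seq, s) for s in spectrum_sets]


def profile_distance(a: list, b: list) -> float:
    """Hypothetical relatedness measure: Euclidean distance between profiles."""
    return math.dist(a, b)


SETS = [spectrum_set("GAATTC"), spectrum_set("GGATCC")]  # EcoRI- and BamHI-derived
read_a = "GAATTCTTAAGCAATTGGATCCAAGG"  # made-up reads for illustration
read_b = "GGGGCCCCGGGGCCCCATATATATAT"
print(profile_distance(coverage_profile(read_a, SETS), coverage_profile(read_b, SETS)))
```

    Because each set is scored separately, a set that matches one organism's reads but not the others' pulls their profiles apart, which is the effect the abstract attributes to careful spectrum set selection.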

  11. Sequence analysis and molecular characterization of Wnt4 gene in metacestodes of Taenia solium.

    Science.gov (United States)

    Hou, Junling; Luo, Xuenong; Wang, Shuai; Yin, Cai; Zhang, Shaohua; Zhu, Xueliang; Dou, Yongxi; Cai, Xuepeng

    2014-04-01

    Wnt proteins are a family of secreted glycoproteins that are evolutionarily conserved and considered to be involved in extensive developmental processes in metazoan organisms. The characterization of wnt genes may improve understanding of the parasite's development. In the present study, a wnt4 gene encoding 491 amino acids was amplified from cDNA of metacestodes of Taenia solium using reverse transcription PCR (RT-PCR). Bioinformatics tools were used for sequence analysis. The conserved domain of the wnt gene family was predicted. The expression profile of Wnt4 was investigated using real-time PCR. Wnt4 expression was found to be dramatically increased in scolex-evaginated cysticerci compared with invaginated cysticerci. In situ hybridization showed that the wnt4 gene was distributed in the posterior end of the worm along the primary body axis in evaginated cysticerci. These findings indicated that wnt4 may take part in the process of cysticerci evagination and play a role in scolex/bladder development of cysticerci of T. solium.

  12. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.).

    Science.gov (United States)

    Lee, Mi-Kyung; Zhang, Yang; Zhang, Meiping; Goebel, Mark; Kim, Hee Jin; Triplett, Barbara A; Stelly, David M; Zhang, Hong-Bin

    2013-03-28

    Cotton, one of the world's leading crops, is important to the world's textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its putative diploid progenitor species, G. raimondii Ulbr. We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximately 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. To gain insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome, even though G
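    A figure like "99% probability of obtaining at least one positive clone" is conventionally computed with the Clarke-Carbon formula, P = 1 - (1 - I/G)^N, for N clones of insert size I in a genome of size G. The abstract does not state which G was used, so the genome sizes below are assumptions: 2.5 Gb is what the abstract's own figures imply (nearly 10,000 BESs at about one per 250 kb), and 2.25 Gb is a second illustrative value.

```python
import math


def prob_clone_present(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon: probability that a given single-copy locus is in the library."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


def clones_needed(prob: float, insert_bp: float, genome_bp: float) -> int:
    """Smallest library size that reaches `prob` under the same model."""
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - insert_bp / genome_bp))


N_CLONES, INSERT_BP = 76_800, 135_000  # library figures from the abstract
for genome_bp in (2.25e9, 2.5e9):      # ASSUMED genome sizes, not stated in the abstract
    p = prob_clone_present(N_CLONES, INSERT_BP, genome_bp)
    print(f"G = {genome_bp / 1e9:.2f} Gb: P = {p:.3f}")
```

    The result is sensitive to the assumed genome size: the same library gives close to 99% at 2.25 Gb and somewhat less at 2.5 Gb, which is worth keeping in mind when reading the "approximately 99%" claim.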

  13. An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.

    Science.gov (United States)

    Gupta, M K; Niyogi, R; Misra, M

    2013-01-01

    In this paper, we propose a method to create a 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the content of each amino acid, the total distance of each amino acid from the first amino acid in the protein sequence, and the distribution of the 20 amino acids. The resulting cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour-joining method. To show the applicability of our approach, we tested it on three protein data sets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results agree with known history and with the output of the widely used multiple sequence alignment program ClustalW. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear, and less space is required than for other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
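    The sketch below shows the general shape of such a 60-dimensional vector (three blocks of 20, one value per amino acid) and the cosine distance matrix that would feed neighbour joining. Only the first two blocks follow the abstract (content, and total distance from the first residue); the abstract does not define its "distribution" block, so the third block here (mean position of each residue) is a hypothetical stand-in, and the tree-building step is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def feature_vector(seq: str) -> list:
    """Counts, total distances from the first residue, and a hypothetical third block."""
    positions = {aa: [] for aa in AMINO_ACIDS}
    for i, aa in enumerate(seq.upper()):
        if aa in positions:          # non-standard residues are ignored
            positions[aa].append(i)  # distance from the first residue (index 0)
    counts = [len(positions[aa]) for aa in AMINO_ACIDS]
    total_dist = [sum(positions[aa]) for aa in AMINO_ACIDS]
    # HYPOTHETICAL third block: mean position of each amino acid (0.0 if absent).
    mean_pos = [t / c if c else 0.0 for t, c in zip(total_dist, counts)]
    return counts + total_dist + mean_pos


def cosine_distance(u: list, v: list) -> float:
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def distance_matrix(seqs: list) -> list:
    """Pairwise cosine distances; input for a tree method such as neighbour joining."""
    vecs = [feature_vector(s) for s in seqs]
    return [[cosine_distance(a, b) for b in vecs] for a in vecs]


# Made-up fragments for illustration only.
for row in distance_matrix(["MKTAYIAKQR", "MKTAYLAKQR", "GGSWWPLLNE"]):
    print([round(d, 3) for d in row])
```

    Building the vector is a single pass over the sequence, which matches the linear time complexity the abstract claims for the feature-extraction step.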

  14. Effect of amino acid sequence and pH on nanofiber formation of self-assembling peptides EAK16-II and EAK16-IV.

    Science.gov (United States)

    Hong, Yooseong; Legge, Raymond L; Zhang, S; Chen, P

    2003-01-01

    Atomic force microscopy (AFM) and axisymmetric drop shape analysis-profile (ADSA-P) were used to investigate the mechanism of peptide self-assembly. The peptides chosen consisted of 16 alternating hydrophobic and hydrophilic amino acids, where the hydrophilic residues carry alternating negative and positive charges. Two peptides, AEAEAKAKAEAEAKAK (EAK16-II) and AEAEAEAEAKAKAKAK (EAK16-IV), were investigated in terms of nanostructure formation through self-assembly. The experimental results, which focused on the effects of amino acid sequence and pH, show that the nanostructures formed by the peptides depend on the amino acid sequence and the pH of the solution. At pH conditions around neutrality, one of the peptides used in this study, EAK16-IV, forms globular assemblies and yields a lower surface tension at air-water interfaces than the other peptide, EAK16-II, which forms fibrillar assemblies at the same pH. When the pH is lowered below 6.5 or raised above 7.5, EAK16-IV undergoes a transition from globular to fibrillar structures, whereas EAK16-II does not show any structural transition. Surface tension measurements using ADSA-P showed different surface activities of the peptides at air-water interfaces. EAK16-II does not show a significant difference in surface tension over the pH range between 4 and 9. However, EAK16-IV shows a noticeable decrease in surface tension at pH around neutrality, indicating that the formation of globular assemblies is related to molecular hydrophobicity.
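    The two peptides differ only in residue order, not composition, which is why the abstract can attribute their different behavior to amino acid sequence. A short sketch makes this concrete by mapping Glu to "-", Lys to "+" and Ala to "0" (a simplified charge assignment near neutral pH, assumed for illustration): both peptides contain 8 Ala, 4 Glu and 4 Lys, but the charged residues alternate in pairs in EAK16-II and in blocks of four in EAK16-IV.

```python
from collections import Counter

# Simplified charge symbols near neutral pH (assumption): Glu -, Lys +, Ala uncharged.
CHARGE_SYMBOL = {"E": "-", "K": "+", "A": "0"}

EAK16_II = "AEAEAKAKAEAEAKAK"
EAK16_IV = "AEAEAEAEAKAKAKAK"


def charge_pattern(peptide: str) -> str:
    """Per-residue charge symbols; '?' marks residues outside the simple mapping."""
    return "".join(CHARGE_SYMBOL.get(aa, "?") for aa in peptide)


def hydrophilic_pattern(peptide: str) -> str:
    """Charge pattern of the hydrophilic (charged) residues only."""
    return "".join(c for c in charge_pattern(peptide) if c in "+-")


for name, pep in (("EAK16-II", EAK16_II), ("EAK16-IV", EAK16_IV)):
    print(name, hydrophilic_pattern(pep), dict(Counter(pep)))
# -> EAK16-II --++--++ {'A': 8, 'E': 4, 'K': 4}
# -> EAK16-IV ----++++ {'A': 8, 'E': 4, 'K': 4}
```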

  15. Cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of Clostridium chauvoei

    Directory of Open Access Journals (Sweden)

    Saroj K. Dangi

    2017-09-01

    Aim: Blackleg disease in ruminants is caused by Clostridium chauvoei. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, the hyaluronidases of C. chauvoei are not. The present study aimed at cloning and sequence analysis of the hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 medium and confirmed by polymerase chain reaction (PCR) using primers specific for the 16-23S rDNA spacer region. The nagH gene of C. chauvoei was amplified, cloned into the pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies, followed by sequencing of the nagH gene in the construct. Results: PCR amplification yielded a 1143-bp nagH gene product, which was cloned in a prokaryotic expression system. Colony PCR, as well as sequencing of the nagH gene, confirmed the presence of the insert. BLAST analysis of the sequence at NCBI confirmed that it was indeed the nagH gene of C. chauvoei. Phylogenetic analysis showed that the sequence is closely related to Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for the virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.

  16. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  17. WebMGA: a customizable web server for fast metagenomic sequence analysis.

    Science.gov (United States)

    Wu, Sitao; Zhu, Zhengwei; Fu, Liming; Niu, Beifang; Li, Weizhong

    2011-09-07

    The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers face tremendous challenges in metagenomic data analysis due to the huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; moreover, metagenomic annotation involves a wide range of computational tools that are difficult for common users to install and maintain. The tools provided by the few available web servers are also limited and have various constraints, such as login requirements, long waiting times and the inability to configure pipelines. We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing