WorldWideScience

Sample records for acid sequence features

  1. Characterization of N-glycosylation and amino acid sequence features of immunoglobulins from swine.

    Lopez, Paul G; Girard, Lauren; Buist, Marjorie; de Oliveira, Andrey Giovanni Gomes; Bodnar, Edward; Salama, Apolline; Soulillou, Jean-Paul; Perreault, Hélène

    2016-02-01

    The primary goal of this study was to develop a method to study the N-glycosylation of IgG from swine in order to detect epitopes containing N-glycolylneuraminic acid (Neu5Gc) and/or terminal galactose residues in α1-3 linkage, which are liable to cause xenograft-related problems. Samples of immunoglobulin were isolated from porcine serum using protein-A affinity chromatography. The eluate was then separated by gel electrophoresis, and bands corresponding to the N-glycosylated heavy chains were excised from the gel and subjected to tryptic digestion. Peptides and glycopeptides were separated by reversed-phase liquid chromatography and fractions were collected for matrix-assisted laser desorption/ionization time-of-flight mass spectrometric (MALDI-TOF-MS) analysis. Overall no α1-3 galactose was detected, as demonstrated by complete susceptibility of terminal galactose residues to β-galactosidase digestion. Neu5Gc was detected on singly sialylated structures. Two major N-glycopeptides, EEQFNSTYR and EAQFNSTYR, were identified by tandem MS (MS/MS), consistent with the report of Butler et al. (Immunogenetics, 61, 2009, 209-230), who found 11 subclasses for porcine IgG. Of the 11, ten include the sequence corresponding to EEQFNSTYR, and only one codes for EAQFNSTYR. In this study, the glycosylation patterns associated with the two peptides differed slightly, in that EEQFNSTYR had a higher content of galactose. The last step of this study consisted of peptide-mapping the 11 reported porcine IgG sequences. Although there was considerable overlap, at least one unique tryptic peptide was found per IgG sequence. The workflow presented in this manuscript constitutes the first study to use MALDI-TOF-MS in the investigation of porcine IgG structural features. PMID:26586247
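
    The tryptic digestion and N-glycopeptide identification described in this record can be illustrated with a minimal in-silico sketch. It is not the authors' workflow: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P) and the canonical N-X-S/T sequon (X not P). The flanking residues "AK" and "GG" in the usage example are made up for illustration; only the peptides EEQFNSTYR and EAQFNSTYR come from the abstract.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    This is the usual simplified trypsin rule; missed cleavages are ignored.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def sequon_positions(peptide):
    """Return 1-based positions of Asn residues in an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", peptide)]


# Hypothetical fragment: the flanking residues are invented for the example.
fragment = "AKEEQFNSTYRGG"
for peptide in tryptic_digest(fragment):
    print(peptide, sequon_positions(peptide))
```

    Both reported glycopeptides carry the sequon at the same position (Asn 5 of the nonapeptide), so the one-residue E/A difference at position 2 is what distinguishes them in this sketch.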

  2. Classifying Genomic Sequences by Sequence Feature Analysis

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
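
    Two of the alignment-free features named in this record, word frequency and dinucleotide relative abundance, can be sketched as follows. The abstract does not give the exact definitions used, so this assumes the common odds-ratio form rho_xy = f_xy / (f_x * f_y) computed on a single strand (published variants often symmetrize over the reverse complement); the function names are illustrative.

```python
from collections import Counter


def word_frequencies(seq, k):
    """Relative frequency of each overlapping k-letter word in seq."""
    seq = seq.upper()
    words = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(words.values())
    return {word: count / total for word, count in words.items()}


def dinucleotide_relative_abundance(seq):
    """Odds ratio rho_xy = f_xy / (f_x * f_y) for every observed dinucleotide.

    Single-strand version; no reverse-complement symmetrization is applied.
    """
    base_freq = word_frequencies(seq, 1)
    pair_freq = word_frequencies(seq, 2)
    return {
        pair: freq / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, freq in pair_freq.items()
    }


rho = dinucleotide_relative_abundance("ACACAC")
print(rho)  # values above 1 mean the pair is over-represented
```

    Vectors of such values, computed per region, are the kind of input that principal component or discriminant analysis could then operate on.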

  3. Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes

    Selvaraj S

    2011-10-01

    Background: Protein-protein interactions are important for several cellular processes. Understanding the mechanism of protein-protein recognition and predicting the binding sites in protein-protein complexes are long-standing goals in molecular and computational biology. Methods: We have developed an energy-based approach for identifying the binding site residues in protein–protein complexes. The binding site residues have been analyzed with sequence- and structure-based parameters such as binding propensity, neighboring residues in the vicinity of binding sites, conservation score and conformational switching. Results: We observed that the binding propensities of amino acid residues are specific for protein-protein complexes. Further, typical dipeptides and tripeptides showed high preference for binding, which is unique to protein-protein complexes. Most of the binding site residues are highly conserved among homologous sequences. Our analysis showed that 7% of residues changed their conformations upon protein-protein complex formation; the fraction is 9.2% in binding sites and 6.6% in non-binding sites. Specifically, the residues Glu, Lys, Leu and Ser changed their conformation from coil to helix/strand and from helix to coil/strand. Leu, Ser, Thr and Val prefer to change their conformation from strand to coil/helix. Conclusions: The results obtained in this study will be helpful for understanding and predicting the binding sites in protein-protein complexes.

  4. Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes

    Selvaraj S; Jayaram B; Saranya N; Gromiha M; Fukui Kazuhiko

    2011-01-01

    Background: Protein-protein interactions are important for several cellular processes. Understanding the mechanism of protein-protein recognition and predicting the binding sites in protein-protein complexes are long-standing goals in molecular and computational biology. Methods: We have developed an energy-based approach for identifying the binding site residues in protein–protein complexes. The binding site residues have been analyzed with sequence and structure based parameters such...

  5. Structural features of lignohumic acids

    Novák, František; Šestauberová, Martina; Hrabal, Richard

    2015-08-01

    The composition and structure of humic acids isolated from lignohumate, which is produced by hydrolytic-oxidative conversion of technical lignosulfonates, were characterized by chemical and spectral methods (UV/VIS, FTIR, and 13C NMR spectroscopy). As comparative samples, humic acids (HA) were isolated also from lignite and organic horizon of mountain spruce forest soil. When compared with other HA studied, the lignohumate humic acids (LHHA) contained relatively few carboxyl groups, whose role is partly fulfilled by sulfonic acid groups. Distinctive 13C NMR signal of methoxyl group carbons, typical for lignin and related humic substances, was found at the shift of 55.9 ppm. Other alkoxy carbons were present in limited quantity, like the aliphatic carbons. Due to the low content of these carbon types, the LHHA has high aromaticity of 60.6%. Comparison with the natural HA has shown that lignohumate obtained by thermal processing of technical lignosulfonate can be regarded as an industrially produced analog of natural humic substances. Based on the chemical and spectral data evaluation, structural features of lignohumate humic acids were clarified and their hypothetical chemical structure proposed, which described typical "average" properties of the isolated fraction.

  6. Statistical and linguistic features of DNA sequences

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
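
    The Detrended Fluctuation Analysis named in this record can be sketched in a few lines. This is a minimal first-order DFA under stated assumptions, not the authors' implementation: the sequence is mapped to a walk with +1 for pyrimidines (C/T) and -1 for everything else, the walk is cut into non-overlapping boxes, a least-squares line is removed from each box, and the root-mean-square residual F(n) is computed per box size n. The scaling exponent alpha is the slope of log F(n) against log n; about 0.5 is expected for an uncorrelated sequence and larger values indicate long-range correlation. Box sizes below 3 are not meaningful because a line fits two points exactly.

```python
import math


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for any other symbol."""
    position, walk = 0, []
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        walk.append(position)
    return walk


def squared_residuals(ys):
    """Sum of squared residuals around the least-squares line through ys."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))


def dfa_fluctuation(walk, box):
    """RMS fluctuation F(box) after removing a linear trend from each box."""
    n_boxes = len(walk) // box
    total = sum(
        squared_residuals(walk[i * box:(i + 1) * box]) for i in range(n_boxes)
    )
    return math.sqrt(total / (n_boxes * box))


def dfa_exponent(walk, boxes):
    """Least-squares slope of log F(n) versus log n over the given box sizes."""
    points = [(math.log(b), math.log(dfa_fluctuation(walk, b))) for b in boxes]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den
```

    A typical call would be `dfa_exponent(dna_walk(sequence), [4, 8, 16, 32, 64])`; the choice of box sizes is an assumption and should span the range of distances of interest.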

  7. Identification of S-glutathionylation sites in species-specific proteins by incorporating five sequence-derived features into the general pseudo-amino acid composition.

    Zhao, Xiaowei; Ning, Qiao; Ai, Meiyue; Chai, Haiting; Yang, Guifu

    2016-06-01

    As a selective and reversible protein post-translational modification, S-glutathionylation generates mixed disulfides between glutathione (GSH) and cysteine residues, and plays an important role in regulating protein activity and stability and in redox regulation. To fully understand S-glutathionylation mechanisms, identification of substrates and specific S-glutathionylated sites is crucial. Experimental identification of S-glutathionylated sites is labor-intensive and time-consuming, so an effective computational method is highly desirable for its convenience and speed. Therefore, in this study, a new bioinformatics tool named SSGlu (Species-Specific identification of Protein S-glutathionylation Sites) was developed to identify species-specific protein S-glutathionylated sites, utilizing support vector machines that combine multiple sequence-derived features with a two-step feature selection. By 5-fold cross validation, the performance of SSGlu was measured with an AUC of 0.8105 and 0.8041 for Homo sapiens and Mus musculus, respectively. Additionally, SSGlu was compared with the existing methods, and its higher MCC and AUC indicated that SSGlu is promising for predicting S-glutathionylated sites. Furthermore, a site-specific analysis showed that S-glutathionylation correlated closely with the features derived from its surrounding sites. The conclusions derived from this study might help to understand more of the S-glutathionylation mechanism and guide the related experimental validation. For public access, SSGlu is freely accessible at http://59.73.198.144:8080/SSGlu/. PMID:27025952
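
    The MCC (Matthews correlation coefficient) used here as a performance measure has a standard closed form over the confusion matrix. The sketch below shows that formula only; the counts in the usage line are invented and are not results from the study. Returning 0.0 when a marginal total is zero is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any marginal total is empty
    return (tp * tn - fp * fn) / denominator


# Invented counts, for illustration only.
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

    Unlike accuracy, MCC stays near zero for a predictor that simply favors the majority class, which is why it is often reported for imbalanced site-prediction data.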

  8. Coding visual features extracted from video sequences.

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics. PMID:24818244

  9. Structural features of lignohumic acids

    Novák, František; Šestauberová, Martina; Hrabal, R.

    2015-01-01

    Vol. 1093, August (2015), pp. 179-185. ISSN 0022-2860 Institutional support: RVO:60077344 Keywords: C-13 NMR * FTIR * humic acids * lignohumate * lignosulfonate * structure Subject RIV: DF - Soil Science Impact factor: 1.602, year: 2014

  10. Los Alamos sequence analysis package for nucleic acids and proteins.

    Kanehisa, M I

    1982-01-01

    An interactive system for computer analysis of nucleic acid and protein sequences has been developed for the Los Alamos DNA Sequence Database. It provides a convenient way to search or verify various sequence features, e.g., restriction enzyme sites, protein coding frames, and properties of coded proteins. Further, the comprehensive analysis package on a large-scale database can be used for comparative studies on sequence and structural homologies in order to find unnoted information stored i...

  11. Feature-based Image Sequence Compression Coding

    2001-01-01

    A novel compression method for video teleconference applications is presented. Semantic coding based on human image features is realized, with human features adopted as parameters. Model-based coding and the concept of vector coding are combined with work on image feature extraction to obtain the result.

  12. Chip-based sequencing nucleic acids

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  13. Feature-by-Feature – Evaluating De Novo Sequence Assembly

    Vezzi, Francesco; Narzisi, Giuseppe; Mishra, Bud

    2012-01-01

    The whole-genome sequence assembly (WGSA) problem is one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation: while on the one hand, metrics like N50 and number of contigs focus only on size without proportionately emphasizing the infor...
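
    The N50 metric mentioned in this record can be made concrete with a short sketch. It uses the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size); the contig lengths below are invented for illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig of positive length")


# Invented contig lengths for two hypothetical assemblies.
print(n50([80, 70, 50, 40, 30, 20]))
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

    Because the computation uses contig lengths alone, it says nothing about whether the contigs are correctly assembled, which is the size-only limitation the abstract points to.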

  14. Stable 2D Feature Tracking for Long Video Sequences

    Jong-Seung Park

    2008-12-01

    In this paper, we propose a 2D feature tracking method that is stable over long video sequences. To improve the stability of long tracking, we use trajectory information about 2D features. We predict the expected feature states and compute a rough estimate of the feature location on the current image frame using the history of previous feature states up to the current frame. A search window is positioned at the estimated location and similarity measures are computed within the search window. Once the feature position is determined from the similarity measures, the current feature states are appended to the history buffer. An outlier rejection stage is also introduced to reduce false matches. Experimental results from real video sequences showed that the proposed method stably tracks point features for long frame sequences.

  15. Unique sequence features of the Human Adenovirus 31 complete genomic sequence are conserved in clinical isolates

    Darr Sebastian

    2009-11-01

    Background: Human adenoviruses (HAdV) cause a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoietic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 prototype strain to uncover unique sequence motifs associated with its high virulence. Moreover, we sequenced coding regions known to be essential for tropism and virulence (early transcription units E1A, E3, E4, the fiber knob and the penton base) of HAdV-A31 clinical isolates from patients with disseminated disease. Results: The genome of HAdV-A31 is 33763 base pairs (bp) in length with a GC content of 46.36%. Nucleotide alignment to the closely related HAdV-A12 revealed an overall homology of 84.2%. The genome organization into early, intermediate and late regions is similar to HAdV-A12. Sequence analysis of the prototype strain showed unique sequence features such as an immunoglobulin-like domain in the species A specific gene product E3 CR1 beta and a potentially integrin-binding RGD motif in the C-terminal region of protein IX. These features were conserved in all analyzed clinical isolates. Overall, amino acid sequences of clinical isolates were highly conserved compared to the prototype (99.2 to 100%), but a synonymous/non-synonymous ratio (S/N) of 2.36 in E3 CR1 beta suggested positive selection. Conclusion: Unique sequence features of HAdV-A31 may enhance its ability to escape the host's immune surveillance and may facilitate a promiscuous tropism for various tissues. Moderate evolution of clinical isolates did not indicate the emergence of new HAdV-A31 subtypes in recent years.

  16. Comparative Amino Acid Sequences of Dengue Viruses

    Haishi, Shozo; TANAKA Mariko; Igarashi, Akira

    1990-01-01

    Amino acid (AA) sequences of 4 serotypes of dengue viruses deduced from their nucleotide (nt) sequences of genomic RNA were analyzed for each genome segment and each stretch of 10 AA residues. Precursor of membrane protein (pM), and 4 nonstructural proteins (NS1, NS3, NS4B, NS5) were highly conserved, while another nonstructural protein (NS2A) was least conserved among 5 strains of dengue viruses. When homology was compared among heterotypic viruses, type 1 and type 3 dengue viruses showed clo...

  17. Quantifying sequence and structural features of protein-RNA interactions.

    Li, Songling; Yamashita, Kazuo; Amada, Karlou Mar; Standley, Daron M

    2014-09-01

    Increasing awareness of the importance of protein-RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites. PMID:25063293

  18. Sequence features contributing to chromosomal rearrangements in Neisseria gonorrhoeae.

    Russell Spencer-Smith

    Through whole genome sequence alignments, breakpoints in chromosomal synteny can be identified and the sequence features associated with these determined. Alignments of the genome sequences of Neisseria gonorrhoeae strain FA1090, N. gonorrhoeae strain NCCP11945, and N. gonorrhoeae strain TCDC-NG08107 reveal chromosomal rearrangements that have occurred. Based on these alignments and dot plot pair-wise comparisons, the overall chromosomal arrangements of strains NCCP11945 and TCDC-NG08107 are very similar, with no large inversions or translocations. The insertion of the Gonococcal Genetic Island in strain NCCP11945 is the most prominent feature differentiating these strains. When strain NCCP11945 is compared to strain FA1090, however, 14 breakpoints in chromosomal synteny are identified between these gonococcal strains. The majority of these, 11 of 14, are associated with a prophage, IS elements, or IS-like repeat enclosed elements which appear to have played a role in the rearrangements observed. Additional rearrangements of small regions of the genome are associated with pilin genes. Evidence presented here suggests that the rearrangements of blocks of sequence are mediated by activation of prophage and associated IS elements and reintegration elsewhere in the genome or by homologous recombination between IS-like elements that have generated inversions.

  19. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  20. Detection of nucleic acid sequences by invader-directed cleavage

    Brow, Mary Ann D. (Madison, WI); Hall, Jeff Steven Grotelueschen (Madison, WI); Lyamichev, Victor (Madison, WI); Olive, David Michael (Madison, WI); Prudent, James Robert (Madison, WI)

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  1. Detection of nucleic acid sequences by invader-directed cleavage

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  2. Sequence-derived structural features driving proteolytic processing.

    Belushkin, Alexander A; Vinogradov, Dmitry V; Gelfand, Mikhail S; Osterman, Andrei L; Cieplak, Piotr; Kazanov, Marat D

    2014-01-01

    Proteolytic signaling, or regulated proteolysis, is an essential part of many important pathways such as Notch, Wnt, and Hedgehog. How the structure of the cleaved substrate regions influences the efficacy of proteolytic processing remains underexplored. Here, we analyzed the relative importance in proteolysis of various structural features derived from substrate sequences using a dataset of more than 5000 experimentally verified proteolytic events captured in CutDB. Accessibility to the solvent was recognized as an essential property of a proteolytically processed polypeptide chain. Proteolytic events were found nearly uniformly distributed among three types of secondary structure, although with some enrichment in loops. Cleavages in α-helices were found to be relatively abundant in regions apparently prone to unfolding, while cleavages in β-structures tended to be located at the periphery of β-sheets. Application of the same statistical procedures to proteolytic events divided into separate sets according to the catalytic classes of proteases proved consistency of the results and confirmed that the structural mechanisms of proteolysis are universal. The estimated prediction power of sequence-derived structural features, which turned out to be sufficiently high, presents a rationale for their use in bioinformatic prediction of proteolytic events. PMID:24227478

  3. Automatic discovery of cross-family sequence features associated with protein function

    Krings Andrea

    2006-01-01

    Background: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. Conclusion: We have developed a novel and useful approach for...

  4. FeatureMap3D - a tool to map protein features and sequence conservation onto homologous structures in the PDB

    Wernersson, Rasmus; Rapacki, Krzysztof; Stærfeldt, Hans Henrik;

    2006-01-01

    FeatureMap3D is a web-based tool that maps protein features onto 3D structures. The user provides sequences annotated with any feature of interest, such as post-translational modifications, protease cleavage sites or exonic structure and FeatureMap3D will then search the Protein Data Bank (PDB) for...

  5. The sequence, structure and evolutionary features of HOTAIR in mammals

    Zhu Hao

    2011-04-01

    the full HOTAIR in mammals. Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR.

  6. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  7. Sequence of MET protooncogene cDNA has features characteristic of the tyrosine kinase family of growth-factor receptors

    The authors isolated overlapping cDNA clones corresponding to the major MET protooncogene transcript. The cDNA nucleotide sequence contained an open reading frame of 1408 amino acids with features characteristic of the tyrosine kinase family of growth factor receptors. These features include a putative 24-amino acid signal peptide and a candidate, hydrophobic, membrane-spanning segment of 23 amino acids, which defines an extracellular domain of 926 amino acids that could serve as a ligand-binding domain. A putative intracellular domain 435 amino acids long shows high homology with the SRC family of tyrosine kinases and within the kinase domain is most homologous with the human insulin receptor (44%) and v-abl (41%). Despite these similarities, however, they found no apparent sequence homology to other growth factor receptors in the putative ligand-binding domain. They conclude from the results that the MET protooncogene is a cell-surface receptor for an as-yet-unknown ligand.

  8. Intumescent features of nucleic acids and proteins

    Highlights: • The combustion resistance of DNA and caseins to different heat fluxes was studied. • Upon heating, DNA and caseins exhibited an intumescent behaviour. • The char derived from DNA was more stable and coherent than that from caseins. - Abstract: Are nucleic acids and proteins intumescent molecules? In order to get an answer, in the present manuscript, powders of deoxyribose nucleic acids (DNA) and caseins have been exposed to different heat fluxes under a cone calorimeter source and to the direct application of a propane flame. Under these conditions, DNA and caseins exhibited a typical intumescent behaviour, generating a coherent expanded cellular carbonaceous residue (char), extremely resistant to heat exposure. The resulting volumetric expansion as well as the resistance of the formed char turned out to be dependent on (i) the chemical structure of the chosen biomacromolecule, (ii) the evolution of ammonia and (iii) the adopted heat flux in cone calorimetry tests (namely, 25, 35, 50 and 75 kW/m²). The presence of ribose units within the DNA backbone determined the formation of highly expanded and coherent residues as compared to those obtained from caseins. Indeed, under a heat flux of 35 kW/m², when a carbon source (i.e. common cane sugar) was added to caseins, the resulting char was similar to that formed by DNA. Furthermore, the char expansion was ascribed to the evolution of ammonia released by these biomacromolecules upon heating, as detected by thermogravimetry coupled to infrared spectroscopy, and confirmed by scanning electron microscopy experiments performed on the bubbles present in the residues of flammability tests.

  9. Intumescent features of nucleic acids and proteins

    Alongi, Jenny, E-mail: jenny.alongi@polito.it; Cuttica, Fabio; Blasio, Alessandro Di; Carosio, Federico; Malucelli, Giulio

    2014-09-10

    Highlights: • The combustion resistance of DNA and caseins to different heat fluxes was studied. • Upon heating, DNA and caseins exhibited an intumescent behaviour. • The char derived from DNA was more stable and coherent than that from caseins. - Abstract: Are nucleic acids and proteins intumescent molecules? In order to get an answer, in the present manuscript, powders of deoxyribose nucleic acids (DNA) and caseins have been exposed to different heat fluxes under a cone calorimeter source and to the direct application of a propane flame. Under these conditions, DNA and caseins exhibited a typical intumescent behaviour, generating a coherent expanded cellular carbonaceous residue (char), extremely resistant to heat exposure. The resulting volumetric expansion as well as the resistance of the formed char turned out to be dependent on (i) the chemical structure of the chosen biomacromolecule, (ii) the evolution of ammonia and (iii) the adopted heat flux in cone calorimetry tests (namely, 25, 35, 50 and 75 kW/m²). The presence of ribose units within the DNA backbone determined the formation of highly expanded and coherent residues as compared to those obtained from caseins. Indeed, under a heat flux of 35 kW/m², when a carbon source (i.e. common cane sugar) was added to caseins, the resulting char was similar to that formed by DNA. Furthermore, the char expansion was ascribed to the evolution of ammonia released by these biomacromolecules upon heating, as detected by thermogravimetry coupled to infrared spectroscopy, and confirmed by scanning electron microscopy experiments performed on the bubbles present in the residues of flammability tests.

  10. SSE: a nucleotide and amino acid sequence analysis platform

    Simmonds Peter

    2012-01-01

    Abstract Background There is an increasing need to develop bioinformatic tools to organise and analyse the rapidly growing amount of nucleotide and amino acid sequence data in organisms ranging from viruses to eukaryotes. Finding A simple sequence editor (SSE) was developed to create an integrated environment where sequences can be aligned, annotated, classified and directly analysed by a number of built-in bioinformatic programs. SSE incorporates a sequence editor for the creation of sequenc...

  11. Protein Coding Sequence Identification by Simultaneously Characterizing the Periodic and Random Features of DNA Sequences

    Gao Jianbo

    2005-01-01

    Full Text Available Most codon indices used today are based on highly biased nonrandom usage of codons in coding regions. The background of a coding or noncoding DNA sequence, however, is fairly random, and can be characterized as a random fractal. When a gene-finding algorithm incorporates multiple sources of information about coding regions, it becomes more successful. It is thus highly desirable to develop new and efficient codon indices by simultaneously characterizing the fractal and periodic features of a DNA sequence. In this paper, we describe a novel way of achieving this goal. The efficiency of the new codon index is evaluated by studying all of the 16 yeast chromosomes. In particular, we show that the method automatically and correctly identifies which of the three reading frames is the one that contains a gene.

  12. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence

    Al-Shahib, A.; Breitling, R.; Gilbert, D.

    2005-01-01

    Abstract: When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence

  13. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites.

    Tzong-Yi Lee

    Full Text Available Ubiquitin (Ub) is a small protein that consists of 76 amino acids (about 8.5 kDa). In ubiquitin conjugation, ubiquitin is mainly conjugated to a lysine residue of the protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and of regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (-20∼+20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence
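
    The windowed amino acid composition mentioned in this record can be illustrated with a minimal Python sketch. It is not the UbSite implementation: the function names, the "-" padding character and the 20-letter alphabet constant are assumptions made here for illustration, and only the composition feature (not the RBF network or the scoring matrix) is shown.

```python
from collections import Counter

# Standard 20 amino acids in one-letter code (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def lysine_windows(sequence, flank=20, pad="-"):
    """Yield (1-based position, window) for every lysine (K).

    Each window spans `flank` residues on both sides of the lysine;
    positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i:i + 2 * flank + 1]


def composition(window):
    """Return the fraction of each amino acid in the window (padding ignored)."""
    counts = Counter(ch for ch in window if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]
```

    Each candidate lysine then yields a 20-dimensional vector that a classifier could consume; with the default flank of 20 the window matches the -20 to +20 range quoted in the abstract.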

  14. An improved classification of G-protein-coupled receptors using sequence-derived features

    Peng Zhen-Ling

    2010-08-01

    Full Text Available Abstract Background G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method to accurately predict GPCRs from the protein primary sequences. Results We propose a new method called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. The principal component analysis is first employed to reduce the dimension of the feature space to 32. Then, the resulting 32-dimensional feature vectors are fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, then it will be further classified into a certain family, subfamily, sub-subfamily and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at five levels, a non-redundant dataset is carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at five levels (from the first to the fifth) can achieve up to 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on another two datasets of 167 and 566 GPCRs respectively at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods to be compared. Conclusions The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid

  15. Features of acid-saline systems of Southern Australia

    The discovery of layered, SO4-rich sediments on the Meridiani Planum on Mars has focused attention on understanding the formation of acid-saline lakes. Many salt lakes have formed in southern Australia where regional groundwaters are characterized by acidity and high salinity and show features that might be expected in the Meridiani sediments. Many (but not all) of the acid-saline Australian groundwaters are found where underlying Tertiary sediments are sulfide-rich. When waters from the formations come to the surface or interact with oxidised meteoric water, acid groundwaters result. In this paper examples of such waters around Lake Tyrrell, Victoria, and Lake Dey-Dey, South Australia, are reviewed. The acid-saline groundwaters typically have dissolved solids of 30-60 g/L and pH commonly 4 and MgSO4) or differential separation of elements with differing solubility (K, Na, Ti, Cr). Thus, it is considered unlikely that groundwaters or evaporative salt-lake systems, as found on earth, were involved. Instead, these features point to a water-poor system with local alteration and very little mobilization of elements

  16. MEANS AND METHODS FOR CLONING NUCLEIC ACID SEQUENCES

    Geertsma, Eric Robin; Poolman, Berend

    2008-01-01

    The invention provides means and methods for efficiently cloning nucleic acid sequences of interest in micro-organisms that are less amenable to conventional nucleic acid manipulations, as compared to, for instance, E.coli. The present invention enables high-throughput cloning (and, preferably, expr

  17. Sequence-based feature prediction and annotation of proteins

    Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea; Bernsel, Andreas; Tress, Michael L.; Bork, Peer; Von Heijne, Gunnar; Valencia, Alfonso; A Ouzounis, Christos; Casadio, Rita; Brunak, Søren

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....

  18. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter-[3Fe-4S] [4Fe-4S]-ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins with an extended carboxyl (C-) terminal sequence compared to the common Clostridium-type. The evolutionary relationship of the two together with a putative one recently found to be encoded in nifENXQ region in this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040
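
    Cys-cluster patterns such as CxxCxxCxxxCP can be searched for mechanically by treating "x" as "any residue". The Python sketch below is an illustration only: the helper name find_motif and the toy sequence in the usage note are hypothetical and are not taken from the ferredoxin sequences reported in the paper.

```python
import re


def find_motif(sequence, motif):
    """Return (1-based start, matched text) for every occurrence of `motif`.

    `motif` uses 'x' for any residue, e.g. 'CxxCxxCxxxCP'. A lookahead is
    used so that overlapping occurrences are reported as well.
    """
    body = "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    pattern = re.compile(f"(?=({body}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence containing one Clostridium-type pattern and one
# pattern with two extra residues between the second and third Cys.
toy = "MACAACGGCTTTCPKKCAACGGGGCTTTCP"
clostridial = find_motif(toy, "CxxCxxCxxxCP")
extended = find_motif(toy, "CxxCxxxxCxxxCP")
```

    On the toy sequence, the first pattern is found at position 3 and the second at position 17.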

  19. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  20. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    Adzhubei, I A; Adzhubei, A. A.; Neidle, S.

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for ...

  1. Amino acid sequences of proteins from Leptospira serovar pomona

    Alves Selmo F

    2000-01-01

    Full Text Available This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A fragment of 24 kDa that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A fragment with a molecular weight of 20 kDa, yielding a twenty-amino-acid sequence, was obtained from the 40 kDa protein.

  2. Representation of protein-sequence information by amino acid subalphabets

    Andersen, C.A.F.; Brunak, Søren

    2004-01-01

    -sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 different amino acids they typically contain, it is also a secondary goal to discover how the current selection of amino acids......-which now are common in proteins-might have emerged from simpler selections, or alphabets, in use earlier during the evolution of living organisms....

  3. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of the fundamental operations for the study of life. The usual computational complexity of the multiple alignment of N sequences with common length L by dynamic programming is O(L^N). This alignment is considered as one of the NP problems, so that it is desirable to find a nice algorithm for the multiple alignment. Thus in this paper we propose a quantum algorithm for the multiple alignment based on the works [12,1,2] in which the NP complete problem was shown to be the P problem by means of quantum algorithm and chaos information dynamics.
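
    For orientation, the classical dynamic-programming baseline referred to in this record can be sketched for the pairwise case (N = 2), where the table has (L+1) x (L+1) cells; with N sequences the table gains one axis per sequence, which is where the L^N growth comes from. The Python sketch below is a standard Needleman-Wunsch style global alignment score and is not the quantum algorithm of the paper; the scoring values are arbitrary assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Fills a (len(a)+1) x (len(b)+1) table, so time and memory are
    proportional to len(a) * len(b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]
```

    For example, aligning "ACGT" with "AGT" under these scores gives three matches and one gap.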

  4. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  5. Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.

    Lu-Lu Zheng

    Full Text Available Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations.
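
    The MCC quoted in this record is the Matthews correlation coefficient, computed from the four cells of a binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python sketch follows; the function name and the convention of returning 0.0 when the denominator is zero are assumptions made here, and no confusion-matrix counts from the study itself are implied.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

    Unlike plain accuracy, this measure stays informative when modified and unmodified sites are very unequal in number, which is the usual situation in site prediction.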

  6. Anti-peptide aptamers recognize amino acid sequence and bind a protein epitope.

    Xu, W; Ellington, A. D.

    1996-01-01

    In vitro selection of nucleic acid binding species (aptamers) is superficially similar to the immune response. Both processes produce biopolymers that can recognize targets with high affinity and specificity. While antibodies are known to recognize the sequence and conformation of protein surface features (epitopes), very little is known about the precise interactions between aptamers and their epitopes. Therefore, aptamers that could recognize a particular epitope, a peptide fragment of huma...

  7. Effective automated feature construction and selection for classification of biological sequences.

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  8. Feature-Based Classification of Amino Acid Substitutions outside Conserved Functional Protein Domains

    Branislava Gemovic

    2013-01-01

    Full Text Available There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools are indispensable for determining their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.

  9. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. PMID:27261456

  10. SeqVISTA: a graphical tool for sequence feature visualization and comparison

    Niu Tianhua

    2003-01-01

    Full Text Available Abstract Background Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite strand? Subsequently, your favorite sequence analysis software informs you that there is an interesting feature at position 13982–14013. By painstakingly counting the 10 bp blocks, you are able to examine the sequence at this location. But now you want to see what other features have been annotated close by, and this information is buried several screenfuls higher up the web page. Results SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet. Our viewer may be launched on a GenBank record by a single click of a button installed in the web browser. Conclusion SeqVISTA allows insights to be gained by viewing the totality of sequence annotations and predictions, which may be more revealing than the sum of their parts. SeqVISTA runs on any operating system with a Java 1.4 virtual machine. It is freely available to academic users at http://zlab.bu.edu/SeqVISTA.
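
    The two obstacles described in this record's Background (spaces every 10 bp, and motifs lying on the opposite strand) are easy to reproduce in a few lines. The Python sketch below is not part of SeqVISTA; the function names and the toy GenBank-style input are hypothetical, and the search is for exact matches only.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def clean(raw):
    """Drop spaces, digits and line breaks from a flat-file sequence block."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def find_on_both_strands(raw_sequence, motif):
    """Return sorted (start, end, strand) hits, 1-based and inclusive.

    Coordinates always refer to the given (forward) strand; '-' hits are
    places where the reverse complement of the motif occurs.
    """
    seq = clean(raw_sequence)
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, start + len(pattern), strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical flat-file style line: position number, then blocks of 10 bases.
record = "        1 aaagattaca tttgtaatcc"
both = find_on_both_strands(record, "GATTACA")
spanning = find_on_both_strands(record, "acattt")
```

    Here "GATTACA" is found on the forward strand at 4-10 and on the reverse strand at 13-19, and "acattt" is found at 8-13 even though it straddles the gap between two 10 bp blocks.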

  11. Analysis on n-gram statistics and linguistic features of whole genome protein sequences

    DONG Qi-wen; WANG Xiao-long; LIN Lei

    2008-01-01

    To support statistical analysis of the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, Zipf power law and Shannon entropy, developed for analysis of natural languages and symbolic sequences. The natural genome proteins and the artificial genome proteins were compared with each other and some statistical features of n-grams were discovered. The results show that: the n-grams of whole genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple unigram model can distinguish different organisms; there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis on n-grams of whole genome protein sequences will result in a powerful model for mapping the relationship of protein sequence, structure and function.
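
    The two tests named in this record can be sketched in Python under simple assumptions: n-grams are counted with a sliding window of step 1, entropy is measured in bits, and the Zipf check is reduced to producing the rank-frequency table that would be plotted on log-log axes. The function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(sequences, n):
    """Count overlapping n-grams over a collection of protein sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the n-gram frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_frequency(counts):
    """Return (rank, count) pairs, most frequent n-gram first.

    Under Zipf's law, log(count) falls roughly linearly with log(rank).
    """
    return [(rank, c) for rank, (_, c) in enumerate(counts.most_common(), start=1)]
```

    Comparing shannon_entropy(ngram_counts(proteome, n)) for a natural proteome and for a shuffled one is one simple way to set up the natural-versus-artificial comparison the abstract describes.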

  12. Deoxyribonucleic acid sequence mapping on metaphase chromosomes by immunoelectron microscopy

    Nucleic acid sequences can be localized on chromosomes in the electron microscope after hybridization with a biotinylated DNA probe followed by detection with a primary antibiotin antibody and a secondary antibody coupled to colloidal gold. Hybridization probes can also be labelled with alternative ligands such as N-acetoxy-2-acetylaminofluorene (AAF), Dinitrophenyl-dUTP and Digoxigenin-dUTP. Multiple labelling is possible if these differently modified DNA probes are used in conjunction with colloidal gold preparations of varying particle sizes. A substantial signal amplification can be achieved by incubating preparations with successive cycles of primary antibiotin antibody followed by a biotinylated secondary antibody. Detection is with Streptavidin-gold, and in the case of highly and moderately repeated sequences, the signal is visible in the light microscope. Detailed protocols are given for EM in-situ hybridization to whole mount metaphase chromosomes and include instructions necessary to perform multiple sequence localization and signal amplification

  13. What are the basic modules of implicit sequence learning? A feature-based account

    Eberhardt, Katharina

    2015-01-01

    According to the Theory of Event Coding (TEC; Hommel et al., 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to make a conceptual difference between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele et al., 2003) which process information along single dimensions. These dimensions have remained un...

  14. An Improved Management Model for Tracking Missing Features in Computer Vision Long Image Sequences

    Pinho, Raquel R.; João Manuel R. S. Tavares; Correia, Miguel V.

    2006-01-01

    In this paper we present a management model to deal with the problem of tracking missing features during long image sequences using Computational Vision. Some usual difficulties related with missing features are that they may be temporarily occluded or might even have disappeared definitively, and the computational cost involved should always be reduced to the strictly necessary. The proposed Net Present Value (NPV) model, based on the economic Theory of Capital, considers the tracking of eac...

  15. Auto-pooling: Learning to Improve Invariance of Image Features from Image Sequences

    Sukhbaatar, Sainbayar; Makino, Takaki; Aihara, Kazuyuki

    2013-01-01

    Learning invariant representations from images is one of the hardest challenges facing computer vision. Spatial pooling is widely used to create invariance to spatial shifting, but it is restricted to convolutional models. In this paper, we propose a novel pooling method that can learn soft clustering of features from image sequences. It is trained to improve the temporal coherence of features, while keeping the information loss at minimum. Our method does not use spatial information, so it c...

  16. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  17. Nucleotide sequence and corresponding amino acid sequence of the gene for the major antigen of foot and mouth disease virus.

    Kurz, C; Forss, S; Küpper, H; Strohmaier, K; Schaller, H

    1981-01-01

    A segment of 1160 nucleotides of the FMDV genome has been sequenced using three overlapping fragments of cloned cDNA from FMDV strain O1K. This sequence contains the coding sequence for the viral capsid protein VP1 as shown by its homology to known and newly determined amino acid sequences from this main antigenic polypeptide of the FMDV virion. The structural gene for VP1 comprises 639 nucleotides which specify a sequence of 213 amino acids for the VP1 protein. The coding sequence is not flan...

  18. Correlation between fibroin amino acid sequence and physical silk properties.

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet. PMID:12816957

  19. Chicken TAP genes differ from their human orthologues in locus organisation, size, sequence features and polymorphism.

    Walker, Brian A; van Hateren, Andrew; Milne, Sarah; Beck, Stephan; Kaufman, Jim

    2005-05-01

    We have previously shown that in the chicken major histocompatibility complex, the two transporters associated with antigen processing genes (TAP1 and TAP2) are located head to head between two classical class I genes. Here we show that the region between these two TAP genes has transcription factor-binding sites in common with class I gene promoters. The TAP genes are also up-regulated by interferon-gamma in a similar way to mammalian TAP genes and in a way that suggests they are both transcribed from a bi-directional promoter. The gene structures of TAP1 and TAP2 differ from that of human TAPs in that TAP1 has a truncated exon 1 and TAP2 has fused exons, resulting in a much smaller gene size. The truncation of TAP1 results in the loss of approximately 150 amino acids, which are thought to be involved in endoplasmic reticulum retention, heterodimer formation and tapasin binding, compared to human TAP1. Most of the protein sequence features involved in binding ATP are conserved, with two exceptions: chicken TAP1 has a glycine in the switch region where other TAPs have glutamine or histidine, and both chicken TAP genes have serines in the C motif where mammalian TAP2 has an alanine. Lastly, the chicken TAP genes are highly polymorphic, with at least as many TAP alleles as there are class I alleles, as seen by investigating nine inbred lines of chicken. The close proximity of the TAP genes to the class I genes and the high level of polymorphism may allow co-evolution of the genes, allowing TAP molecules to transport peptides specifically for the class I molecules of that haplotype. PMID:15900495

  20. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Ruan Jishou

    2007-04-01

    Full Text Available Abstract Background: Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results: The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are
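    The "frequencies of k-spaced amino acid pairs" representation mentioned in this record can be illustrated with a minimal Python sketch. It assumes that a k-spaced pair means two residues with exactly k other residues between them, and that counts are normalized by the number of valid pairs; the function name `k_spaced_pair_frequencies` and the normalization are illustrative assumptions, not details taken from the FlexRP paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_frequencies(sequence, k):
    """Return the frequency of every ordered residue pair separated by k residues.

    Hypothetical helper: for k = 0 the pairs are adjacent residues, for k = 1
    there is one residue in between, and so on. Pairs that contain a
    non-standard residue are skipped.
    """
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    # Normalizing by the number of valid pairs is an assumption of this sketch.
    return {pair: c / total for pair, c in counts.items()}


if __name__ == "__main__":
    features = k_spaced_pair_frequencies("MKVLAAGIVKLA", k=1)
    nonzero = {pair: f for pair, f in features.items() if f > 0}
    print(nonzero)
```

    Each value of k yields a 400-dimensional vector, so concatenating several values of k quickly produces a large feature set, which is presumably why the record describes a feature selection step afterwards.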

  1. Amino acid sequences used for clustering (Multi FASTA format) - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

  2. Loop-sequence features and stability determinants in antibody variable domains by high-throughput experiments.

    Chang, Hung-Ju; Jian, Jhih-Wei; Hsu, Hung-Ju; Lee, Yu-Ching; Chen, Hong-Sen; You, Jhong-Jhe; Hou, Shin-Chen; Shao, Chih-Yun; Chen, Yen-Ju; Chiu, Kuo-Ping; Peng, Hung-Pin; Lee, Kuo Hao; Yang, An-Suei

    2014-01-01

    Protein loops are frequently considered as critical determinants in protein structure and function. Recent advances in high-throughput methods for DNA sequencing and thermal stability measurement have enabled effective exploration of sequence-structure-function relationships in local protein regions. Using these data-intensive technologies, we investigated the sequence-structure-function relationships of six complementarity-determining regions (CDRs) and ten non-CDR loops in the variable domains of a model vascular endothelial growth factor (VEGF)-binding single-chain antibody variable fragment (scFv) whose sequence had been optimized via a consensus-sequence approach. The results show that only a handful of residues involving long-range tertiary interactions distant from the antigen-binding site are strongly coupled with antigen binding. This implies that the loops are passive regions in protein folding; the essential sequences of these regions are dictated by conserved tertiary interactions and the consensus local loop-sequence features contribute little to protein stability and function. PMID:24268648

  3. SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations.

    Steven N Hart

    Full Text Available BACKGROUND: Structural variation (SV) represents a significant, yet poorly understood contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints. RESULTS: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are (1) not requiring secondary (or exhaustive primary) alignment, (2) portability into established sequencing workflows, and (3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call. CONCLUSIONS: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
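    For readers unfamiliar with the "soft-clipped bases" mentioned in this record, the following Python sketch shows how soft clips can be read from a SAM CIGAR string, where the `S` operation marks read bases that are present in the record but not aligned. This is only an illustration of the concept; it is not SoftSearch's implementation, and the helper name `soft_clips` is hypothetical.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a SAM CIGAR string.

    Hypothetical helper for illustration. Hard clips (H) may sit outside
    the soft clips, so they are ignored before inspecting the ends.
    """
    ops = [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    for cigar in ("100M", "12S88M", "5H10S80M5S"):
        print(cigar, soft_clips(cigar))
```

    A read whose soft clip starts at the same reference position as several other reads is a candidate breakpoint; combining that signal with discordant read pairs is the kind of evidence integration the record describes.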

  4. Face Recognition from Still Images to Video Sequences: A Local-Feature-Based Framework

    Chen Shaokang

    2011-01-01

    Full Text Available Although automatic face recognition has shown success for high-quality images under controlled conditions, for video-based recognition it is hard to attain similar levels of performance. We describe in this paper recent advances in a project being undertaken to trial and develop advanced surveillance systems for public safety. In this paper, we propose a local facial feature based framework for both still image and video-based face recognition. The evaluation is performed on a still image dataset LFW and a video sequence dataset MOBIO to compare 4 methods for operation on feature: feature averaging (Avg-Feature), Mutual Subspace Method (MSM), Manifold to Manifold Distance (MMD), and Affine Hull Method (AHM), and 4 methods for operation on distance on 3 different features. The experimental results show that the Multi-region Histogram (MRH) feature is more discriminative for face recognition compared to Local Binary Patterns (LBP) and raw pixel intensity. Under the limitation of a small number of images available per person, feature averaging is more reliable than MSM, MMD, and AHM and is much faster. Thus, our proposed framework, which averages MRH features, is more suitable for CCTV surveillance systems with constraints on the number of images and the speed of processing.

  5. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Wang Yong

    2011-07-01

    Full Text Available Abstract Background: Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features and, furthermore, effective and accurate methods are still in pressing need. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), while in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine

  6. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Meiler Arno

    2012-09-01

    Full Text Available Abstract Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills.

  7. A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features

    Tong-Hui Zhao

    2013-01-01

    Full Text Available With a large number of disordered proteins and their important functions discovered, it is highly desirable to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS), we developed a new method to predict disordered regions in proteins. The mRMR criterion was used to rank the importance of all candidate features. Finally, the top 128 features were selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, a Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. On the basis of the prediction results for each query sequence, we used the scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with three other popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the understanding of the formation mechanism of disordered structures, providing guidelines for experimental validation.
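    The Matthews correlation coefficient (MCC) reported in this record is a standard summary of a binary confusion matrix. The Python sketch below computes it from raw counts; the function name and the convention of returning 0.0 when the denominator vanishes are choices made for this illustration, not details from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returning 0.0 for a zero denominator is a common convention and an
    assumption of this sketch.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, used only to show the call.
    print(matthews_corrcoef_from_counts(tp=40, tn=30, fp=20, fn=10))
```

    MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is often the case for per-residue disorder labels.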

  8. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

    Rask, Thomas Salhøj; Petersen, Bent; Chen, Donald S.;

    2016-01-01

    Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. ... This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria...

  9. Prediction of protein motions from amino acid sequence and its application to protein-protein interaction

    Wako Hiroshi

    2010-07-01

    Full Text Available Abstract Background: Structural flexibility is an important characteristic of proteins because it is often associated with their function. The movement of a polypeptide segment in a protein can be broken down into two types of motions: internal and external ones. The former is deformation of the segment itself, but the latter involves only rotational and translational motions as a rigid body. Normal Mode Analysis (NMA) can derive these two motions, but its application remains limited because it necessitates the gathering of complete structural information. Results: In this work, we present a novel method for predicting two kinds of protein motions in ordered structures. The prediction uses only information from the amino acid sequence. We prepared a dataset of the internal and external motions of segments in many proteins by application of NMA. Subsequently, we analyzed the relation between thermal motion assessed from X-ray crystallographic B-factor and internal/external motions calculated by NMA. Results show that attributes of amino acids related to the internal motion have different features from those related to the B-factors, although those related to the external motion are correlated strongly with the B-factors. Next, we developed a method to predict internal and external motions from amino acid sequences based on the Random Forest algorithm. The proposed method uses information associated with adjacent amino acid residues and secondary structures predicted from the amino acid sequence. The proposed method exhibited moderate correlation between predicted internal and external motions and those calculated by NMA. It has the highest prediction accuracy compared to a naïve model and three published predictors. Conclusions: Finally, we applied the proposed method for predicting the internal motion to a set of 20 proteins that undergo large conformational change upon protein-protein interaction. Results show significant overlaps between the

  10. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    Chen, Peng

    2013-07-23

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that works in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.

  11. Mapping genomic features to functional traits through microbial whole genome sequences.

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. The increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologous Groups (COGs), and were used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights. PMID:24989863
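    The TF-IDF step described in this record treats each genome as a "document" and each COG as a "term". The Python sketch below uses one common textbook variant (term frequency as a within-genome proportion, inverse document frequency as log(N / df)); the exact weighting used by the authors is not given in the abstract, so the formula, the function name `tf_idf`, and the example COG counts are all assumptions.

```python
import math


def tf_idf(cog_counts):
    """Compute TF-IDF weights for COG counts per genome.

    cog_counts maps a genome name to a dict of {COG id: gene count}.
    Sketch under stated assumptions: tf = count / total genes counted in
    the genome, idf = log(number of genomes / genomes containing the COG).
    """
    n_genomes = len(cog_counts)
    document_frequency = {}
    for counts in cog_counts.values():
        for cog, count in counts.items():
            if count > 0:
                document_frequency[cog] = document_frequency.get(cog, 0) + 1

    weights = {}
    for genome, counts in cog_counts.items():
        total = sum(counts.values())
        weights[genome] = {
            cog: (count / total) * math.log(n_genomes / document_frequency[cog])
            for cog, count in counts.items()
            if count > 0
        }
    return weights


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    example = {
        "genome_a": {"COG0001": 2, "COG0002": 2},
        "genome_b": {"COG0001": 1},
    }
    print(tf_idf(example))
```

    Under this weighting a COG present in every genome gets weight zero, so ubiquitous housekeeping groups are down-weighted and trait-specific groups stand out before the ranking step.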

  12. Complete mitochondrial DNA sequences of six snakes: phylogenetic relationships and molecular evolution of genomic features.

    Dong, Songyu; Kumazawa, Yoshinori

    2005-07-01

    Complete mitochondrial DNA (mtDNA) sequences were determined for representative species from six snake families: the acrochordid little file snake, the boid boa constrictor, the cylindrophiid red pipe snake, the viperid himehabu, the pythonid ball python, and the xenopeltid sunbeam snake. Thirteen protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 control regions were identified in these mtDNAs. Duplication of the control region and translocation of the tRNALeu gene were two notable features of the snake mtDNAs. The duplicate control regions had nearly identical nucleotide sequences within species but they were divergent among species, suggesting concerted sequence evolution of the two control regions. In addition, the duplicate control regions appear to have facilitated an interchange of some flanking tRNA genes in the viperid lineage. Phylogenetic analyses were conducted using a large number of sites (9570 sites in total) derived from the complete mtDNA sequences. Our data strongly suggested a new phylogenetic relationship among the major families of snakes: ((((Viperidae, Colubridae), Acrochordidae), (((Pythonidae, Xenopeltidae), Cylindrophiidae), Boidae)), Leptotyphlopidae). This conclusion was distinct from a widely accepted view based on morphological characters in denying the sister-group relationship of boids and pythonids, as well as the basal divergence of nonmacrostomatan cylindrophiids. These results underscore the importance of reconstructing the snake phylogeny with ample molecular data, such as those from complete mtDNA sequences. PMID:16007493

  13. What Matters in Implicit Task Sequence Learning: Perceptual Stimulus Features, Task Sets, or Correlated Streams of Information?

    Weiermann, Brigitte; Cock, Josephine; Meier, Beat

    2010-01-01

    Implicit task sequence learning may be attributed to learning the order of perceptual stimulus features associated with the task sequence, learning a series of automatic task set activations, or learning an integrated sequence that derives from 2 correlated streams of information. In the present study, our purpose was to distinguish among these 3…

  14. SeqFeatR for the Discovery of Feature-Sequence Associations.

    Budeus, Bettina; Timm, Jörg; Hoffmann, Daniel

    2016-01-01

    Specific selection pressures often lead to specifically mutated genomes. The open source software SeqFeatR has been developed to identify associations between mutation patterns in biological sequences and specific selection pressures ("features"). For instance, SeqFeatR has been used to discover in viral protein sequences new T cell epitopes for hosts of given HLA types. SeqFeatR supports frequentist and Bayesian methods for the discovery of statistical sequence-feature associations. Moreover, it offers novel ways to visualize results of the statistical analyses and to relate them to further properties. In this article we demonstrate various functions of SeqFeatR with real data. The most frequently used set of functions is also provided by a web server. SeqFeatR is implemented as R package and freely available from the R archive CRAN (http://cran.r-project.org/web/packages/SeqFeatR/index.html). The package includes a tutorial vignette. The software is distributed under the GNU General Public License (version 3 or later). The web server URL is https://seqfeatr.zmb.uni-due.de. PMID:26731669

  15. Sequencing of bovine herpesvirus 4 v.test strain reveals important genome features

    Gillet Laurent

    2011-08-01

    Full Text Available Abstract Background: Bovine herpesvirus 4 (BoHV-4) is a useful model for the human pathogenic gammaherpesviruses Epstein-Barr virus and Kaposi's Sarcoma-associated Herpesvirus. Although genome manipulations of this virus have been greatly facilitated by the cloning of the BoHV-4 V.test strain as a Bacterial Artificial Chromosome (BAC), the lack of a complete genome sequence for this strain limits its experimental use. Methods: In this study, we have determined the complete sequence of BoHV-4 V.test strain by a pyrosequencing approach. Results: The long unique coding region (LUR) consists of 108,241 bp encoding at least 79 open reading frames and is flanked by several polyrepetitive DNA units (prDNA). As previously suggested, we showed that the prDNA unit located at the left prDNA-LUR junction (prDNA-G) differs from the other prDNA units (prDNA-inner). Namely, the prDNA-G unit lacks the conserved pac-2 cleavage and packaging signal in its right terminal region. Based on the mechanisms of cleavage and packaging of herpesvirus genomes, this feature implies that only genomes bearing left and right end prDNA units are encapsulated into virions. Conclusions: In this study, we have determined the complete genome sequence of the BAC-cloned BoHV-4 V.test strain and identified genome organization features that could be important in other herpesviruses.

  16. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond time-scale. IMMS has become a powerful tool for analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics. Results: In this paper, a model is presented based on the least squares support vector regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptides. Four descriptors were extracted from the peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model. Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  17. Parallax Effect Free Mosaicing of Underwater Video Sequence Based on Texture Features

    Nagaraja S

    2014-10-01

    Full Text Available In this paper, we present a feature-based technique for construction of a mosaic image from an underwater video sequence, which suffers from parallax distortion due to the propagation properties of light in the underwater environment. Most of the available mosaic tools and underwater image mosaicing techniques yield a final result with artifacts such as blurring, ghosting and seams due to the presence of parallax in the input images. Removing parallax from the input images may not reduce its effects; instead, it must be corrected in successive steps of mosaicing. Thus, our approach minimizes the parallax effects by adopting an efficient local alignment technique after global registration. We extract texture features using the Centre Symmetric Local Binary Pattern (CS-LBP) descriptor in order to find feature correspondences, which are then used for estimation of homography through RANSAC. In order to increase the accuracy of global registration, we perform preprocessing such as colour alignment between two selected frames based on colour distribution adjustment. Because of the existence of 100% overlap in consecutive frames of underwater video, we select frames with minimum overlap based on mutual offset in order to reduce the computation cost during mosaicing. Our approach minimizes the parallax effects considerably in the final mosaic constructed using our own underwater video sequences.

  18. FASTERp: A Feature Array Search Tool for Estimating Resemblance of Protein Sequences

    Macklin, Derek; Egan, Rob; Wang, Zhong

    2014-03-14

    Metagenome sequencing efforts have provided a large pool of billions of genes for identifying enzymes with desirable biochemical traits. However, homology search with billions of genes in a rapidly growing database has become increasingly computationally impractical. Here we present our pilot efforts to develop a novel alignment-free algorithm for homology search. Specifically, we represent individual proteins as feature vectors that denote the presence or absence of short kmers in the protein sequence. Similarity between feature vectors is then computed using the Tanimoto score, a distance metric that can be rapidly computed on bit string representations of feature vectors. Preliminary results indicate good correlation with optimal alignment algorithms (Spearman r of 0.87, ~1,000,000 proteins from Pfam), as well as with heuristic algorithms such as BLAST (Spearman r of 0.86, ~1,000,000 proteins). Furthermore, a prototype of FASTERp implemented in Python runs approximately four times faster than BLAST on a small scale dataset (~1000 proteins). We are optimizing and scaling to improve FASTERp to enable rapid homology searches against billion-protein databases, thereby enabling more comprehensive gene annotation efforts.
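
    The kmer presence/absence representation and Tanimoto score described in this abstract can be illustrated with a minimal sketch. This is not the FASTERp implementation: the kmer length of 3, the helper names (`kmers`, `to_bits`, `tanimoto`), the use of a Python integer as the bit string, and the two example sequences are all assumptions made for illustration. A higher score means the two proteins share more of their kmers.

```python
def kmers(protein, k=3):
    """Return the set of distinct k-mers in a protein sequence (k=3 is an assumed value)."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def to_bits(kmer_set, index):
    """Pack k-mer presence/absence into a Python int used as a bit string."""
    bits = 0
    for kmer in kmer_set:
        bits |= 1 << index[kmer]
    return bits


def tanimoto(bits_a, bits_b):
    """Tanimoto score: bits set in both vectors divided by bits set in either."""
    union = bin(bits_a | bits_b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty feature vectors score 0
    return bin(bits_a & bits_b).count("1") / union


# Made-up example sequences, not taken from the abstract.
seq_a, seq_b = "MKVLAAGIV", "MKVLSAGIV"
set_a, set_b = kmers(seq_a), kmers(seq_b)
# Assign each observed k-mer a bit position.
index = {kmer: i for i, kmer in enumerate(sorted(set_a | set_b))}
score = tanimoto(to_bits(set_a, index), to_bits(set_b, index))
print(score)  # 4 shared 3-mers out of 10 distinct ones -> 0.4
```

    Because the score reduces to bitwise AND, OR and a population count, it needs no alignment step, which is consistent with the speed argument made in the abstract.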

  19. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  20. Contig sequences and their annotation (amino acid sequence and results of homology search), and expression profile - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata

    Full Text Available Dicty_cDB Contig sequences and their annotation (amino acid sequence and results of homology search), and ex...pression profile Data detail Data name Contig sequences and their annotation (amino acid sequence and result... sequences of cDNA sequences of Dictyostelium discoideum and their annotation (amino acid sequence and resul...ence and full-length cDNA sequence by the assembly program Phrap ( http://www.phrap.org/index.html ). Link to the... list of clones constituting the contig, the information on its mapping to the genome mapped to genome sequence and the

  1. Topological features of proteins from amino acid residue networks

    Alves, N A; Alves, Nelson Augusto; Martinez, Alexandre Souto

    2006-01-01

    Topological properties of native folds are obtained from statistical analysis of 160 low homology proteins covering the four structural classes. This is done by analysing one-, two- and three-vertex joint distributions of quantities related to the corresponding network of amino acid residues. Emphasis on the amino acid residue hydrophobicity leads to the definition of their center of mass as vertices in this contact network model with interactions represented by edges. The network analysis helps us to interpret experimental results such as hydrophobic scales and fraction of buried accessible surface area in terms of the network connectivity. To explore the vertex type dependent correlations, we build a network of hydrophobic and polar vertices. This procedure presents the wiring diagram of the topological structure of globular proteins leading to the following attachment probabilities between hydrophobic-hydrophobic 0.424(5), hydrophobic-polar 0.419(2) and polar-polar 0.157(3) residues.

  2. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.

    Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen

    2016-01-01

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814

  3. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

    Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen

    2016-01-01

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814

  4. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    Pelletier Eric

    2010-10-01

    Full Text Available Abstract Background: Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results: C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions: Analysis of the C

  5. Poly(A) motif prediction using spectral latent features from human DNA sequences

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer error predictions relative to the other

  6. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    Myers, G.; Foley, B.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States). Theoretical Div.; Mellors, J.W. [ed.] [Univ. of Pittsburgh, PA (United States); Jeang, K.T. [ed.] [National Institutes of Health, Bethesda, MD (United States). Molecular Virology Section; Wain-Hobson, S. [Pasteur Inst., Paris (France)] [ed.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  7. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature at present. The question of which is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. In order to study the possibility of discriminating between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
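
    One simple way to obtain random sequences that preserve the length and amino acid distribution of a natural protein is to shuffle its residues. The sketch below assumes this shuffling scheme and ten decoys per natural sequence (matching the 1047 to 10,470 ratio in the abstract); the abstract does not state how the authors actually generated their random set, and the function name `shuffled_decoys` and the example sequence are hypothetical.

```python
import random
from collections import Counter


def shuffled_decoys(protein, n=10, seed=0):
    """Return n residue-shuffled copies of a protein.

    Shuffling keeps the length and the amino acid counts of the input
    exactly, while destroying the order of the residues.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    decoys = []
    for _ in range(n):
        residues = list(protein)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


natural = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
decoys = shuffled_decoys(natural)

# Every decoy has the same length and residue counts as the natural sequence.
print(all(Counter(d) == Counter(natural) for d in decoys))
```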

  8. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  9. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish.

    Chew, Guo-Liang; Pauli, Andrea; Schier, Alexander F

    2016-01-01

    Upstream open reading frames (uORFs) are ubiquitous repressive genetic elements in vertebrate mRNAs. While much is known about the regulation of individual genes by their uORFs, the range of uORF-mediated translational repression in vertebrate genomes is largely unexplored. Moreover, it is unclear whether the repressive effects of uORFs are conserved across species. To address these questions, we analyse transcript sequences and ribosome profiling data from human, mouse and zebrafish. We find that uORFs are depleted near coding sequences (CDSes) and have initiation contexts that diminish their translation. Linear modelling reveals that sequence features at both uORFs and CDSes modulate the translation of CDSes. Moreover, the ratio of translation over 5' leaders and CDSes is conserved between human and mouse, and correlates with the number of uORFs. These observations suggest that the prevalence of vertebrate uORFs may be explained by their conserved role in repressing CDS translation. PMID:27216465

  10. SeqFeatR for the Discovery of Feature-Sequence Associations

    Budeus, Bettina; Timm, Jörg; Hoffmann, Daniel

    2016-01-01

    Specific selection pressures often lead to specifically mutated genomes. The open source software SeqFeatR has been developed to identify associations between mutation patterns in biological sequences and specific selection pressures (“features”). For instance, SeqFeatR has been used to discover in viral protein sequences new T cell epitopes for hosts of given HLA types. SeqFeatR supports frequentist and Bayesian methods for the discovery of statistical sequence-feature associations. Moreover, it offers novel ways to visualize results of the statistical analyses and to relate them to further properties. In this article we demonstrate various functions of SeqFeatR with real data. The most frequently used set of functions is also provided by a web server. SeqFeatR is implemented as R package and freely available from the R archive CRAN (http://cran.r-project.org/web/packages/SeqFeatR/index.html). The package includes a tutorial vignette. The software is distributed under the GNU General Public License (version 3 or later). The web server URL is https://seqfeatr.zmb.uni-due.de. PMID:26731669

  11. A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences

    B.Umamageswari

    2012-04-01

    Full Text Available Large-scale analysis of genome sequences is in progress around the world, the major application of which is to establish the evolutionary relationships among species using phylogenetic trees. Hierarchical agglomerative algorithms can be used to generate such phylogenetic trees given a distance matrix representing the dissimilarity among the species. ClustalW and Muscle are two general-purpose programs that generate a distance matrix from input DNA or protein sequences. The limitation of these programs is that they are based on the Smith-Waterman algorithm, which uses dynamic programming for pairwise alignment. This is an extremely time-consuming process, and the existing systems may even fail to work for larger input data sets. To overcome this limitation, we have used the frequency of codon usage as an approximation to find dissimilarity among species. The proposed technique further reduces the complexity by extracting only the significant features of the species from the mtDNA sequences, using techniques such as frequent codons, codons with maximum range value, or PCA. We have observed that the proposed system produces nearly accurate results in a significantly reduced running time.
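
    The idea of replacing pairwise alignment with codon usage frequencies can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the reading frame is assumed to start at the first base, Euclidean distance is an assumed choice of dissimilarity, and the function names and the three toy sequences are hypothetical. The resulting matrix is the kind of input a hierarchical agglomerative clustering step would take.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same vector layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(dna):
    """Relative frequency of each codon, reading in frame from the first base."""
    dna = dna.upper()
    triplets = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)  # skips ambiguous bases
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def euclidean(u, v):
    """Euclidean distance between two frequency vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(sequences):
    """Pairwise dissimilarity between named sequences, as a nested dict."""
    freqs = {name: codon_frequencies(seq) for name, seq in sequences.items()}
    return {a: {b: euclidean(freqs[a], freqs[b]) for b in freqs} for a in freqs}


# Toy sequences, made up for illustration.
toy = {"sp1": "ATGATGCCC", "sp2": "ATGCCCCCC", "sp3": "GGGGGGGGG"}
matrix = distance_matrix(toy)
print(matrix["sp1"]["sp2"] < matrix["sp1"]["sp3"])  # sp1 is closer to sp2 than to sp3
```

    Each sequence is processed once and compared as a 64-component vector, so the cost per pair does not grow with sequence length the way dynamic-programming alignment does.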

  12. Host-Associated Genomic Features of the Novel Uncultured Intracellular Pathogen Ca. Ichthyocystis Revealed by Direct Sequencing of Epitheliocysts.

    Qi, Weihong; Vaughan, Lloyd; Katharios, Pantelis; Schlapbach, Ralph; Seth-Smith, Helena M B

    2016-01-01

    Advances in single-cell and mini-metagenome sequencing have enabled important investigations into uncultured bacteria. In this study, we applied the mini-metagenome sequencing method to assemble genome drafts of the uncultured causative agents of epitheliocystis, an emerging infectious disease in the Mediterranean aquaculture species gilthead seabream. We sequenced multiple cyst samples and constructed 11 genome drafts from a novel beta-proteobacterial lineage, Candidatus Ichthyocystis. The draft genomes demonstrate features typical of pathogenic bacteria with an obligate intracellular lifestyle: a reduced genome of up to 2.6 Mb, reduced G + C content, and reduced metabolic capacity. Reconstruction of metabolic pathways reveals that Ca. Ichthyocystis genomes lack all amino acid synthesis pathways, compelling them to scavenge from the fish host. All genomes encode type II, III, and IV secretion systems, a large repertoire of predicted effectors, and a type IV pilus. These are all considered to be virulence factors, required for adherence, invasion, and host manipulation. However, no evidence of lipopolysaccharide synthesis could be found. Beyond the core functions shared within the genus, alignments showed distinction into different species, characterized by alternative large gene families. These comprise up to a third of each genome, appear to have arisen through duplication and diversification, encode many effector proteins, and are seemingly critical for virulence. Thus, Ca. Ichthyocystis represents a novel obligatory intracellular pathogenic beta-proteobacterial lineage. The methods used, mini-metagenome analysis and manual annotation, have generated important insights into the lifestyle and evolution of the novel, uncultured pathogens, elucidating many putative virulence factors including an unprecedented array of novel gene families. PMID:27190004

  13. Nucleotide sequence of Crithidia fasciculata cytosol 5S ribosomal ribonucleic acid.

    MacKay, R M; Gray, M W; Doolittle, W F

    1980-01-01

    The complete nucleotide sequence of the cytosol 5S ribosomal ribonucleic acid of the trypanosomatid protozoan Crithidia fasciculata has been determined by a combination of T1-oligonucleotide catalog and gel sequencing techniques. The sequence is: GAGUACGACCAUACUUGAGUGAAAACACCAUAUCCCGUCCGAUUUGUGAAGUUAAGCACC CACAGGCUUAGUUAGUACUGAGGUCAGUGAUGACUCGGGAACCCUGAGUGCCGUACUCCCOH. This 5S ribosomal RNA is unique in having GAUU in place of the GAAC or GAUC found in all other prokaryotic and eukaryotic 5S ...

  14. Analysis of intron sequence features associated with transcriptional regulation in human genes.

    Huimin Li

    Full Text Available Although some preliminary work has revealed the potential transcriptional regulatory function of introns in eukaryotes, additional evidence is needed to support this conjecture. In this study, we perform systematic analyses of the sequence characteristics of human introns. The results show that the first introns are generally longer, and that C, G and their dinucleotide compositions are over-represented relative to other introns, which is consistent with previous findings. In addition, some new phenomena related to transcriptional regulation are found: (i) the first introns are enriched in CpG islands; and (ii) the percentages of first introns containing TATA, CAAT and GC boxes are higher than those of introns at other positions. Similar features of introns are observed in tissue-specific genes. The results further support that the first introns of human genes are likely to be involved in transcriptional regulation, and give an insight into the transcriptional regulatory regions of genes.

  15. GENAS: a database system for nucleic acid sequence analysis.

    Kuhara, S; Matsuo, F; Futamura, S; A. Fujita; Shinohara, T.; Takagi, T.; Sakaki, Y

    1984-01-01

    A database system, named GENAS (GENe Analyzing System), for computer analysis of sequences was constructed using Adbis, which is a relational database management system (1). GENAS enables us to retrieve any sequence data from the EMBL nucleotide sequence data library (2) and to analyze them readily (if necessary, together with private data) with various application programs in an interactive manner. Analysis of the structure of the replication origin of replicons was demonstrated using this system.

  16. Accessible surface area of proteins from purely sequence information and the importance of global features

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-03-01

    We present a new approach for predicting the accessible surface area of proteins. The novelty of this approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Rather, sequential window information and the global monomer and dimer compositions of the chain are used. We find that much of the lost accuracy due to the elimination of evolutionary information is recouped by the use of global features. Furthermore, this new predictor produces similar results for proteins with or without sequence homologs deposited in the Protein Data Bank, and hence shows generalizability. Finally, these predictions are obtained in a small fraction (1/1000) of the time required to run mutation profile based prediction. All these factors indicate the possible usability of this work in de-novo protein structure prediction and in de-novo protein design using iterative searches. Funded in part by the financial support of the National Institutes of Health through Grants R01GM072014 and R01GM073095, and the National Science Foundation through Grant NSF MCB 1071785.
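
    The "global monomer and dimer compositions" mentioned above can be read as the fraction of each residue type and of each adjacent residue pair in the chain. The following sketch computes such a feature vector under that reading; the function name `global_composition`, the ordering of the 20 + 400 components, and the example chain are assumptions, and the abstract does not specify how the authors encode or normalize these inputs.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def global_composition(chain):
    """Return 20 monomer fractions followed by 400 adjacent-pair fractions."""
    n = len(chain)
    mono_counts = Counter(chain)
    pair_counts = Counter(chain[i:i + 2] for i in range(n - 1))
    monomers = [mono_counts[a] / n if n else 0.0 for a in AMINO_ACIDS]
    dimers = [pair_counts[d] / (n - 1) if n > 1 else 0.0 for d in DIMERS]
    return monomers + dimers


features = global_composition("MKVLAAGIVKLA")  # made-up example chain
print(len(features))  # 20 + 400 components
```

    These values depend on the whole chain rather than on a local window, which is why they can be computed once per protein and reused for every residue being predicted.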

  17. Accurate single-sequence prediction of solvent accessible surface area using local and global features.

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-11-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both much more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de-novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636

  18. Sequence features associated with the cleavage efficiency of CRISPR/Cas9 system

    Liu, Xiaoxi; Homma, Ayaka; Sayadi, Jamasb; Yang, Shu; Ohashi, Jun; Takumi, Toru

    2016-01-01

    The CRISPR-Cas9 system has recently emerged as a versatile tool for biological and medical research. In this system, a single guide RNA (sgRNA) directs the endonuclease Cas9 to a targeted DNA sequence for site-specific manipulation. In addition to this targeting function, the sgRNA has also been shown to play a role in activating the endonuclease activity of Cas9. This dual function of the sgRNA likely underlies observations that different sgRNAs have varying on-target activities. Currently, our understanding of the relationship between sequence features of sgRNAs and their on-target cleavage efficiencies remains limited, largely due to difficulties in assessing the cleavage capacity of a large number of sgRNAs. In this study, we evaluated the cleavage activities of 218 sgRNAs using in vitro Surveyor assays. We found that nucleotides at both PAM-distal and PAM-proximal regions of the sgRNA are significantly correlated with on-target efficiency. Furthermore, we also demonstrated that the genomic context of the targeted DNA, the GC percentage, and the secondary structure of sgRNA are critical factors contributing to cleavage efficiency. In summary, our study reveals important parameters for the design of sgRNAs with high on-target efficiencies, especially in the context of high throughput applications. PMID:26813419

  19. Homology between the invertible deoxyribonucleic acid sequence that controls flagellar-phase variation in Salmonella sp. and deoxyribonucleic acid sequences in other organisms.

    Szekely, E; Simon, M.

    1981-01-01

    The invertible deoxyribonucleic acid (DNA) segment cloned from Salmonella sp. was radioactively labeled and used as a probe to search for homologous sequences by Southern hybridization. Only one copy of the invertible segment could be found on the Salmonella sp. genome. Partial sequence homology with the invertible region was detected in bacteriophage Mu and P1 DNA by low-stringency hybridization. Under these conditions, no homology was detected with Escherichia coli DNA. A strain of Salmonel...

  20. Nucleotide Sequence of a Chicken Vitellogenin Gene and Derived Amino Acid Sequence of the Encoded Yolk Precursor Protein

    Schip, Fred D. van het; Samallo, John; Broos, Jaap; Ophuis, Jan; Mojet, Mart; Gruber, Max; AB, Geert

    1987-01-01

    The gene encoding the major vitellogenin from chicken has been completely sequenced and its exon-intron organization has been established. The gene is 20,342 base-pairs long and contains 35 exons with a combined length of 5787 base-pairs. They encode the 1850-amino acid pre-peptide of vitellogenin,

  1. Characterization of mouse cellular deoxyribonucleic acid homologous to Abelson murine leukemia virus-specific sequences.

    Dale, B.; Ozanne, B.

    1981-01-01

    The genome of Abelson murine leukemia virus (A-MuLV) consists of sequences derived from both BALB/c mouse deoxyribonucleic acid and the genome of Moloney murine leukemia virus. Using deoxyribonucleic acid linear intermediates as a source of retroviral deoxyribonucleic acid, we isolated a recombinant plasmid which contained 1.9 kilobases of the 3.5-kilobase mouse-derived sequences found in A-MuLV (A-MuLV-specific sequences). We used this clone, designated pSA-17, as a probe restriction enzyme ...

  2. Representation of Protein-Sequence Information by Amino Acid Subalphabets

    Andersen, Claus A. F.; Brunak, Soren

    2004-01-01

    Within computational biology, algorithms are constructed with the aim of extracting knowledge from biological data, in particular, data generated by the large genome projects, where gene and protein sequences are produced in high volume. In this article, we explore new ways of representing protein-sequence information, using machine learning strategies, where the primary goal is the discovery of novel powerful representations for use in AI techniques. In the case of proteins and the 20 differ...

  3. A conversational system for the computer analysis of nucleic acid sequences.

    Sege, R; Söll, D.; Ruddle, F H; Queen, C

    1981-01-01

    We present a conversational system for the computer analysis of nucleic acid and protein sequences based on the well-known Queen and Korn program (1). The system can be used by persons with only minimal knowledge of computers.

  4. Amino Acid Sequence - KOME | LSDB Archive [Life Science Database Archive metadata]

  5. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.

    Melo, Francisco; Marti-Renom, Marc A

    2006-06-01

    Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may make it possible to increase the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. PMID:16506243
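
    To make the idea of a reduced alphabet concrete, the sketch below re-encodes a protein sequence with a six-letter alphabet. The grouping shown is a hypothetical physicochemical grouping chosen only for illustration; it is not one of the alphabets evaluated in the paper, and the names GROUPS and reduce_sequence are assumptions.

```python
# Hypothetical six-class grouping of the 20 standard amino acids
# (illustrative only; not an alphabet taken from the paper).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "g": "GP",      # glycine and proline
}

# Invert the grouping into a residue -> class-label lookup table.
RESIDUE_TO_GROUP = {
    residue: label for label, members in GROUPS.items() for residue in members
}


def reduce_sequence(seq: str, unknown: str = "x") -> str:
    """Re-encode a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(RESIDUE_TO_GROUP.get(res, unknown) for res in seq.upper())


print(reduce_sequence("EEQFNSTYR"))
```

    A substitution matrix over such an alphabet would then need only one row and column per class rather than one per residue type.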

  6. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  7. Homology of amino acid sequences of rat liver cathepsins B and H with that of papain.

    Takio, K; Towatari, T; Katunuma, N.; Teller, D C; Titani, K

    1983-01-01

    The amino acid sequences of rat liver lysosomal thiol endopeptidases, cathepsins B and H, are presented and compared with that of the plant thiol protease papain. The 252-residue sequence of cathepsin B and the 220-residue sequence of cathepsin H were determined largely by automated Edman degradation of their intact polypeptide chains and of the two chains of each enzyme generated by limited proteolysis. Subfragments of the chains were produced by enzymatic digestion and by chemical cleavage ...

  8. Complete amino acid sequence of human intestinal aminopeptidase N as deduced from cloned cDNA

    Cowell, G M; Kønigshøfer, E; Danielsen, E M;

    1988-01-01

    The complete primary structure (967 amino acids) of an intestinal human aminopeptidase N (EC 3.4.11.2) was deduced from the sequence of a cDNA clone. Aminopeptidase N is anchored to the microvillar membrane via an uncleaved signal for membrane insertion. A domain constituting amino acid 250...

  9. Representation of protein-sequence information by amino acid subalphabets

    Andersen, C.A.F.; Brunak, Søren

    2004-01-01

    Within computational biology, algorithms are constructed with the aim of extracting knowledge from biological data, in particular, data generated by the large genome projects, where gene and protein sequences are produced in high volume. In this article, we explore new ways of representing protei......-which now are common in proteins-might have emerged from simpler selections, or alphabets, in use earlier during the evolution of living organisms....

  10. Event-related potential indices of congruency sequence effects without feature integration or contingency learning confounds.

    Larson, Michael J; Clayson, Peter E; Kirwan, C Brock; Weissman, Daniel H

    2016-06-01

    The congruency effect in Stroop-like tasks (i.e., increased response time and reduced accuracy in incongruent relative to congruent trials) is often smaller when the previous trial was incongruent as compared to congruent. This congruency sequence effect (CSE) is thought to reflect cognitive control processes that shift attention to the target and/or modulate the response engendered by the distracter differently after incongruent relative to congruent trials. The neural signatures of CSEs are therefore usually attributed to cognitive control processes that minimize distraction from irrelevant stimuli. However, CSEs in previous functional neuroimaging studies were ubiquitously confounded with feature integration and/or contingency learning processes. We therefore investigated whether a neural CSE can be observed without such confounds in a group of healthy young adults (n = 56). To this end, we combined a prime-probe task that lacks such confounds with high-density ERPs to identify, for the first time, the neural time course of confound-minimized CSEs. Replicating recent behavioral findings, we observed strong CSEs in this task for mean response time and mean accuracy. Critically, conceptually replicating prior ERP results from confounded tasks, we also observed a CSE in both the parietal conflict slow potential (conflict SP) and the frontomedial N450. These findings indicate for the first time that neural CSEs as indexed by ERPs can be observed without the typical confounds. More broadly, the present study provides a confound-minimized protocol that will help future researchers to better isolate the neural bases of control processes that minimize distraction from irrelevant stimuli. PMID:26854028

  11. Complete nucleotide sequence of cDNA and deduced amino acid sequence of rat liver catalase.

    Furuta, S.; Hayashi, H; Hijikata, M; Miyazawa, S.; Osumi, T; Hashimoto, T.

    1986-01-01

    We have isolated five cDNA clones for rat liver catalase (hydrogen peroxide:hydrogen peroxide oxidoreductase, EC 1.11.1.6). These clones overlapped with each other and covered the entire length of the mRNA, which had been estimated to be 2.4 kilobases long by blot hybridization analysis of electrophoretically fractionated RNA. Nucleotide sequencing was carried out on these five clones and the composite nucleotide sequence of catalase cDNA was determined. The 5' noncoding region contained 83 b...

  12. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  13. SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features.

    Zhou, Yuan; Zeng, Pan; Li, Yan-Hui; Zhang, Ziding; Cui, Qinghua

    2016-06-01

    N6-methyladenosine (m6A) is a prevalent RNA methylation modification involved in the regulation of degradation, subcellular localization, splicing and local conformation changes of RNA transcripts. High-throughput experiments have demonstrated that only a small fraction of the m6A consensus motifs in mammalian transcriptomes are modified. Accurate identification of RNA m6A sites has therefore become urgently important. To this end, a computational predictor of mammalian m6A sites, named SRAMP, is established here. To depict the sequence context around m6A sites, SRAMP combines three random forest classifiers that exploit the positional nucleotide sequence pattern, the K-nearest neighbor information and the position-independent nucleotide pair spectrum features, respectively. SRAMP uses either genomic sequences or cDNA sequences as its input. With either kind of input sequence, SRAMP achieves competitive performance in both cross-validation tests and rigorous independent benchmarking tests. Analyses of the informative features and overrepresented rules extracted from the random forest classifiers demonstrate that nucleotide usage preferences at the distal positions, in addition to those at the proximal positions, contribute to the classification. As a public prediction server, SRAMP is freely available at http://www.cuilab.cn/sramp/. PMID:26896799

  14. Nucleotide sequence of the beta-cyclodextrin glucanotransferase gene of alkalophilic Bacillus sp. strain 1011 and similarity of its amino acid sequence to those of alpha-amylases.

    Kimura, K.; Kataoka, S; Ishii, Y; Takano, T.; Yamane, K

    1987-01-01

    The nucleotide sequence of the gene for cyclodextrin glucanotransferase of alkalophilic Bacillus sp. strain 1011 was determined. The deduced amino acid sequence at the NH2-terminal side of the enzyme showed a high homology with the sequences of alpha-amylase in the three regions which constitute the active centers of alpha-amylases.

  15. EST sequences and their annotation (amino acid sequence and results of homology search) - Dicty_cDB | LSDB Archive [Life Science Database Archive metadata]

    Full Text Available lone covering full-length ORF provided by the National BioResource Project ( http://www.nbrp.jp/ ). The...ein Coding Gene in dictyBase ( http://dictybase.org/ ). The link to dictyBase is provided in the...Dicty_cDB EST sequences and their annotation (amino acid sequence and results of homology search) Data detai...l Data name EST sequences and their annotation (amino acid sequence and results of homology search) Descript...ion of data contents Sequences of cDNA clones of Dictyostelium discoideum and the

  16. Electrochemical features of Pt(S)[n(110) × (100)] surfaces in acidic media

    Souza-Garcia, Janaina; Angelucci, Camilo Andrea; Climent, Víctor; Feliu, Juan M.

    2013-01-01

    Experiments have been carried out in sulfuric and perchloric acid solutions on Pt(S)[n(110) × (100)] electrodes. The comparison between the two different electrolytic media reveals an important influence of the anion on the voltammetric features. Total charge curves have been obtained with the CO charge displacement method in combination with voltammetric measurements. From these curves, the dependence of the pztc on the step density and the strength of the anion adsorption have been analyz...

  17. Introduction of restriction enzyme sites in protein-coding DNA sequences by site-specific mutagenesis not affecting the amino acid sequence: a computer program.

    Arentzen, R; Ripka, W. C.

    1984-01-01

    Structure/function relationship studies of proteins are greatly facilitated by recombinant DNA technology which allows specific amino acid mutations to be made at the DNA sequence level by site-specific mutagenesis employing synthetic oligonucleotides. This technique has been successfully used to alter one or two amino acids in a protein. Replacement of existing DNA sequence coding for several amino acids with new synthetic DNA fragments would be facilitated by the presence of unique restrict...

  18. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    Huilin Wang

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization

  19. Amino acid sequence of phospholipase A2-α from the venom of Crotalus adamanteus

    Heinrikson, R.L.; Krueger, E.T.; Keim, P.S.

    1977-07-25

    The complete amino acid sequence of Crotalus adamanteus venom phospholipase A2-α has been determined by analysis of the five tryptic peptides from the citraconylated, reduced, and S-(14C)carboxamidomethylated enzyme. Earlier studies provided the information necessary to align the tryptic fragments so that secondary cleavage procedures to establish overlaps were unnecessary. The subunit in the phospholipase A2-α dimer is a single polypeptide chain containing 122 amino acids and seven disulfide bonds. The histidine residue implicated in the active site of mammalian phospholipases is at position 47 in the C. adamanteus enzyme and is located in a domain of the molecule which is highly homologous in sequence with corresponding regions of phospholipases from a variety of venom and pancreatic sources. Comparative sequence analysis has revealed insights with regard to the function and evolution of phospholipases A2. Primary structural relationships observed among the snake venom enzymes parallel the phylogenetic classification of the venomous reptiles from which they were derived. It is proposed that phospholipases A2 of this general type be divided into two groups depending upon the presence or absence of distinctive structural features elucidated in this study.

  20. FeatureViewer, a BioJS component for visualization of position-based annotations in protein sequences

    Leyla Garcia; Guy Yachdav; Maria-Jesus Martin

    2014-01-01

    Summary: FeatureViewer is a BioJS component that lays out, maps, orients, and renders position-based annotations for protein sequences. This component is highly flexible and customizable, allowing the presentation of annotations by rows, all centered, or distributed in non-overlapping tracks. It uses either lines or shapes for sites and rectangles for regions. The result is a powerful visualization tool that can be easily integrated into web applications as well as documents as it provides an...

  1. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    2010-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  2. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  3. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities.

    Henrissat, B; Bairoch, A

    1993-01-01

    301 glycosyl hydrolases and related enzymes corresponding to 39 EC entries of the I.U.B. classification system have been classified into 35 families on the basis of amino-acid-sequence similarities [Henrissat (1991) Biochem. J. 280, 309-316]. Approximately half of the families were found to be monospecific (containing only one EC number), whereas the other half were found to be polyspecific (containing at least two EC numbers). A > 60% increase in sequence data for glycosyl hydrolases (181 additional enzyme or enzyme domain sequences have since become available) allowed us to update the classification not only by the addition of more members to already identified families, but also by the finding of ten new families. On the basis of a comparison of 482 sequences corresponding to 52 EC entries, 45 families, out of which 22 are polyspecific, can now be defined. This classification has been implemented in the SWISS-PROT protein sequence data bank. PMID:8352747

  4. Prediction of GPCR-G Protein Coupling Specificity Using Features of Sequences and Biological Functions

    Toshihide Ono; Haretsugu Hishigaki

    2006-01-01

    Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling specificity between GPCRs and G proteins. This method uses not only GPCR sequences but also the functional knowledge generated by natural language processing, and can achieve 92.2% prediction accuracy by using the C4.5 algorithm. Furthermore, rules related to GPCR-G protein coupling are generated. The combination of sequence analysis and text mining improves the prediction accuracy for GPCR-G protein coupling specificity, and also provides clues for understanding GPCR signaling.

  5. A Possible Mechanism of Zika Virus Associated Microcephaly: Imperative Role of Retinoic Acid Response Element (RARE) Consensus Sequence Repeats in the Viral Genome.

    Kumar, Ashutosh; Singh, Himanshu N; Pareek, Vikas; Raza, Khursheed; Dantham, Subrahamanyam; Kumar, Pavan; Mochan, Sankat; Faiq, Muneeb A

    2016-01-01

    Owing to the reports of microcephaly as a consistent outcome in the fetuses of pregnant women infected with ZIKV in Brazil, a Zika virus (ZIKV)-microcephaly etiomechanistic relationship has recently been implicated. Researchers, however, are still struggling to establish an embryological basis for this interesting causal handcuff. The present study reveals robust evidence in favor of a plausible ZIKV-microcephaly cause-effect liaison. The rationale is based on: (1) sequence homology between ZIKV genome and the response element of an early neural tube developmental marker "retinoic acid" in human DNA and (2) comprehensive similarities between the details of brain defects in ZIKV-microcephaly and retinoic acid embryopathy. Retinoic acid is considered as the earliest factor for regulating anteroposterior axis of neural tube and positioning of structures in developing brain through retinoic acid response elements (RARE) consensus sequence (5'-AGGTCA-3') in promoter regions of retinoic acid-dependent genes. We screened genomic sequences of already reported virulent ZIKV strains (including those linked to microcephaly) and other viruses available in the National Institutes of Health genetic sequence database (GenBank) for the RARE consensus repeats and obtained results strongly bolstering our hypothesis that ZIKV strains associated with microcephaly may act through precipitation of dysregulation in retinoic acid-dependent genes by introducing extra stretches of RARE consensus sequence repeats in the genome of developing brain cells. Additional support to our hypothesis comes from our findings that screening of other viruses for RARE consensus sequence repeats is positive only for those known to display neurotropism and cause fetal brain defects (for which maternal-fetal transmission during developing stage may be required). The numbers of RARE sequence repeats appeared to match with the virulence of screened positive viruses. Although, bioinformatic evidence and embryological
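
    The screening step described in this abstract amounts to counting occurrences of the RARE consensus hexamer 5'-AGGTCA-3' in a genome sequence. The sketch below is one simple way to do that; it is not the authors' pipeline. It assumes the sequence is written in DNA letters, and searching the reverse strand as well is an assumption, since the abstract does not say which strands were screened. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def find_motif(seq: str, motif: str = "AGGTCA") -> list:
    """List (position, strand) for every occurrence of `motif`.

    Positions are 0-based on the forward strand; "+" marks a forward-strand
    match and "-" a match to the motif's reverse complement. Overlapping
    occurrences are all reported.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


# Made-up sequence for illustration only; not a viral genome fragment.
print(find_motif("TTAGGTCAGGTGACCTAA"))
```

    Counting hits per genome, as len(find_motif(genome)), would give the kind of repeat number the abstract compares across viruses.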

  6. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…

  7. Better prediction of protein contact number using a support vector regression analysis of amino acid sequence

    Yuan Zheng

    2005-10-01

    Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
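
    The contact number defined in this abstract (the count of Cβ atoms from other residues inside a sphere centred on a residue's own Cβ atom) can be computed directly from coordinates. The sketch below shows the observed quantity only, not the support vector regression predictor. The abstract does not state the sphere radius, so the 10 Å default is an assumption, as are the function name and the toy coordinates.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Return the contact number of each residue.

    cb_coords: list of (x, y, z) Cβ coordinates in Å, one per residue.
    radius:    sphere radius in Å (assumed value; not given in the abstract).
    """
    counts = []
    for i, centre in enumerate(cb_coords):
        n = 0
        for j, other in enumerate(cb_coords):
            # Count only Cβ atoms belonging to other residues.
            if i != j and math.dist(centre, other) <= radius:
                n += 1
        counts.append(n)
    return counts


# Toy coordinates for three residues, for illustration only.
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_numbers(toy, radius=5.0))
```

    The double loop is quadratic in chain length, which is fine for single proteins; a spatial index would be the usual choice for large structure sets.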

  8. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and peptide map comparisons with an apoenzyme tryptic digest, allowed identification of the pyridoxylated-peptide which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry

  9. The human receptor for urokinase plasminogen activator. NH2-terminal amino acid sequence and glycosylation variants

    Behrendt, N; Rønne, E; Ploug, M; Petri, T; Løber, D; Nielsen, L S; Schleuning, W D; Blasi, F; Appella, E; Danø, K

    1990-01-01

    -PA. The purified protein shows a single 55-60 kDa band after sodium dodecyl sulfate-polyacrylamide gel electrophoresis and silver staining. It is a heavily glycosylated protein, the deglycosylated polypeptide chain comprising only 35 kDa. The glycosylated protein contains N-acetyl-D-glucosamine and sialic...... acid, but no N-acetyl-D-galactosamine. Glycosylation is responsible for substantial heterogeneity in the receptor on phorbol ester-stimulated U937 cells, and also for molecular weight variations among various cell lines. The amino acid composition and the NH2-terminal amino acid sequence are reported...

  10. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active...... sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally...... valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects...

  11. Molecular cloning and sequence analysis of cDNA encoding human prostatic acid phosphatase.

    Vihko, P; Virkkunen, P; Henttu, P; Roiko, K; Solin, T; Huhtala, M L

    1988-08-29

    lambda gt11 clones encoding human prostatic acid phosphatase (PAP) (EC 3.1.3.2) were isolated from human prostatic cDNA libraries by immunoscreening with polyclonal antisera. Sequence data obtained from several overlapping clones indicated that the composite cDNAs contained the complete coding region for PAP, which encodes a 354-residue protein with a calculated molecular mass of 41,126 Da. In the 5'-end, the cDNA codes for a signal peptide of 32 amino acids. Direct protein sequencing of the amino-terminus of the mature protein and its proteolytic fragments confirmed the identity of the predicted protein sequence. PAP has no apparent sequence homology to other known proteins. However, both the cDNA clones coding for human placental alkaline phosphatase and PAP have an alu-type repetitive sequence about 900 nucleotides downstream from the coding region in the 3'-untranslated region. Two of our cDNA clones differed from others at the 5'-ends. RNA blot analysis indicated mRNA of 3.3 kb. We are continuing to study whether acid phosphatases form a gene family as do alkaline phosphatases. PMID:2842184

  12. Draft Genome Sequence of Perfluorooctane Acid-Degrading Bacterium Pseudomonas parafulva YAB-1

    Yi, Langbo; Tang, Chongjian; Peng, Qingjing; Peng, Qingzhong; Chai, Liyuan

    2015-01-01

    Pseudomonas parafulva YAB-1, isolated from perfluorinated compound-contaminated soil, has the ability to degrade perfluorooctane acid (PFOA) compound. Here, we report the draft genome sequence and annotation of the PFOA-degrading bacterium P. parafulva YAB-1. The data provide the basis to investigate the molecular mechanism of PFOA metabolism.

  13. Draft Genome Sequence of Perfluorooctane Acid-Degrading Bacterium Pseudomonas parafulva YAB-1

    Tang, Chongjian; Peng, Qingjing; Peng, Qingzhong

    2015-01-01

    Pseudomonas parafulva YAB-1, isolated from perfluorinated compound-contaminated soil, has the ability to degrade perfluorooctane acid (PFOA) compound. Here, we report the draft genome sequence and annotation of the PFOA-degrading bacterium P. parafulva YAB-1. The data provide the basis to investigate the molecular mechanism of PFOA metabolism. PMID:26337877

  14. Coding Local and Global Binary Visual Features Extracted From Video Sequences.

    Baroffio, Luca; Canclini, Antonio; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2015-11-01

    Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results for many visual analysis tasks while being characterized by significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the bag-of-visual word model. Several applications, including, for example, visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that aim at reducing the required bit budget while attaining a target level of efficiency. In this paper, we investigate a coding scheme tailored to both local and global binary features, which aims at exploiting both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can conveniently be adopted to support the analyze-then-compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs the visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed and then sent to a central unit for further processing, according to the compress-then-analyze (CTA) paradigm. In this paper, we experimentally compare the ATC and the CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: 1) homography estimation and 2) content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with the CTA, especially in bandwidth limited scenarios. PMID:26080384

  15. Unique Features of a Japanese ‘Candidatus Liberibacter asiaticus’ Strain Revealed by Whole Genome Sequencing

    Hiroshi Katoh; Shin-Ichi Miyata; Hiromitsu Inoue; Toru Iwanami

    2014-01-01

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'C...

  16. Characterization of soybean genomic features by analysis of its expressed sequence tags

    Tian, Ai-Guo; Wang, Jun; Cui, Peng;

    2004-01-01

    to be fast-evolving. Soybean unigenes with no match to genes within the Arabidopsis genome were identified as soybean-specific genes. These genes were mainly involved in nodule development and the synthesis of seed storage proteins. In addition, we also identified 61 genes regulated by salicylic acid, 1...

  17. A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences

    B.Umamageswari

    2012-04-01

    Large-scale analysis of genome sequences is in progress around the world, the major application of which is to establish the evolutionary relationship among the species using phylogenetic trees. Hierarchical agglomerative algorithms can be used to generate such phylogenetic trees given the distance matrix representing the dissimilarity among the species. ClustalW and Muscle are two general-purpose programs that generate a distance matrix from the input DNA or protein sequences. The limitation of these programs is that they are based on the Smith-Waterman algorithm, which uses dynamic programming for the pairwise alignment. This is an extremely time-consuming process, and the existing systems may even fail to work for larger input data sets. To overcome this limitation, we have used the frequency of codon usage as an approximation to find dissimilarity among species. The proposed technique further reduces the complexity by extracting only the significant features of the species from the mtDNA sequences, using techniques such as frequent codons, codons with maximum range value, or PCA. We have observed that the proposed system produces nearly accurate results in a significantly reduced running time.
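
    The alignment-free idea in this abstract, comparing species by codon usage frequencies rather than by pairwise alignment, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function names, the in-frame reading from position 0, and the choice of Euclidean distance are all hypothetical, and the feature-selection step (frequent codons, maximum range, PCA) is left out.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every profile has the same layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(seq):
    """Relative frequency of each codon, reading in frame from position 0."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)  # skips codons with ambiguous bases
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def distance_matrix(seqs):
    """Pairwise Euclidean distances between codon-usage profiles."""
    profiles = [codon_frequencies(s) for s in seqs]
    return [[math.dist(a, b) for b in profiles] for a in profiles]


if __name__ == "__main__":
    toy = ["ATGGCCATGGCC", "ATGGCCATGAAA", "CCCCCCCCCCCC"]
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```

    The resulting matrix could then be passed to any hierarchical agglomerative clustering routine. Each profile costs time linear in the sequence length, which is where the saving over dynamic-programming alignment would come from.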

  18. Use of a structural alphabet to find compatible folds for amino acid sequences.

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700

  19. Amino acid sequences and structures of chicken and turkey beta 2-microglobulin

    Welinder, K G; Jespersen, H M; Walther-Rasmussen, J; Skjødt, K

    The complete amino acid sequences of chicken and turkey beta 2-microglobulins have been determined by analyses of tryptic, V8-proteolytic and cyanogen bromide fragments, and by N-terminal sequencing. Mass spectrometric analysis of chicken beta 2-microglobulin supports the sequence-derived Mr of 11......,048. The higher apparent Mr obtained for the avian beta 2-microglobulins as compared to human beta 2-microglobulin by SDS-PAGE is not understood. Chicken and turkey beta 2-microglobulin consist of 98 residues and deviate at seven positions: 60, 66, 74-76, 78 and 82. The chicken and turkey sequences are...... complex suggest that the seven chicken to turkey differences are exposed to solvent in the avian MHC class I complex. The key residues of beta 2-microglobulin involved in alpha chain contacts within the MHC class I molecule are highly conserved between chicken and man. This explains that heterologous...

  20. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface to allow the researcher to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in Perl programming language and are executable under the cgibin directory of a Web server environment. The scripts direct nucleic acid sequencing trace file data output from automated sequencers to be analyzed by the phred molecular biology program and are displayed as graphical hypertext mark-up language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but the scripts are also able to read data from 384-well microtiter dishes 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided. PMID:11414222
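
    For readers unfamiliar with the quality values that phred assigns to each base call, the standard relation is Q = -10 log10(P), where P is the estimated error probability. The sketch below shows that conversion and a simple per-read summary. It is written in Python for illustration rather than in the Perl of the original scripts, and the function names and the Q >= 20 cutoff are assumptions, not details taken from the paper.

```python
import math


def phred_to_error_probability(q):
    """Convert a phred quality value Q to an error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a phred quality value."""
    return -10 * math.log10(p)


def count_high_quality(qualities, threshold=20):
    """Number of bases at or above the threshold (Q20 is an assumed cutoff)."""
    return sum(1 for q in qualities if q >= threshold)


if __name__ == "__main__":
    read = [8, 12, 25, 31, 40, 38, 19, 22]  # made-up quality values
    print(count_high_quality(read), "of", len(read), "bases at Q20 or better")
    print(phred_to_error_probability(30))   # about one error in a thousand
```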

  1. ProViz-a web-based visualization tool to investigate the functional and evolutionary features of protein sequences.

    Jehl, Peter; Manguy, Jean; Shields, Denis C; Higgins, Desmond G; Davey, Norman E

    2016-07-01

    Low-throughput experiments and high-throughput proteomic and genomic analyses have created enormous quantities of data that can be used to explore protein function and evolution. The ability to consolidate these data into an informative and intuitive format is vital to our capacity to comprehend these distinct but complementary sources of information. However, existing tools to visualize protein-related data are restricted by their presentation, sources of information, functionality or accessibility. We introduce ProViz, a powerful browser-based tool to aid biologists in building hypotheses and designing experiments by simplifying the analysis of functional and evolutionary features of proteins. Feature information is retrieved in an automated manner from resources describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterization of functional regions. These features are mapped to evolutionary information from precomputed multiple sequence alignments. Data are displayed in an interactive and information-rich yet intuitive visualization, accessible through a simple protein search interface. This allows users with limited bioinformatic skills to rapidly access data pertinent to their research. Visualizations can be further customized with user-defined data either manually or using a REST API. ProViz is available at http://proviz.ucd.ie/. PMID:27085803

  2. Face Recognition from Still Images to Video Sequences: A Local-Feature-Based Framework

    Shaokang Chen; Sandra Mau; Harandi, Mehrtash T.; Conrad Sanderson; Abbas Bigdeli; Lovell, Brian C.

    2011-01-01

    Although automatic face recognition has shown success for high-quality images under controlled conditions, for video-based recognition it is hard to attain similar levels of performance. We describe in this paper recent advances in a project being undertaken to trial and develop advanced surveillance systems for public safety. In this paper, we propose a local facial feature based framework for both still image and video-based face recognition. The evaluation is performed on a still image d...

  3. Unique Features of a Japanese ‘Candidatus Liberibacter asiaticus’ Strain Revealed by Whole Genome Sequencing

    Katoh, Hiroshi; Miyata, Shin-ichi; Inoue, Hiromitsu; Iwanami, Toru

    2014-01-01

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, ‘Candidatus Liberibacter asiaticus’, ‘Ca. L. americanus’, and ‘Ca. L. africanus’. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative ‘C...

  4. Sequence Features of Drosha and Dicer Cleavage Sites Affect the Complexity of IsomiRs

    Julia Starega-Roslan; Witkos, Tomasz M.; Paulina Galka-Marciniak; Krzyzosiak, Wlodzimierz J.

    2015-01-01

    The deep-sequencing of small RNAs has revealed that different numbers and proportions of miRNA variants called isomiRs are formed from single miRNA genes and that this effect is attributable mainly to imprecise cleavage by Drosha and Dicer. Factors that influence the degree of cleavage precision of Drosha and Dicer are under investigation, and their identification may improve our understanding of the mechanisms by which cells modulate the regulatory potential of miRNAs. In this study, we focu...

  5. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    Peters, J.; Peters, M.; Lottspeich, F.; Schaefer, W.; Baumeister, W.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  6. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate [HPI])-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  7. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  8. Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns

    Tümmler Burkhard

    2004-07-01

    Background: Oligonucleotide frequencies were shown to be conserved signatures for bacterial genomes; however, the underlying constraints have not yet been resolved in detail. In this paper we analyzed oligonucleotide usage (OU) biases in a comprehensive collection of 155 completely sequenced bacterial chromosomes, 316 plasmids and 104 phages. Results: Two global features were analyzed: pattern skew (PS) and the variance of OU deviations normalized by mononucleotide content of the sequence (OUV). OUV reflects the strength of OU biases and taxonomic signals. PS denotes asymmetry of OU in direct and reverse DNA strands. A trend towards minimal PS was observed for almost all complete sequences of bacterial chromosomes and plasmids; however, PS was substantially higher in separate genomic loci and in several types of plasmids and phages characterized by long stretches of non-coding DNA and/or asymmetric gene distribution on the two DNA strands. Five of the 155 bacterial chromosomes have anomalously high PS, of which the chromosomes of Xylella fastidiosa 9a5c and Prochlorococcus marinus MIT9313 exhibit extreme PS values, suggesting an intermediate unstable state of these two genomes. Conclusions: Strand symmetry, as indicated by minimal PS, is a universally conserved feature of complete bacterial genomes that results from the matching mutual compensation of local OU biases on both replichores, while OUV is more a taxon-specific feature. Local events such as inversions or the incorporation of genome islands are balanced by global changes in genome organization to minimize PS, which may represent one of the leading evolutionary forces driving bacterial genome diversification.
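
    The notion of strand asymmetry in oligonucleotide usage can be illustrated with a small sketch. The abstract does not give the formula for pattern skew, so the measure below is a simplified stand-in and an assumption of this example: it sums the absolute differences between k-mer frequencies on the direct strand and on its reverse complement. The function names and the default k = 4 are likewise hypothetical.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=4):
    """Relative frequency of every overlapping k-mer observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def strand_asymmetry(seq, k=4):
    """Simplified skew: L1 distance between k-mer usage on the two strands.

    0.0 means identical usage on both strands; 2.0 means no k-mer is shared.
    """
    seq = seq.upper()
    direct = kmer_frequencies(seq, k)
    reverse = kmer_frequencies(reverse_complement(seq), k)
    kmers = set(direct) | set(reverse)
    return sum(abs(direct.get(w, 0.0) - reverse.get(w, 0.0)) for w in kmers)


if __name__ == "__main__":
    print(strand_asymmetry("ACGTACGT", k=2))  # equals its own reverse complement
    print(strand_asymmetry("AAAAAAAA", k=2))  # maximally asymmetric toy case
```

    Under this toy measure, a sequence that equals its own reverse complement scores zero, which mirrors the "minimal PS" reported for complete chromosomes, while a stretch dominated by one strand-specific pattern scores high.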

  9. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    Patel, Kamlesh D. (Ken) [SNL]

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  10. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and the amino acid sequences of the three proteins, viz. NS1, NS2 and VP, were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequences available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307) and to two isolates (AY362547 and AY102034) from Thailand. PMID:21402111

  11. Human liver type pyruvate kinase: Complete amino acid sequence and the expression in mammalian cells

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. The authors isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1,629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method.

  12. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-03-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  13. Emotion Recognition based on 2D-3D Facial Feature Extraction from Color Image Sequences

    Robert Niese

    2010-10-01

    In modern human computer interaction systems, emotion recognition from video is becoming an imperative feature. In this work we propose a new method for automatic recognition of facial expressions related to categories of basic emotions from image data. Our method incorporates a series of image processing, low level 3D computer vision and pattern recognition techniques. For image feature extraction, color and gradient information is used. Further, in terms of 3D processing, camera models are applied along with an initial registration step, in which person specific face models are automatically built from stereo. Based on these face models, geometric feature measures are computed and normalized using photogrammetric techniques. For recognition this normalization leads to minimal mixing between different emotion classes, which are determined with an artificial neural network classifier. Our framework achieves robust and superior classification results, also across a variety of head poses with resulting perspective foreshortening and changing face size. Results are presented for domestic and publicly available databases.

  14. MRI Sequence and Characteristic Features in ‘Giant Cell Tumor’ of Clivus

    Mahale, Ajit; K.V.N, Dhananjaya; Pai, Muralidhar; Poornima, Vinaya; Sahu, Kausalya Kumari

    2013-01-01

    Giant cell tumours of the clivus are rare. These tumours present in the second and third decades of life and they are slightly more frequent in women than in men. We present the case of a 20-year-old patient who came with complaints of headache, retro-orbital pain and recurrent transient bleeding from the nose for two and a half months. MRI of the brain with contrast was done and its features were suggestive of a giant cell tumour of the clivus. A transnasal endoscopic biopsy was ...

  15. Specific features of sulfuric acid leaching-out of lanthanides from phosphosemihydrate

    The effect of increasing the synthesis temperature of hydrated lanthanide orthophosphates in the range of 75-95 °C on their chemical activity and, accordingly, on the specific features of their leaching with sulfuric acid was studied. It was found that, during thermal treatment of humid lanthanide orthophosphates LnPO4·(1-1.5)H2O (Ln = La, Ce or Nd) in the presence of 70 wt. % H3PO4 (the conditions under which they form during semihydrate decomposition of apatite concentrate), increasing the treatment temperature from 75 to 95 °C decreases their solubility in sulfuric acid solution by a factor of 3-4, in all probability because their degree of hydration is reduced. Therefore, even a small increase in the temperature of the semihydrate process of phosphoric acid decomposition of the Khibiny apatite can noticeably decrease the chemical activity of hydrated lanthanide orthophosphates. A mechanism for restoring the chemical activity of the lanthanide compounds was considered.

  16. Nucleotide and amino acid sequences of human intestinal alkaline phosphatase: close homology to placental alkaline phosphatase

    A cDNA clone for human adult intestinal alkaline phosphatase (ALP) [orthophosphoric-monoester phosphohydrolase (alkaline optimum); EC 3.1.3.1] was isolated from a λgt11 expression library. The cDNA insert of this clone is 2513 base pairs in length and contains an open reading frame that encodes a 528-amino acid polypeptide. This deduced polypeptide contains the first 40 amino acids of human intestinal ALP, as determined by direct protein sequencing. Intestinal ALP shows 86.5% amino acid identity to placental (type 1) ALP and 56.6% amino acid identity to liver/bone/kidney ALP. In the 3'-untranslated regions, intestinal and placental ALP cDNAs are 73.5% identical (excluding gaps). The evolution of this multigene enzyme family is discussed

  17. Amino acid sequence of the beta subunit of bovine lung casein kinase II.

    Takio, K.; Kuenzel, E A; Walsh, K. A.; Krebs, E G

    1987-01-01

    The amino acid sequence of the 209-residue beta subunit of bovine lung casein kinase II has been determined. Excluding the amino-terminal blocking group, which was not identified, the molecular weight of the polypeptide chain is 24,239. A marked polarity of the beta subunit is indicated by clusters of negative charges in the amino-terminal region and of positive charges in the carboxyl-terminal region. Whereas the beta subunit shows no homology with any known protein, a segment of the sequenc...

  18. Human apolipoprotein C-II: complete nucleic acid sequence of preapolipoprotein C-II.

    Fojo, S S; Law, S W; Brewer, H B

    1984-01-01

    Apolipoprotein (apo) C-II is a cofactor for lipoprotein lipase, the enzyme that catalyzes the hydrolysis of triglycerides on plasma triglyceride-rich lipoproteins. The complete coding sequence of apoC-II mRNA has been determined from an apoC-II clone isolated from a human liver cDNA library. A 17-base-long synthetic oligonucleotide based on amino acid residues 5-10 of apoC-II was utilized as a hybridization probe to select recombinant plasmids containing the apoC-II sequence. Two thousand fou...

  19. MRI sequence and characteristic features in 'giant cell tumor' of clivus.

    Mahale, Ajit; K V N, Dhananjaya; Pai, Muralidhar; Poornima, Vinaya; Sahu, Kausalya Kumari

    2013-06-01

    Giant cell tumours of the clivus are rare. These tumours present in the second and third decades of life and they are slightly more frequent in women than in men. We present the case of a 20-year-old patient who came with complaints of headache, retro-orbital pain and recurrent transient bleeding from the nose for two and a half months. MRI of the brain with contrast was done and its features were suggestive of a giant cell tumour of the clivus. A transnasal endoscopic biopsy was done under general anaesthesia and the histopathology report suggested that the features were those of a giant cell tumour. The mass was excised by transnasal endoscopy. Postoperatively, the patient did not recover from the lateral rectus palsy that was present on the right side. The patient was discharged with advice for follow-up and radiotherapy. Radiation therapy and chemotherapy may be effective as adjuvant treatments. Even though a recurrence usually occurs within 4 years of the initial treatment, these patients will need to be carefully followed for the remainder of their lives. PMID:23905141

  20. Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features

    Li, Yuan; Wang, Mingjun; Wang, Huilin; Tan, Hao; Zhang, Ziding; Webb, Geoffrey I.; Song, Jiangning

    2014-07-01

    Lysine acetylation is a reversible post-translational modification, playing an important role in cytokine signaling, transcriptional regulation, and apoptosis. To fully understand acetylation mechanisms, identification of substrates and specific acetylation sites is crucial. Experimental identification is often time-consuming and expensive. Alternative bioinformatics methods are cost-effective and can be used in a high-throughput manner to generate relatively precise predictions. Here we develop a method termed SSPKA for species-specific lysine acetylation prediction, using random forest classifiers that combine sequence-derived and functional features with two-step feature selection. Feature importance analysis indicates that functional features, applied to lysine acetylation site prediction for the first time, significantly improve the predictive performance. We apply the SSPKA model to screen the entire human proteome and identify many high-confidence putative substrates that were not previously identified. The results, along with the implemented Java tool, serve as useful resources to elucidate the mechanism of lysine acetylation and facilitate hypothesis-driven experimental design and validation.

  1. Structural and Morphological Features of Acid-Bearing Polymers for PEM Fuel Cells

    Yang, Yunsong; Siu, Ana; Peckham, Timothy J.;

    2008-01-01

    Chemical structure, polymer microstructure, sequence distribution, and morphology of acid-bearing polymers are important factors in the design of polymer electrolyte membranes (PEMs) for fuel cells. The roles of ion aggregation and phase separation in vinylic- and aromatic-based polymers in proton conductivity and water transport are described. The formation, dimensions, and connectivity of ionic pathways are consistently found to play an important role in determining the physicochemical properties of PEMs. For polymers that possess low water content, phase separation and ionic channel formation significantly enhance the transport of water and protons. For membranes that contain a high content of water, phase separation is less influential. Continuity of ionic aggregates is influential on the diffusion of water and electroosmotic drag within a membrane. A balance of these properties must be considered...

  2. Sequence-specific nucleic acid detection from binary pore conductance measurement

    Esfandiari, Leyla; Monbouquette, Harold G.; Schmidt, Jacob J.

    2012-01-01

    We describe a platform for sequence-specific nucleic acid (NA) detection utilizing a micropipette tapered to a 2 μm diameter pore and 3 μm diameter polystyrene beads to which uncharged peptide nucleic acid (PNA) probe molecules have been conjugated. As the target NAs hybridize to the complementary PNA-beads, the beads acquire negative charge and become electrophoretically mobile. An applied electric field guides these NA-PNA-beads toward the pipette tip, which they obstruct, leading to an ind...

  3. Nucleotide sequence homology between the heat-labile enterotoxin gene of Escherichia coli and Vibrio cholerae deoxyribonucleic acid.

    Moseley, S L; Falkow, S

    1980-01-01

    Isolated deoxyribonucleic acid fragments encoding the heat-labile enterotoxin of Escherichia coli were used to probe for homologous sequences in restricted whole-cell deoxyribonucleic acid from Vibrio cholerae. Significant sequence homology between the heat-labile enterotoxin gene and V. cholerae deoxyribonucleic acid was demonstrated, and apparent differences were observed in the organization of the cholera toxin gene among different strains of V. cholerae.

  4. Self-Sequencing of Amino Acids and Origins of Polyfunctional Protocells

    Fox, Sidney W.

    1984-12-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). This paper briefly reviews the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells.

  5. The complete chloroplast genome sequence of Aster spathulifolius (Asteraceae); genomic features and relationship with Asteraceae.

    Choi, Kyoung Su; Park, SeonJoo

    2015-11-10

    Aster spathulifolius, a member of the Asteraceae family, is distributed along the coast of Japan and Korea. This plant is used for medicinal and ornamental purposes. The complete chloroplast (cp) genome of A. spathulifolius consists of 149,473 bp that include a pair of inverted repeats of 24,751 bp separated by a large single copy region of 81,998 bp and a small single copy region of 17,973 bp. The chloroplast genome contains 78 coding genes, four rRNA genes and 29 tRNA genes. When compared to other cpDNA sequences of Asteraceae, A. spathulifolius showed the closest relationship with Jacobaea vulgaris, and its atpB gene was found to be a pseudogene, unlike that of J. vulgaris. Furthermore, evaluation of the gene compositions of J. vulgaris, Helianthus annuus, Guizotia abyssinica and A. spathulifolius revealed that a 13.6-kb region from ndhF to rps15 was inverted, unlike in Lactuca of Asteraceae. Comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates with J. vulgaris revealed that the synonymous rate was highest for genes related to the small subunit of the ribosome (0.1558), while the nonsynonymous rate was highest for ATP synthase genes (0.0118). These findings revealed that substitution has occurred at similar rates in most genes, and the substitution rates suggested that most genes are under purifying selection. PMID:26164759

  6. Unique features of a Japanese 'Candidatus Liberibacter asiaticus' strain revealed by whole genome sequencing.

    Hiroshi Katoh

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'Ca. L. asiaticus' Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from 'Ca. L. asiaticus'-infected psyllids and leaf midribs. The 1.19-Mb genome has an average 36.32% GC content. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy. In contrast to other 'Ca. L. asiaticus' strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region.
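
    The GC content quoted in this record is the share of G and C bases in the assembled sequence. A minimal sketch of that calculation follows; the function name and the toy sequence are hypothetical and are not taken from the Ishi-1 genome, and ambiguous bases such as N are simply left out of the denominator, which is one common convention rather than the authors' stated procedure.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence used only to exercise the function.
toy = "ATGCGCATTA"
print(f"{gc_content(toy):.2f}% GC")
```

    For a real assembly the same function would be applied to the concatenated contigs read from a FASTA file.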

  7. New features in the genus Ilarvirus revealed by the nucleotide sequence of Fragaria chiloensis latent virus.

    Tzanetakis, Ioannis E; Martin, Robert R

    2005-09-01

    Fragaria chiloensis latent virus (FClLV), a member of the genus Ilarvirus, was first identified in the early 1990s. Double-stranded RNA was extracted from FClLV-infected plants and cloned. The complete nucleotide sequence of the virus has been elucidated. RNA 1 encodes a protein with methyltransferase and helicase enzymatic motifs, while RNA 2 encodes the viral RNA-dependent RNA polymerase and an ORF that shares no homology with other Ilarvirus genes. RNA 3 codes for movement and coat proteins and an additional ORF, making FClLV possibly the first Ilarvirus encoding a third protein in RNA 3. Phylogenetic analysis reveals that FClLV is most closely related to Prune dwarf virus, the type member of subgroup 4 of the Ilarvirus genus. FClLV is also closely related to Alfalfa mosaic virus (AlMV), a virus that shares many properties with Ilarviruses. We propose the reclassification of AlMV as a member of the Ilarvirus genus instead of being a member of a distinct genus. PMID:15878214

  8. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data.

    Yang, Sheng; Guo, Li; Shao, Fang; Zhao, Yang; Chen, Feng

    2015-01-01

    Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes. PMID:26508990

  9. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

    Sheng Yang

    2015-01-01

    Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  10. Sequence-selective targeting of duplex DNA by peptide nucleic acids

    Nielsen, Peter E

    2010-01-01

    Sequence-selective gene targeting constitutes an attractive drug-discovery approach for genetic therapy, with the aim of reducing or enhancing the activity of specific genes at the transcriptional level, or as part of a methodology for targeted gene repair. The pseudopeptide DNA mimic peptide nucleic acid (PNA) can recognize duplex DNA with high sequence specificity and affinity in triplex, duplex and double-duplex invasive modes or non-invasive triplex modes. Novel PNA modification has improved the affinity for DNA recognition via duplex invasion, double-duplex invasion and triplex recognition considerably. Such modifications have also resulted in new approaches to targeted gene repair and sequence-selective double-strand cleavage of genomic DNA.

  11. Amino acid sequences mediating vascular cell adhesion molecule 1 binding to integrin alpha 4: homologous DSP sequence found for JC polyoma VP1 coat protein

    Michael Andrew Meyer

    2013-07-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homologies to the IDSP sequence which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and binding to sialic acid containing molecules such as integrins (3). For the JC polyoma virus, a DSP sequence was found at residues 70, 71 and 72 with homology also noted for the mouse polyoma virus and SV40 virus. Three dimensional modeling of the VP1 molecule suggests that the DSP loop has an accessible site for interaction from the external side of the assembled viral capsid pentamer.

  12. Random amino acid mutations and protein misfolding lead to Shannon limit in sequence-structure communication.

    Andreas Martin Lisewski

    The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
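
    The quantity behind a figure in bits per symbol is the mutual information between channel input and output, which channel capacity bounds from above. As a rough illustration only, the sketch below computes mutual information from a joint probability table and applies it to a binary symmetric channel with uniform input. The two-symbol channel and the function names are assumptions made for the example; the record's own channel links amino acid sequences to structures and is not reproduced here.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]        # marginal distribution of the input
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of the output
    info = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                info += pxy * math.log2(pxy / (px[i] * py[j]))
    return info


def bsc_joint(error_rate):
    """Joint distribution of a binary symmetric channel with uniform input (toy model)."""
    keep = 0.5 * (1 - error_rate)
    flip = 0.5 * error_rate
    return [[keep, flip], [flip, keep]]


# Information per symbol falls as the random error rate rises.
for p in (0.0, 0.1, 0.3, 0.5):
    print(f"error rate {p:.1f}: {mutual_information(bsc_joint(p)):.4f} bits/symbol")
```

    In this toy channel the information carried per symbol drops to zero at an error rate of 0.5, which mirrors, in a much simpler setting, the idea of communication breaking down once random errors pass a threshold.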

  13. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  14. Adsorptive features of poly(acrylic acid-co-hydroxyapatite) composite for UO₂²⁺

    The copolymer poly(acrylic acid-co-hydroxyapatite) (PAA-HAP) was prepared and characterized by means of FT-IR and SEM analysis. The adsorptive features of PAA-HAP for UO₂²⁺ were studied as a function of pH, adsorbent dosage, initial metal ion concentration and temperature. The adsorption isotherm data fitted well to the Langmuir isotherm model. The adsorbed UO₂²⁺ can be desorbed effectively by 0.1 M HNO₃. The maximum adsorption capacity of the dry PAA-HAP for UO₂²⁺ was 1.86 × 10⁻⁴ mol/g. The high adsorption capacity and kinetics results indicate that PAA-HAP can be used as an alternative adsorbent to remove UO₂²⁺ from aqueous solution. (author)
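
    For reference, the Langmuir isotherm named in this record is usually written as below. The symbols are the conventional ones and are assumptions here, since the record does not give its notation: q_e is the amount adsorbed at equilibrium, C_e the equilibrium concentration in solution, K_L the Langmuir constant, and q_max the monolayer capacity, which would correspond to the reported maximum adsorption capacity.

```latex
% Langmuir isotherm and one common linearised form used for fitting
q_e = \frac{q_{\max} K_L C_e}{1 + K_L C_e},
\qquad
\frac{C_e}{q_e} = \frac{C_e}{q_{\max}} + \frac{1}{K_L \, q_{\max}}
```

    In the linearised form, a plot of C_e/q_e against C_e gives a straight line with slope 1/q_max, which is one common way such a capacity is estimated.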

  15. Draft Genome Sequence of Acid-Tolerant Clostridium drakei SL1T, a Potential Chemical Producer through Syngas Fermentation

    Jeong, Yujin; Song, Yoseb; Shin, Hyeon Seok; Cho, Byung-Kwan

    2014-01-01

    Clostridium drakei SL1T is a strictly anaerobic, H2-utilizing, and acid-tolerant acetogen isolated from an acidic sediment that is a potential platform for commodity chemical production from syngas fermentation. The draft genome sequence of this strain will enable determination of the acid resistance and autotrophic pathway of the acetogen.

  16. Complete Genome Sequence of the Probiotic Lactic Acid Bacterium Lactobacillus Rhamnosus

    Samat Kozhakhmetov

    2014-01-01

    Introduction: Lactobacilli are bacteria commonly found in the gastrointestinal tract. Some species of this genus have probiotic properties. The most common of these is Lactobacillus rhamnosus, a microorganism generally regarded as safe (GRAS). It is also a homofermentative L-(+)-lactic acid producer. The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity. However, studies of the genus were conducted mostly with an unequally distributed, non-random choice of species for sequencing; thus, there is only one representative genome from the Lactobacillus rhamnosus clade available to date. The aim of this study was to characterize the genome sequencing of selected strains of Lactobacilli. Methods: 109 samples were isolated from national domestic dairy products in the laboratory of the Center for Life Sciences. After screening isolates for probiotic properties, a highly active Lactobacillus spp. strain was chosen. Genomic DNA was extracted according to the manufacturer's protocol (Wizard® Genomic DNA Purification Kit). The highly active Lactobacillus strain was identified as Lactobacillus rhamnosus according to its morphological, cultural, physiological, and biochemical properties, and a genotypic analysis. Results: The genome of Lactobacillus rhamnosus was sequenced using the Roche 454 GS FLX (454 GS FLX) platforms. The initial draft assembly was prepared from 14 large contigs (20 contigs in all) by the Newbler gsAssembler 2.3 (454 Life Sciences, Branford, CT). Conclusion: A full genome sequencing of selected strains of lactic acid bacteria was made during the study.

  17. Metazoan Remaining Genes for Essential Amino Acid Biosynthesis: Sequence Conservation and Evolutionary Analyses

    Igor R. Costa

    2014-12-01

    Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  18. Nanopore Analysis of Nucleic Acids: Single-Molecule Studies of Molecular Dynamics, Structure, and Base Sequence

    Olasagasti, Felix; Deamer, David W.

    Nucleic acids are linear polynucleotides in which each base is covalently linked to a pentose sugar and a phosphate group carrying a negative charge. If a pore having roughly the cross-sectional diameter of a single-stranded nucleic acid is embedded in a thin membrane and a voltage of 100 mV or more is applied, individual nucleic acids in solution can be captured by the electrical field in the pore and translocated through by single-molecule electrophoresis. The dimensions of the pore cannot accommodate anything larger than a single strand, so each base in the molecule passes through the pore in strict linear sequence. The nucleic acid strand occupies a large fraction of the pore's volume during translocation and therefore produces a transient blockade of the ionic current created by the applied voltage. If it could be demonstrated that each nucleotide in the polymer produced a characteristic modulation of the ionic current during its passage through the nanopore, the sequence of current modulations would reflect the sequence of bases in the polymer. According to this basic concept, nanopores are analogous to a Coulter counter that detects nanoscopic molecules rather than microscopic particles [1,2]. However, the advantage of nanopores is that individual macromolecules can be characterized because different chemical and physical properties affect their passage through the pore. Because macromolecules can be captured in the pore as well as translocated, the nanopore can be used to detect individual functional complexes that form between a nucleic acid and an enzyme. No other technique has this capability.

  19. The genome sequence of Geobacter metallireducens: features of metabolism, physiology and regulation common and dissimilar to Geobacter sulfurreducens

    Aklujkar, Muktak; Krushkal, Julia; DiBartolo, Genevieve; Lapidus, Alla; Land, Miriam L.; Lovley, Derek R.

    2008-12-01

    Background: The genome sequence of Geobacter metallireducens is the second to be completed from the metal-respiring genus Geobacter, and is compared in this report to that of Geobacter sulfurreducens in order to understand their metabolic, physiological and regulatory similarities and differences. Results: The experimentally observed greater metabolic versatility of G. metallireducens versus G. sulfurreducens is borne out by the presence of more numerous genes for metabolism of organic acids including acetate, propionate, and pyruvate. Although G. metallireducens lacks a dicarboxylic acid transporter, it has acquired a second succinate dehydrogenase/fumarate reductase complex, suggesting that respiration of fumarate was important until recently in its evolutionary history. Vestiges of the molybdate (ModE) regulon of G. sulfurreducens can be detected in G. metallireducens, which has lost the global regulatory protein ModE but retained some putative ModE-binding sites and multiplied certain genes of molybdenum cofactor biosynthesis. Several enzymes of amino acid metabolism are of different origin in the two species, but significant patterns of gene organization are conserved. Whereas most Geobacteraceae are predicted to obtain biosynthetic reducing equivalents from electron transfer pathways via a ferredoxin oxidoreductase, G. metallireducens can derive them from the oxidative pentose phosphate pathway. In addition to the evidence of greater metabolic versatility, the G. metallireducens genome is also remarkable for the abundance of multicopy nucleotide sequences found in intergenic regions and even within genes. Conclusion: The genomic evidence suggests that metabolism, physiology and regulation of gene expression in G. metallireducens may be dramatically different from other Geobacteraceae.

  20. Predicting DNA-binding sites of proteins from amino acid sequence

    Wu Feihong

    2006-05-01

    Background: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions. Results: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify
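
    The classifier input described in this record (the target residue plus 4 neighbors on each side) can be pictured as a sliding window of 9 residues. The sketch below shows one way to build such windows. The function name, the toy sequence, and the use of "-" to pad positions beyond the chain termini are assumptions made for illustration, since the record does not say how the termini were handled.

```python
def window_features(sequence, half_width=4, pad="-"):
    """Return one window per residue: the residue plus half_width neighbors on each side."""
    # Pad both ends so that terminal residues also receive a full-length window.
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Hypothetical toy sequence; each window is centered on one target residue.
for position, window in enumerate(window_features("MKRALTPEQ"), start=1):
    print(position, window, "target:", window[4])
```

    Each window would then be passed to the classifier as 9 categorical features, one per position.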

  1. Complete amino acid sequence of globin chains and biological activity of fragmented crocodile hemoglobin (Crocodylus siamensis).

    Srihongthong, Saowaluck; Pakdeesuwan, Anawat; Daduang, Sakda; Araki, Tomohiro; Dhiravisit, Apisak; Thammasirirak, Sompong

    2012-08-01

    Hemoglobin, α-chain, β-chain and fragmented hemoglobin of Crocodylus siamensis demonstrated both antibacterial and antioxidant activities. Antibacterial and antioxidant properties of the hemoglobin did not depend on the heme structure but could result from the compositions of amino acid residues and structures present in their primary structure. Furthermore, thirteen purified active peptides were obtained by RP-HPLC analyses, corresponding to fragments in the α-globin chain and the β-globin chain which are mostly located at the N-terminal and C-terminal parts. These active peptides operate on the bacterial cell membrane. The globin chains of Crocodylus siamensis showed similar amino acids to the sequences of Crocodylus niloticus. The novel amino acid substitutions of α-chain and β-chain are not associated with the heme binding site or the bicarbonate ion binding site, but could be important through their interactions with membranes of bacteria. PMID:22648692

  2. Amino acid selective unlabeling for sequence specific resonance assignments in proteins

    Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha [Indian Institute of Science, NMR Research Centre (India); D' Silva, Patrick, E-mail: patrick@biochem.iisc.ernet.in [Indian Institute of Science, Department of Biochemistry (India); Atreya, Hanudatta S., E-mail: hsatreya@sif.iisc.ernet.in [Indian Institute of Science, NMR Research Centre (India)

    2011-01-15

    Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective 'unlabeling' or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly ¹³C/¹⁵N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named {¹²COᵢ-¹⁵Nᵢ₊₁}-filtered HSQC, which aids in linking the ¹Hᴺ/¹⁵N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i - 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to ²H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of ¹⁴N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies.

  3. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids.

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-04-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279-284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  4. Partial amino acid sequence of fructose-1,6-bisphosphatase from the blue-green algae Synechococcus leopoliensis.

    Marcus, F; Latshaw, S P; Steup, M; Gerbling, K P

    1989-08-01

    Purified fructose-1,6-bisphosphatase from the cyanobacterium Synechococcus leopoliensis was S-carboxymethylated and cleaved with trypsin. The resulting peptides were purified by reversed-phase high performance liquid chromatography and the amino acid sequence of six of the purified peptides was determined by gas-phase microsequencing. The results revealed sequence homology with other fructose-1,6-bisphosphatases. The obtained sequence data provides information required for the design of oligonucleotide hybridization probes to screen existing libraries of cyanobacterial DNA. The determination of the amino acid sequence of cyanobacterial proteins may yield important information with respect to the endosymbiotic theory of evolution. PMID:2550924

  5. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Shi Weisong

    2011-06-01

    Background: Research in genetics has developed rapidly in recent years with the aid of next-generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not adequately provide primary functions, such as bisulfite and pair-end mapping, that are found in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results: To encourage the use of Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  6. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    Yan Koon-Kiu

    2007-11-01

    Background: The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities $p$ generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form $\sim p^{-\gamma}$ with the value of the exponent $\gamma$ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent $\alpha \approx 0.33$. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion: We separately measure the short-term ("raw") duplication and deletion rates $r^{*}_{\mathrm{dup}}$, $r^{*}_{\mathrm{del}}$ ...

  7. tRNAfeature: An algorithm for tRNA features to identify tRNA genes in DNA sequences.

    Yang, Cheng-Hong; Lin, Yu-Da; Chuang, Li-Yeh

    2016-09-01

    The identification of transfer RNAs (tRNAs) is critical for a detailed understanding of the evolution of biological organisms and viruses. However, some tRNAs are difficult to recognize due to their unusual sub-structures and may result in the detection of the wrong anticodon. Therefore, the detection of unusual sub-structures of tRNA genes remains an important challenge. In this study, we propose a method to identify tRNA genes based on tRNA features. tRNAfeature attempts to refold the sequence with single-stranded regions longer than those found in the canonical and conventional structural models for tRNA. We predicted a set of 53926 archaeal, eubacterial and eukaryotic tRNA genes annotated in tRNADB-CE and scanned the tRNA genes in whole genome sequencing. The results indicate that tRNAfeature is more powerful than other existing methods for identifying tRNAs. PMID:27291467

  8. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    Long, C M; Virolle, M J; Chang, S Y; Chang, S.; Bibb, M.J.

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzym...

  9. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    Croteau, Rodney Bruce (Pullman, WA); Wildung, Mark Raymond (Colfax, WA); Lange, Bernd Markus (Pullman, WA); McCaskill, David G. (Pullman, WA)

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  10. Etiology of autistic features: the persisting neurotoxic effects of propionic acid

    El-Ansary Afaf K

    2012-04-01

    Background: Recent clinical observations suggest that certain gut and dietary factors may transiently worsen symptoms in autism. Propionic acid (PA) is a short chain fatty acid and an important intermediate of cellular metabolism. Although PA has several beneficial biological effects, its accumulation is neurotoxic. Methods: Two groups of young Western albino male rats weighing about 45 to 60 grams (approximately 21 days old) were used in the present study. The first group consisted of oral buffered PA-treated rats that were given a neurotoxic dose of 250 mg/kg body weight/day for three days, n = eight; the second group of rats were given only phosphate buffered saline and used as a control. Biochemical parameters representing oxidative stress, energy metabolism, neuroinflammation, neurotransmission, and apoptosis were investigated in brain homogenates of both groups. Results: Biochemical analyses of brain homogenates from PA-treated rats showed an increase in oxidative stress markers (for example, lipid peroxidation), coupled with a decrease in glutathione (GSH) and in glutathione peroxidase (GPX) and catalase activities. Impaired energy metabolism was ascertained through the decrease of lactate dehydrogenase and activation of creatine kinase (CK). Elevated IL-6, TNFα, IFNγ and heat shock protein 70 (HSP70) confirmed the neuroinflammatory effect of PA. Moreover, elevation of caspase3 and DNA fragmentation proved the pro-apoptotic and neurotoxic effect of PA to rat pups. Conclusion: By comparing the results obtained with those from animal models of autism or with clinical data on the biochemical profile of autistic patients, this study showed that the neurotoxicity of PA as an environmental factor could play a central role in the etiology of autistic biochemical features.

  11. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    Xiaoyu Wang

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  12. Heterodimeric l-amino acid oxidase enzymes from Egyptian Cerastes cerastes venom: Purification, biochemical characterization and partial amino acid sequencing

    A.E. El Hakim

    2015-12-01

    Two l-amino acid oxidase enzyme isoforms, Cc-LAAOI and Cc-LAAOII, were purified to apparent homogeneity from Cerastes cerastes venom in a sequential two-step chromatographic protocol comprising gel filtration and anion exchange chromatography. The native molecular weights of the isoforms were 115 kDa as determined by gel filtration on a calibrated Sephacryl S-200 column, while the monomeric molecular weights of the enzymes were 60 and 56 kDa for LAAOI and 60 and 53 kDa for LAAOII. The tryptic peptides of the two isoforms share high sequence homology with other snake venom l-amino acid oxidases. The optimal pH and temperature values of Cc-LAAOI and Cc-LAAOII were 7.8, 50 °C and 7, 60 °C, respectively. The two isoenzymes were thermally stable up to 70 °C. The Km and Vmax values were 0.67 mM, 0.135 μmol/min for LAAOI and 0.82 mM, 0.087 μmol/min for LAAOII. Both isoenzymes displayed high catalytic preference to long-chain, hydrophobic and aromatic amino acids. The Mn2+ ion markedly increased the LAAO activity for both purified isoforms, while Na+, K+, Ca2+, Mg2+ and Ba2+ ions showed a non-significant increase in the enzymatic activity of both isoforms. Furthermore, Zn2+, Ni2+, Co2+, Cu2+ and Al3+ ions markedly inhibited the LAAOI and LAAOII activities. l-Cysteine and reduced glutathione completely inhibited the LAAO activity of both isoenzymes, whereas β-mercaptoethanol, O-phenanthroline and PMSF completely inhibited the enzymatic activity of LAAOII. Furthermore, iodoacetic acid inhibited the enzymatic activity of LAAOII by 46% and had no effect on the LAAOI activity.

  13. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  14. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences

    Wibisono Adianto

    2008-08-01

    Background: Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Findings: Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. Conclusion: As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the
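    The moving-median step mentioned in this abstract can be illustrated with a minimal sketch. This is not the SigWin-detector algorithm, which the abstract does not describe; it is a generic sorted-window approach, and the function name `moving_median` and the odd-window restriction are assumptions made for illustration.

```python
from bisect import bisect_left, insort


def moving_median(values, window):
    """Return the median of every length-`window` slice of `values`.

    Generic sorted-window sketch (hypothetical helper, not the
    SigWin-detector implementation). `window` must be odd so that
    each median is an actual element of the window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(values) < window:
        return []
    current = sorted(values[:window])  # sorted copy of the active window
    mid = window // 2
    medians = [current[mid]]
    for i in range(window, len(values)):
        # Drop the value that leaves the window, then insert the new one,
        # keeping `current` sorted throughout.
        del current[bisect_left(current, values[i - window])]
        insort(current, values[i])
        medians.append(current[mid])
    return medians


if __name__ == "__main__":
    # Toy expression-like values along a chromosome (made-up numbers).
    print(moving_median([1, 5, 2, 8, 3, 9, 4], 3))
```

    Keeping the window sorted avoids re-sorting at every position; each step costs one binary search plus one list insertion and deletion.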

  15. Complete Genome Sequence of Lactococcus lactis IO-1, a Lactic Acid Bacterium That Utilizes Xylose and Produces High Levels of l-Lactic Acid

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-01-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium that produces nisin Z, ferments xylose, and produces predominantly l-lactic acid at high xylose concentrations. From ortholog analysis with five other L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  16. Morphological transformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  17. Amino acid sequence of the cold-active alkaline phosphatase from Atlantic cod (Gadus morhua)

    Asgeirsson, Bjarni; Nielsen, Berit Noesgaard; Højrup, Peter

    2003-01-01

    Atlantic cod is a marine fish that lives at low temperatures of 0-10 degrees C and contains a cold-adapted alkaline phosphatase (AP). Preparations of AP from either the lower part of the intestines or the pyloric caeca area were subjected to proteolytic digestion, mass spectrometry and amino acid...... sequencing by Edman degradation. The primary structure exhibits greatest similarity to human tissue non-specific AP (80%), and approximately 30% similarity to AP from Escherichia coli. The key residues required for catalysis are conserved in the cod AP, except for the third metal binding site, where cod AP...

  18. Pathophysiology, clinical features and radiological findings of differentiation syndrome/all-trans-retinoic acid syndrome

    Cardinale, Luciano; Asteggiano, Francesco; Moretti, Federica; Torre, Federico; Ulisciani, Stefano; Fava, Carmen; Rege-Cambrin, Giovanna

    2014-01-01

    In acute promyelocytic leukemia, differentiation therapy based on all-trans-retinoic acid can be complicated by the development of a differentiation syndrome (DS). DS is a life-threatening complication, characterized by respiratory distress, unexplained fever, weight gain, interstitial lung infiltrates, pleural or pericardial effusions, hypotension and acute renal failure. The diagnosis of DS is made on clinical grounds and has proven to be difficult, because none of the symptoms is pathognomonic for the syndrome and there are no definitive diagnostic criteria. DS can have subtle signs and symptoms at presentation but progress rapidly, and the end-stage clinical picture resembles the acute respiratory distress syndrome, with extremely poor prognosis; it is therefore of absolute importance to be aware of these complications and to initiate therapy as soon as DS is suspected. The radiologic appearance resembles the typical features of cardiogenic pulmonary edema. Diagnosing DS demands great skill from radiologists and haematologists, and their cooperation is of the utmost importance in order to suspect DS, detect its early signs, examine the patients' behaviour and rapidly detect complications.

  19. Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues

    Schwartz, Russell; Istrail, Sorin; King, Jonathan

    2001-01-01

    Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20–22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of cons...
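    The kind of string-frequency statistic this abstract refers to can be sketched as a count of runs of consecutive hydrophobic residues. The residue set `HYDROPHOBIC`, the function name, and the toy sequence below are illustrative assumptions; the abstract does not state which hydrophobicity classification the authors used.

```python
from collections import Counter
from itertools import groupby

# Assumed hydrophobic alphabet (one-letter codes); other classifications exist.
HYDROPHOBIC = frozenset("AVLIMFWC")


def hydrophobic_run_lengths(sequence, hydrophobic=HYDROPHOBIC):
    """Count maximal runs of consecutive hydrophobic residues by length.

    Returns a Counter mapping run length -> number of runs of that length.
    """
    counts = Counter()
    # groupby splits the sequence into maximal blocks of equal class.
    for is_hydrophobic, block in groupby(
        sequence.upper(), key=lambda aa: aa in hydrophobic
    ):
        if is_hydrophobic:
            counts[sum(1 for _ in block)] += 1
    return counts


if __name__ == "__main__":
    # Made-up peptide, not taken from any record in this listing.
    print(hydrophobic_run_lengths("MKVLLAGDEFW"))
```

    Comparing such counts with those expected from shuffled sequences of the same composition is one way to test whether long hydrophobic blocks are under-represented.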

  20. 3-d structure-based amino acid sequence alignment of esterases, lipases and related proteins

    Gentry, M.K.; Doctor, B.P.; Cygler, M.; Schrag, J.D.; Sussman, J.L.

    1993-05-13

    Acetylcholinesterase and butyrylcholinesterase, enzymes with potential as pretreatment drugs for organophosphate toxicity, are members of a larger family of homologous proteins that includes carboxylesterases, cholesterol esterases, lipases, and several nonhydrolytic proteins. A computer-generated alignment of 18 of the proteins, the acetylcholinesterases, butyrylcholinesterases, carboxylesterases, some esterases, and the nonenzymatic proteins has been previously presented. More recently, the three-dimensional structures of two enzymes in this group, acetylcholinesterase from Torpedo californica and lipase from Geotrichum candidum, have been determined. Based on the x-ray structures and the superposition of these two enzymes, it was possible to obtain an improved amino acid sequence alignment of 32 members of this family of proteins. Examination of this alignment reveals that 24 amino acids are invariant in all of the hydrolytic proteins, and an additional 49 are well conserved. Conserved amino acids include those of the active site, the disulfide bridges, the salt bridges, in the core of the proteins, and at the edges of secondary structural elements. Comparison of the three-dimensional structures makes it possible to find a well-defined structural basis for the conservation of many of these amino acids.

  1. Draft Genome Sequences of Gluconobacter cerinus CECT 9110 and Gluconobacter japonicus CECT 8443, Acetic Acid Bacteria Isolated from Grape Must

    Sainz, Florencia

    2016-01-01

    We report here the draft genome sequences of Gluconobacter cerinus strain CECT 9110 and Gluconobacter japonicus CECT 8443, acetic acid bacteria isolated from grape must. Gluconobacter species are well known for their ability to oxidize sugar alcohols into the corresponding acids. Our objective was to select strains that effectively oxidize d-glucose. PMID:27365351

  2. Draft Genome Sequence of Bacillus subtilis GXA-28, a Thermophilic Strain with High Productivity of Poly-γ-Glutamic Acid

    Zeng, Wei; Chen, Guiguang; Tang, Zhen; Wu, Hao; Shu, Lin; Liang, Zhiqun

    2014-01-01

    Bacillus subtilis GXA-28 is a thermophilic strain that can produce high-molecular-weight poly-γ-glutamic acid in high yield at high temperature. Here, we report the draft genome sequence of this strain, which may provide the genomic basis for the high productivity of poly-γ-glutamic acid.

  3. FiveS rRNA sequences and fatty acid profiles of colourless sulfur-oxidising bacteria

    LokaBharathi, P.A.; Ortiz-conde, B.A; Nair, S.; Chandramohan, D.; Colwell, R.R.

    these at the molecular level, 5S ribosomal ribonucleic acid (5S rRNA) sequences have been determined. Fatty acid profiles showed strain 29 to be related to Pseudomonas vesicularis with an E.D. of 5.965 and similarity index of 0.182. The nearest organism of strain 82...

  4. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    2010-07-01

    ... with paragraph (b) of this section, must begin on a new page and must be titled “Sequence Listing.” The... shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature..., the “Unknown” or “Artificial Sequence” organisms shall be further described in the to feature...

  5. Geochemical features and effects on deep-seated fluids during the May-June 2012 southern Po Valley seismic sequence

    Francesco Italiano

    2012-10-01

    A periodic sampling of the groundwaters and dissolved and free gases in selected deep wells located in the area affected by the May-June 2012 southern Po Valley seismic sequence has provided insight into seismogenic-induced changes of the local aquifer systems. The results obtained show progressive changes in the fluid geochemistry, allowing it to be established that deep-seated fluids were mobilized during the seismic sequence and reached surface layers along faults and fractures, which generated significant geochemical anomalies. The May-June 2012 seismic swarm (mainshock on May 29, 2012, M 5.8; 7 shocks M > 5, about 200 events 3 < M < 5) induced several modifications in the circulating fluids. This study reports the preliminary results obtained for the geochemical features of the waters and gases collected over the epicentral area from boreholes drilled at different depths, thus intercepting water and gases with different origins and circulation. The aim of the investigations was to improve our knowledge of the fluids circulating over the seismic area (e.g. origin, provenance, interactions, mixing of different components, temporal changes). This was achieved by collecting samples from both shallow and deep-drilled boreholes, and then, after the selection of the relevant sites, we looked for temporal changes with mid-to-long-term monitoring activity following a constant sampling rate. This allowed us to gain better insight into the relationships between the fluid circulation and the faulting activity. The sampling sites are listed in Table 1, along with the analytical results of the gas phase. […

  7. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657
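    The residue-specific cleavage strategy described in this abstract (cyanogen bromide at methionyl residues, trypsin restricted to arginyl or lysyl residues, staphylococcal protease at glutamyl residues) can be sketched in silico. The helper `cleave_after` and the toy peptide are hypothetical; the sketch simply cuts C-terminal to every listed residue and ignores real-world exceptions such as missed cleavages or reduced trypsin activity before proline.

```python
def cleave_after(sequence, residues):
    """Split `sequence` C-terminal to every residue found in `residues`.

    Idealized digestion sketch (hypothetical helper): every matching
    position is cut, and no enzyme-specific exceptions are modeled.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


if __name__ == "__main__":
    toy = "GLSDMEWQRLVNK"  # made-up peptide, not the dolphin myoglobin sequence
    print(cleave_after(toy, "M"))  # cyanogen bromide: after Met
    print(cleave_after(toy, "R"))  # trypsin with lysines blocked: after Arg
    print(cleave_after(toy, "E"))  # staphylococcal protease: after Glu
```

    Overlapping fragments from digests with different specificities are what allow the order of the peptides to be reconstructed.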

  8. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    Yu, Jinghua (Veterans Administration Medical Center, Bronx, NY (United States)); Eng, J.; Yalow, R.S. (Veterans Administration Medical Center, Bronx, NY (United States) City Univ. of New York, NY (United States))

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  9. A Possible Mechanism of Zika Virus Associated Microcephaly: Imperative Role of Retinoic Acid Response Element (RARE) Consensus Sequence Repeats in the Viral Genome

    Kumar, Ashutosh; Singh, Himanshu N.; Pareek, Vikas; Raza, Khursheed; Dantham, Subrahamanyam; Kumar, Pavan; Mochan, Sankat; Faiq, Muneeb A.

    2016-01-01

    Owing to the reports of microcephaly as a consistent outcome in the fetuses of pregnant women infected with ZIKV in Brazil, Zika virus (ZIKV)—microcephaly etiomechanistic relationship has recently been implicated. Researchers, however, are still struggling to establish an embryological basis for this interesting causal handcuff. The present study reveals robust evidence in favor of a plausible ZIKV-microcephaly cause-effect liaison. The rationale is based on: (1) sequence homology between ZIKV genome and the response element of an early neural tube developmental marker “retinoic acid” in human DNA and (2) comprehensive similarities between the details of brain defects in ZIKV-microcephaly and retinoic acid embryopathy. Retinoic acid is considered as the earliest factor for regulating anteroposterior axis of neural tube and positioning of structures in developing brain through retinoic acid response elements (RARE) consensus sequence (5′–AGGTCA–3′) in promoter regions of retinoic acid-dependent genes. We screened genomic sequences of already reported virulent ZIKV strains (including those linked to microcephaly) and other viruses available in National Institute of Health genetic sequence database (GenBank) for the RARE consensus repeats and obtained results strongly bolstering our hypothesis that ZIKV strains associated with microcephaly may act through precipitation of dysregulation in retinoic acid-dependent genes by introducing extra stretches of RARE consensus sequence repeats in the genome of developing brain cells. Additional support to our hypothesis comes from our findings that screening of other viruses for RARE consensus sequence repeats is positive only for those known to display neurotropism and cause fetal brain defects (for which maternal-fetal transmission during developing stage may be required). The numbers of RARE sequence repeats appeared to match with the virulence of screened positive viruses. Although, bioinformatic evidence and

  10. Identification of a repeated sequence in the genome of the sea urchin which is transcribed by RNA polymerase III and contains the features of a retroposon.

    Nisson, P E; Hickey, R. J.; Boshar, M F; Crain, W R

    1988-01-01

    A repeated sequence element which is located about 200 nucleotides upstream from the protein-coding portion of the muscle actin gene (probably within a large 5' intron) in the genome of the sea urchin, Strongylocentrotus purpuratus has been characterized, and shown to contain the sequence features which indicate that it has been transposed by means of an RNA intermediate. This retroposon-like sequence, SURF1-1, is a member of a family which is dispersed and repeated about 800 times in the gen...

  11. Sequence stratigraphic features of the Middle Permian Maokou Formation in the Sichuan Basin and their controls on source rocks and reservoirs

    Wang Su

    2015-11-01

    Well Shuangyushi 1 and Well Nanchong 1, deployed in the NW and central Sichuan Basin, have obtained a high-yield industrial gas flow in the dolomite and karst reservoirs of the Middle Permian Maokou Formation, showing good exploration prospects for the Maokou Formation. In order to identify the sequence stratigraphic features of the Maokou Formation, its sequence stratigraphy was divided and a unified sequence stratigraphic framework applicable to the entire basin was established to analyze the stratigraphic denudation features within the sequence framework by using the spectral curve trend attribute analysis, together with drilling and outcrop data. On this basis, the controls of sequence on source rocks and reservoirs were analyzed. In particular, the Maokou Formation was divided into two third-order sequences – SQ1 and SQ2. SQ1 was composed of members Mao 1 and Mao 3, while SQ2 was composed of the Mao 4 Member. Sequence stratigraphic correlation indicated that the Maokou Formation within the basin had experienced erosion to varying extents, forming “three intense and two weak” denuded regions, among which the upper part of SQ2 was slightly denuded in the two weak denuded regions (SW Sichuan Basin and locally Eastern Sichuan Basin), while SQ2 was denuded out in the three intense denuded regions (Southern Sichuan Basin–Central Sichuan Basin, NE and NW Sichuan Basin). The development of source rocks and reservoirs within the sequence stratigraphic framework was significantly affected by the sequence boundary; the grain banks that can form effective reservoirs were predominantly distributed in the SQ1 highstand systems tract (HST), while effective source rocks were predominantly distributed in the SQ1 transgressive systems tract (TST). It is concluded that the sequence division method is objective and reasonable, and can effectively guide oil and gas exploration in this region.

  12. Comparison of the chromosomal localization of murine and human glucocerebrosidase genes and of the deduced amino acid sequences

    To study structure-function relationships and molecular evolution, the authors determined the nucleotide sequence and chromosomal location of the gene encoding murine glucocerebrosidase. In the protein coding region of the murine cDNA, the nucleotide sequence and the corresponding deduced amino acid sequence were 82% and 86% identical to the respective human sequences. All five amino acids presently known to be essential for normal enzymatic activity were conserved between mouse and man. The murine enzyme had a single deletion relative to the human enzyme at amino acid number 273. One ATG translation initiation signal was present in the mouse sequence in contrast to the human sequence, where two start codons have been reported. Nucleotide sequencing of a clone derived from murine genomic DNA revealed that the murine signal for translation initiation was located in exon 2. The locations of all 10 introns were conserved between mouse and man. They mapped the genetic locus for glucocerebrosidase to mouse chromosome 3, at a position 7.6 ± 3.2 centimorgans from the locus for the β subunit of nerve growth factor. Comparison of linkage relationships in the human and murine genomes indicates that these closely linked mouse genes are also syntenic on human chromosome 1 but in positions that span the centromere.

  13. Repetitive sequence based polymerase chain reaction to differentiate close bacteria strains in acidic sites

    XIE Ming; YIN Hua-qun; LIU Yi; LIU Jie; LIU Xue-duan

    2008-01-01

    To study the diversity of bacteria strains newly isolated from several acid mine drainage (AMD) sites in China, repetitive sequence based polymerase chain reaction (rep-PCR), a well established technology for diversity analysis of closely related bacteria strains, was conducted on 30 strains of Leptospirillum ferriphilum, 8 strains of Acidithiobacillus ferrooxidans, as well as the Acidithiobacillus ferrooxidans type strain ATCC (American Type Culture Collection) 23270. The results showed that, using ERIC and BOX primer sets, rep-PCR produced highly discriminatory banding patterns. Phylogenetic analysis based on ERIC-PCR banding types was made and the results indicated that rep-PCR could be used as a rapid and highly discriminatory screening technique in studying bacterial diversity, especially in differentiating bacteria within one species in AMD.

  14. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  15. Trypsin inhibitors from ridged gourd (Luffa acutangula Linn.) seeds: purification, properties, and amino acid sequences.

    Haldar, U C; Saha, S K; Beavis, R C; Sinha, N K

    1996-02-01

    Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is at pH 4.55 for LA-1 and at pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 Å. The fluorescence emission spectrum of each inhibitor is similar to that of free tyrosine. The bimolecular rate constant of acrylamide quenching is 1.0 × 10⁹ M⁻¹ s⁻¹ for LA-1 and 0.8 × 10⁹ M⁻¹ s⁻¹ for LA-2, and that of K2HPO4 quenching is 1.6 × 10¹¹ M⁻¹ s⁻¹ for LA-1 and 1.2 × 10¹¹ M⁻¹ s⁻¹ for LA-2. Analysis of the circular dichroic spectra yields 40% alpha-helix and 60% beta-turn for LA-1 and 45% alpha-helix and 55% beta-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a molar ratio of unity. A chemical modification study suggests the involvement of arginine of LA-1 and lysine of LA-2 in their reactive sites. The inhibitors are very similar in their amino acid sequences, and show sequence homology with other squash family inhibitors. PMID:8924202

  16. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    Myers, G.; Korber, B. [eds.] [Los Alamos National Lab., NM (United States)]; Wain-Hobson, S. [ed.] [Laboratory of Molecular Retrovirology, Pasteur Inst.]; Smith, R.F. [ed.] [Baylor Coll. of Medicine, Houston, TX (United States). Dept. of Pharmacology]; Pavlakis, G.N. [ed.] [National Cancer Inst., Frederick, MD (United States). Cancer Research Facility]

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  17. NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-01-01

    acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non...

  18. Geometric Feature-Based Facial Expression Recognition in Image Sequences Using Multi-Class AdaBoost and Support Vector Machines

    Joonwhoan Lee

    2013-06-01

    Facial expressions are widely used in the behavioral interpretation of emotions, cognitive science, and social interactions. In this paper, we present a novel method for fully automatic facial expression recognition in facial image sequences. As the facial expression evolves over time, facial landmarks are automatically tracked in consecutive video frames, using displacements based on elastic bunch graph matching displacement estimation. Feature vectors from individual landmarks, as well as from pairs of landmarks, are extracted from the tracking results and normalized with respect to the first frame in the sequence. The prototypical expression sequence for each class of facial expression is formed by taking the median of the landmark tracking results from the training facial expression sequences. Multi-class AdaBoost, with the dynamic time warping similarity distance between the feature vector of the input facial expression and the prototypical facial expression used as a weak classifier, selects the subset of discriminative feature vectors. Finally, two methods for facial expression recognition are presented, either by using multi-class AdaBoost with dynamic time warping, or by using a support vector machine on the boosted feature vectors. The results on the Cohn-Kanade (CK+) facial expression database show a recognition accuracy of 95.17% and 97.35% using multi-class AdaBoost and support vector machines, respectively.
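    The dynamic time warping distance named in the abstract above can be sketched as follows. This is the textbook dynamic-programming formulation for one-dimensional sequences with an absolute-difference local cost; it is an illustrative assumption, not the authors' implementation, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

    Because the alignment may stretch or compress either sequence, two expression trajectories that differ only in speed receive a small distance, which is why such a measure suits sequences of varying length.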

  19. Self-organizing maps: A tool to ascertain taxonomic relatedness based on features derived from 16S rDNA sequence

    D V Raje; H J Purohit; Y P Badhe; S S Tambe; B D Kulkarni

    2010-12-01

    Exploitation of microbial wealth, of which almost 95% or more is still unexplored, is a growing need. The taxonomic placements of a new isolate based on phenotypic characteristics are now being supported by information preserved in the 16S rRNA gene. However, the analysis of 16S rDNA sequences retrieved from metagenomes by the available bioinformatics tools is subject to limitations. In this study, the occurrences of nucleotide features in 16S rDNA sequences have been used to ascertain the taxonomic placement of organisms. The tetra- and penta-nucleotide features were extracted from the training data set of 16S rDNA sequences and were subjected to an artificial neural network (ANN) based tool known as the self-organizing map (SOM), which helped in visualization of unsupervised classification. For selection of significant features, principal component analysis (PCA) or curvilinear component analysis (CCA) was applied. The SOM along with these techniques could discriminate the sample sequences with more than 90% accuracy, highlighting the relevance of the features. To ascertain the confidence level in the developed classification approach, the test data set was specifically evaluated for Thiobacillus, with Acidiphilium, Paracoccus and Starkeya, which are taxonomically reassigned. The evaluation proved the excellent generalization capability of the developed tool. The topology of genera in the SOM supported the conventional chemo-biochemical classification reported in Bergey's Manual.
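    The tetra- and penta-nucleotide features mentioned above are, in essence, k-mer occurrence counts. A minimal sketch of such feature extraction is given below; the function name `kmer_features`, the normalization to frequencies, and the skipping of windows with ambiguous bases are assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=4):
    """Return the relative frequency of every possible DNA k-mer in seq."""
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k k-mers so every sequence yields the same feature order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}
```

    With k=4 this gives a 256-dimensional vector per sequence and with k=5 a 1024-dimensional one; vectors of this kind could then be reduced (for example by PCA) before being passed to a clustering tool.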

  20. Identification and Characterization of Second-Generation Invader Locked Nucleic Acids (LNAs) for Mixed-Sequence Recognition of Double-Stranded DNA

    Sau, Sujay P; Madsen, Andreas S; Podbevsek, Peter;

    2013-01-01

    The development of synthetic agents that recognize double-stranded DNA (dsDNA) is a long-standing goal that is inspired by the promise for tools that detect, regulate, and modify genes. Progress has been made with triplex-forming oligonucleotides, peptide nucleic acids, and polyamides, but...... substantial efforts are currently devoted to the development of alternative strategies that overcome the limitations observed with the classic approaches. In 2005, we introduced Invader locked nucleic acids (LNAs), i.e., double-stranded probes that are activated for mixed-sequence recognition of dsDNA through...... monomers. We compare the thermal denaturation characteristics of double-stranded probes featuring different interstrand zippers of pyrene-functionalized monomers based on 2'-amino-α-l-LNA, 2'-N-methyl-2'-amino-DNA, and RNA scaffolds. Insights from fluorescence spectroscopy, molecular modeling, and NMR...

  1. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic polylactic acid (PLA), and is currently produced by pure culture fermentation of food carbohydrates. This work presents an alternative: producing LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with an undefined mixed culture from a municipal wastewater treatment plant. A statistical design of experiments approach was employed using a set of 0.8 L SBRs with gelatinized PPW at a solids content range from 30 to 50 g L⁻¹ and a solids retention time (SRT) of 2-4 days for yield and productivity optimization. A maximum LA production yield of 0.25 g g⁻¹ PPW and a highest productivity of 125 mg g⁻¹ d⁻¹ were achieved. A scale-up SBR trial using neat gelatinized PPW (at 80 g L⁻¹ solids content) at the 3 L scale was employed, and the highest LA yield of 0.14 g g⁻¹ PPW and a productivity of 138 mg g⁻¹ d⁻¹ were achieved with a 1 d SRT. PMID:25708409

  2. Cloning and sequence analysis of putative type II fatty acid synthase genes from Arachis hypogaea L.

    Meng-Jun Li; Ai-Qin Li; Han Xia; Chuan-Zhi Zhao; Chang-Sheng Li; Shu-Bo Wan; Yu-Ping Bi; Xing-Jun Wang

    2009-06-01

    The cultivated peanut is a valuable source of dietary oil and ranks fifth among the world oil crops. Plant fatty acid biosynthesis is catalysed by type II fatty acid synthase (FAS) in plastids and mitochondria. By constructing a full-length cDNA library derived from immature peanut seeds and homology-based cloning, candidate genes of acyl carrier protein (ACP), malonyl-CoA:ACP transacylase, β-ketoacyl-ACP synthase (I, II, III), β-ketoacyl-ACP reductase, β-hydroxyacyl-ACP dehydrase and enoyl-ACP reductase were isolated. Sequence alignments revealed that primary structures of type II FAS enzymes were highly conserved in higher plants and the catalytic residues were strictly conserved in Escherichia coli and higher plants. Homologue numbers of each type II FAS gene expressed in developing peanut seeds varied from 1 in KASII, KASIII and HD to 5 in ENR. The number of single-nucleotide polymorphisms (SNPs) was quite different in each gene. Peanut type II FAS genes were predicted to target plastids, except ACP2 and ACP3. The results suggested that peanut may contain two type II FAS systems, in plastids and mitochondria. The type II FAS enzymes in higher plants may have functions similar to those in E. coli.

  3. Site-directed gene mutation at mixed sequence targets by psoralen-conjugated pseudo-complementary peptide nucleic acids

    Kim, Ki-Hyun; Nielsen, Peter E.; Glazer, Peter M.

    2007-01-01

    Sequence-specific DNA-binding molecules such as triple helix-forming oligonucleotides (TFOs) provide a means for inducing site-specific mutagenesis and recombination at chromosomal sites in mammalian cells. However, the utility of TFOs is limited by the requirement for homopurine stretches in the target duplex DNA. Here, we report the use of pseudo-complementary peptide nucleic acids (pcPNAs) for intracellular gene targeting at mixed sequence sites. Due to steric hindrance, pcPNAs are unable ...

  4. Nucleotide and amino acid sequence coding for polypeptides of foot-and-mouth disease virus type A12.

    Robertson, B H; Grubman, M J; Weddell, G N; Moore, D.M.; Welsh, J D; Fischer, T.; Dowbenko, D J; Yansura, D G; Small, B.; Kleid, D G

    1985-01-01

    The coding region for the structural and nonstructural polypeptides of the type A12 foot-and-mouth disease virus genome has been identified by nucleotide sequencing of cloned DNA derived from the viral RNA. In addition, 704 nucleotides in the 5' untranslated region between the polycytidylic acid tract and the probable initiation codon of the first translated gene, P16-L, have been sequenced. This region has several potential initiation codons, one of which appears to be a low-frequency altern...

  5. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  6. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides. PMID:26864792

  7. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-02-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  8. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.

  9. Features of separation on polymeric reversed phase for two classes of higher saturated fatty acids esters

    Deineka, V. I.; Lapshova, M. S.; Zakharenko, E. V.; Deineka, L. A.

    2013-11-01

    The principles of sorption on polymeric reversed phase (PRP) YMS C30 for members of the two classes of esters formed by higher saturated fatty acids, i.e., lutein diesters (I) and triacylglycerols (II), are investigated. It is shown that the logarithm of the retention factor increases nonlinearly with an increase in the length of the acid radical, although the retention on PRP is higher in the case of I and lower in the case of II, compared to their retention on traditional monomeric reversed phase (MRP) Kromasil-100 5C18; however, the equivalence of the contributions to the retention of I that correspond to an identical change in acids does not depend on the length of the hydrocarbon radical of the second acid. It is noted that the van't Hoff plot for PRP contains a curve break, indicating a change in the retention mechanism upon a rise in temperature.

  10. PSNO: Predicting Cysteine S-Nitrosylation Sites by Incorporating Various Sequence-Derived Features into the General Form of Chou’s PseAAC

    Jian Zhang

    2014-06-01

    S-nitrosylation (SNO) is one of the most universal reversible post-translational modifications involved in many biological processes. Malfunction or dysregulation of SNO leads to a series of severe diseases, such as developmental abnormalities and various diseases. Therefore, the identification of SNO sites (SNOs) provides insights into disease progression and drug development. In this paper, a new bioinformatics tool, named PSNO, is proposed to identify SNOs from protein sequences. Firstly, we explore various promising sequence-derived discriminative features, including the evolutionary profile, the predicted secondary structure and the physicochemical properties. Secondly, rather than simply combining the features, which may bring about information redundancy and unwanted noise, we use the relative entropy selection and incremental feature selection approach to select the optimal feature subsets. Thirdly, we train our model using the k-nearest neighbor algorithm. Using both informative features and an elaborate feature selection scheme, our method, PSNO, achieves good prediction performance with a mean Matthews correlation coefficient (MCC) value of about 0.5119 on the training dataset using 10-fold cross-validation. These results indicate that PSNO can be used as a competitive predictor among the state-of-the-art SNOs prediction tools. A web-server, named PSNO, which implements the proposed method, is freely available at http://59.73.198.144:8088/PSNO/.
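    For readers unfamiliar with the Matthews correlation coefficient reported above, the standard definition from a binary confusion matrix can be sketched as follows. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions for illustration; the counts used in any example are hypothetical and are not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    return 0.0 if denominator == 0 else numerator / denominator
```

    The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it a common choice when positive and negative sites are unbalanced.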

  11. Hydroxycinnamic acid bound arabinoxylans from millet brans-structural features and antioxidant activity.

    Bijalwan, Vandana; Ali, Usman; Kesarwani, Atul Kumar; Yadav, Kamalendra; Mazumder, Koushik

    2016-07-01

    Hydroxycinnamic acid bound arabinoxylans (HCA-AXs) were extracted from brans of five Indian millet varieties and response surface methodology was used to optimize the extraction conditions. The optimal condition to obtain the highest yield of millet HCA-AXs was determined as follows: time 61 min, temperature 66°C, ratio of solvent to sample 12 ml/g. Linkage analysis indicated that hydroxycinnamic acid bound arabinoxylan from kodo millet (KM-HCA-AX) contained comparatively low branched arabinoxylan consisting of 14.6% mono-substituted, 1.2% di-substituted and 41.2% un-substituted Xylp residues. The HPLC analysis of millet HCA-AXs showed significant variation in the content of three major bound hydroxycinnamic acids (caffeic, p-coumaric and ferulic acid). The antioxidant activity of millet HCA-AXs was evaluated using three in vitro assay methods (DPPH, FRAP and β-carotene linoleate emulsion assays), which suggested that both phenolic acid composition and structural characteristics of arabinoxylans could be correlated to their antioxidant potential. The detailed structural analysis revealed that low substituted KM-HCA-AX exhibited relatively higher antioxidant activity compared to other medium and highly substituted HCA-AXs from finger (FM), proso (PM), barnyard (BM) and foxtail (FOXM) millet. PMID:27050114

  12. The developmental transcriptome landscape of bovine skeletal muscle defined by Ribo-Zero ribonucleic acid sequencing.

    Sun, X; Li, M; Sun, Y; Cai, H; Li, R; Wei, X; Lan, X; Huang, Y; Lei, C; Chen, H

    2015-12-01

    Ribonucleic acid sequencing (RNA-Seq) libraries are normally prepared with oligo(dT) selection of poly(A)+ mRNA, but it depends on intact total RNA samples. Recent studies have described Ribo-Zero technology, a novel method that can capture both poly(A)+ and poly(A)- transcripts from intact or fragmented RNA samples. We report here the first application of Ribo-Zero RNA-Seq for the analysis of the bovine embryonic, neonatal, and adult skeletal muscle whole transcriptome at an unprecedented depth. Overall, 19,893 genes were found to be expressed, with a high correlation of expression levels between the calf and the adult. Hundreds of genes were found to be highly expressed in the embryo and decreased at least 10-fold after birth, indicating their potential roles in embryonic muscle development. In addition, we present for the first time the analysis of global transcript isoform discovery in bovine skeletal muscle and identified 36,694 transcript isoforms. Transcriptomic data were also analyzed to unravel sequence variations; 185,036 putative SNP and 12,428 putative short insertions-deletions (InDel) were detected. Specifically, many stop-gain, stop-loss, and frameshift mutations were identified that probably change the relative protein production and sequentially affect the gene function. Notably, the numbers of stage-specific transcripts, alternative splicing events, SNP, and InDel were greater in the embryo than in the calf and the adult, suggesting that gene expression is most active in the embryo. The resulting view of the transcriptome at a single-base resolution greatly enhances the comprehensive transcript catalog and uncovers the global trends in gene expression during bovine skeletal muscle development. PMID:26641174

  13. Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach

    Khader Shameer

    2010-06-01

    3-dimensional domain swapping is a mechanism where two or more protein molecules form higher order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neuro-degenerative diseases, due to its role in the functional regulation, formation of higher oligomers, protein misfolding, aggregation etc. While the 3-dimensional domain swap mechanism can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed an SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to be involved in swapped conformation or related to proteins involved in swapping phenomenon. The testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy from training and 73.81% accuracy from testing. Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease. Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.
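
    As a quick sanity check on the figures in this abstract, the reported percentages can be converted back into counts of correctly classified proteins. The sketch below assumes that accuracy means correct predictions divided by the full stated set size (300 for training, 126 for testing); the abstract does not say how the training accuracy was measured, so the counts are an inference, and the helper name is hypothetical.

```python
def correct_count(accuracy_percent: float, total: int) -> int:
    """Nearest whole number of correct predictions implied by an accuracy."""
    return round(accuracy_percent / 100 * total)


# Set sizes stated in the abstract: 150 + 150 for training, 63 + 63 for testing.
train_total = 150 + 150
test_total = 63 + 63

train_correct = correct_count(76.33, train_total)
test_correct = correct_count(73.81, test_total)

if __name__ == "__main__":
    # Recompute the percentages from the inferred counts.
    print(train_correct, round(100 * train_correct / train_total, 2))
    print(test_correct, round(100 * test_correct / test_total, 2))
```

    Under that assumption the two percentages are consistent with whole-number counts, which suggests they were computed over the complete training and testing sets.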

  14. Genome Sequence of a Candidate World Health Organization Reference Strain of Zika Virus for Nucleic Acid Testing.

    Trösemeier, Jan-Hendrik; Musso, Didier; Blümel, Johannes; Thézé, Julien; Pybus, Oliver G; Baylis, Sally A

    2016-01-01

    We report here the sequence of a candidate reference strain of Zika virus (ZIKV) developed on behalf of the World Health Organization (WHO). The ZIKV reference strain is intended for use in nucleic acid amplification (NAT)-based assays for the detection and quantification of ZIKV RNA. PMID:27587826

  15. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk

    Meneghel, Julie; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes. PMID:26941141

  16. N-terminal amino acid sequence of Bacillus licheniformis alpha-amylase: comparison with Bacillus amyloliquefaciens and Bacillus subtilis Enzymes.

    Kuhn, H; Fietzek, P P; Lampen, J O

    1982-01-01

    The thermostable, liquefying alpha-amylase from Bacillus licheniformis was immunologically cross-reactive with the thermolabile, liquefying alpha-amylase from Bacillus amyloliquefaciens. Their N-terminal amino acid sequences showed extensive homology with each other, but not with the saccharifying alpha-amylases of Bacillus subtilis.

  17. Complete genome sequence of Lactobacillus plantarum ZS2058, a probiotic strain with high conjugated linoleic acid production ability.

    Yang, Bo; Chen, Haiqin; Tian, Fengwei; Zhao, Jianxin; Gu, Zhennan; Zhang, Hao; Chen, Yong Q; Chen, Wei

    2015-11-20

    Lactobacillus plantarum ZS2058 was isolated from sauerkraut and identified to synthesize the beneficial metabolite conjugated linoleic acid. The genome contains a 3,197,363-bp chromosome and three plasmids. The sequence will facilitate identification and characterization of the genetic determinants for its putative biological benefits. PMID:26439428

  18. Genome Sequence of Corynebacterium glutamicum ATCC 14067, Which Provides Insight into Amino Acid Biosynthesis in Coryneform Bacteria

    Lv, Yangyong; Liao, Juanjun; Wu, Zhanhong; Han, Shuangyan; Lin, Ying; Zheng, Suiping

    2012-01-01

    We report the genome sequence of Corynebacterium glutamicum ATCC 14067 (once named Brevibacterium flavum), which is useful for taxonomy research and further molecular breeding in amino acid production. Preliminary comparison with those of the reported coryneform strains revealed some notable differences that might be related to the difficulties in molecular manipulation.

  19. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk.

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine; Fonseca, Fernanda

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes. PMID:26941141

  20. Structural features of lignite humic acid in light of NMR and thermal degradation experiments

    Peuravuori, J.; Simpson, A.J.; Lam, B.; Zbankova, P.; Pihlaja, K. [University of Turku, Turku (Finland). Dept. of Chemistry]

    2007-01-29

    Structural composition of a lignite humic acid (HA) fraction was studied by means of solid-state ¹³C NMR, different solution-state ¹H NMR pulse techniques and thermally assisted hydrolysis-methylation with TMAH and TMAAc followed up with pyrolysis-GC-MS experiments. The results verified that certain aliphatic compounds have their special tasks in the complicated structural network of lignite HA material, and aprotic solvents with strong electron-donor powers are needed to release the tightly bound certain aliphatics from the macromolecular network for obtaining a fully dissolved HA solution. The occurrence of the relatively large content of different carboxylic acids as their free-acid forms was surprising. The structural interpretations performed by special ¹H NMR pulse techniques verified the complexity of aliphatic moieties, the presence of hydroaromatic carbons, residual lignin derivatives, the abundance of aliphatic and aromatic carboxylic acids, and the ability of aliphatics to form inter-molecular bridges between aromatic building blocks.

  1. An Matching Method for Vehicle-borne Panoramic Image Sequence Based on Adaptive Structure from Motion Feature

    ZHANG Zhengpeng

    2015-10-01

    Panoramic image matching constrained by a local structure-from-motion similarity feature is an important approach; it requires multivariate kernel density estimation of the structure-from-motion feature using nonparametric mean shift. Proper selection of the kernel bandwidth is critical for the convergence speed and accuracy of the matching method. This work proposes a variable-bandwidth panoramic image matching method based on an adaptive structure-from-motion feature. First, the bandwidth matrix is defined using the locally adaptive spatial structure of the sampling point in the spatial domain and the optical flow domain. The relaxation diffusion process of the structure-from-motion similarity feature is described by distance weighting of the local optical flow feature vector. Then the expression of the adaptive multivariate kernel density function is given, and the solution of the mean shift vector, the termination conditions, and the seed point selection method are discussed. Finally, multi-scale SIFT features and structure features are fused to establish a unified panoramic image matching framework. Spherical panoramic images from a vehicle-borne mobile measurement system are used for a detailed comparison between fixed and adaptive bandwidth. The results show that adaptive bandwidth performs well when the inlier ratio and the object-space scale change. The proposed method realizes an adaptive similarity measure of the structure-from-motion feature, increases the number of correct matching points and the matching rate, and the experimental results show it to be robust.
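
    The general idea of mean shift with a per-sample (adaptive) bandwidth can be sketched in a few lines. This is not the method of the paper: the abstract describes a bandwidth matrix built from spatial and optical-flow structure, whereas the sketch below assumes a single scalar bandwidth per sample, taken as the distance to its k-th nearest neighbour, and a Gaussian kernel. All function names and parameters are hypothetical.

```python
import numpy as np


def adaptive_bandwidths(points: np.ndarray, k: int = 5) -> np.ndarray:
    """Scalar bandwidth per sample: distance to its k-th nearest neighbour."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # After sorting, column 0 is each point's zero distance to itself.
    return np.sort(dists, axis=1)[:, k]


def mean_shift_mode(points, start, bandwidths, tol=1e-6, max_iter=500):
    """Follow the variable-bandwidth mean shift vector from `start` to a mode."""
    x = np.asarray(start, dtype=float)
    dim = points.shape[1]
    for _ in range(max_iter):
        sq_dist = np.sum((points - x) ** 2, axis=1)
        # Sample-point estimator: Gaussian profile scaled by h_i^-(d+2).
        weights = np.exp(-0.5 * sq_dist / bandwidths**2) / bandwidths ** (dim + 2)
        new_x = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_x - x) < tol:  # termination condition
            return new_x
        x = new_x
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic clusters standing in for feature vectors.
    a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
    b = rng.normal(loc=[10.0, 10.0], scale=0.3, size=(100, 2))
    pts = np.vstack([a, b])
    h = adaptive_bandwidths(pts, k=10)
    print(mean_shift_mode(pts, start=[0.2, -0.1], bandwidths=h))
```

    With a fixed bandwidth every sample would share the same `h`; letting `h` shrink in dense regions and grow in sparse ones is what makes the density estimate adapt to local scale, which is the property the abstract credits for robustness to scale changes.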

  2. Features of photopolymerization of Langmuir-Blodgett thin films of acetylenic acids

    UV-induced polymerization of thin (1-4 monolayers) Langmuir-Blodgett films of acetylenic carboxylic acids with triple bonds in different positions (terminal: 23-tetracosinic acid HC≡C(CH2)21COOH, and internal: 2-docosinic CH3(CH2)18C≡CCOOH) and their lead salts is investigated. It is shown by means of IR spectroscopy that the topochemical reaction proceeds with the participation of carboxylic groups. The differences in the structure of mono- and bilayers are demonstrated. The mechanism of the topochemical reaction depends on the method of film transfer onto the substrate. It is shown by means of UV spectroscopy that short conjugated polyenes (containing 7 to 9 carbon atoms) are formed as the product of polymerization. The mechanism of the formation of these polyene chains is proposed on the basis of the experimental data and semi-empirical calculations.

  3. Features of photopolymerization of Langmuir-Blodgett thin films of acetylenic acids

    Dultsev, F.N., E-mail: fdultsev@thermo.isp.nsc.r [Institute of Semiconductor Physics SB RAS, Novosibirsk, 630090, Lavrentiev Ave., 13 (Russian Federation)]; Badmaeva, I.A. [Institute of Semiconductor Physics SB RAS, Novosibirsk, 630090, Lavrentiev Ave., 13 (Russian Federation)]

    2009-11-02

    UV-induced polymerization of thin (1-4 monolayers) Langmuir-Blodgett films of acetylenic carboxylic acids with triple bonds in different positions (terminal: 23-tetracosinic acid HC≡C(CH2)21COOH, and internal: 2-docosinic CH3(CH2)18C≡CCOOH) and their lead salts is investigated. It is shown by means of IR spectroscopy that the topochemical reaction proceeds with the participation of carboxylic groups. The differences in the structure of mono- and bilayers are demonstrated. The mechanism of the topochemical reaction depends on the method of film transfer onto the substrate. It is shown by means of UV spectroscopy that short conjugated polyenes (containing 7 to 9 carbon atoms) are formed as the product of polymerization. The mechanism of the formation of these polyene chains is proposed on the basis of the experimental data and semi-empirical calculations.

  4. Pathophysiology, clinical features and radiological findings of differentiation syndrome/all-trans-retinoic acid syndrome

    Cardinale, Luciano; Asteggiano, Francesco; Moretti, Federica; Torre, Federico; Ulisciani, Stefano; Fava, Carmen; Rege-Cambrin, Giovanna

    2014-01-01

    In acute promyelocytic leukemia, differentiation therapy based on all-trans-retinoic acid can be complicated by the development of a differentiation syndrome (DS). DS is a life-threatening complication, characterized by respiratory distress, unexplained fever, weight gain, interstitial lung infiltrates, pleural or pericardial effusions, hypotension and acute renal failure. The diagnosis of DS is made on clinical grounds and has proven to be difficult, because none of the symptoms is pathognom...

  5. Geometric Feature-Based Facial Expression Recognition in Image Sequences Using Multi-Class AdaBoost and Support Vector Machines

    Joonwhoan Lee; Deepak Ghimire

    2013-01-01

    Facial expressions are widely used in the behavioral interpretation of emotions, cognitive science, and social interactions. In this paper, we present a novel method for fully automatic facial expression recognition in facial image sequences. As the facial expression evolves over time facial landmarks are automatically tracked in consecutive video frames, using displacements based on elastic bunch graph matching displacement estimation. Feature vectors from individual landmarks, as well as pa...

  6. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Budzanivska I. G.

    2014-03-01

    Aim. Identification of the widespread Ukrainian isolate(s) of PVY (Potato virus Y) in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene) has a higher percentage of homology with the recombinant isolates (strains) of this pathogen (approx. 98.8–99.8 % homology for both nucleotide and translated amino acid sequences of the CP gene). The Ukrainian isolate of PVY is positioned in a separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have a common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate, which has been attributed to the strain group O, subclade N:O.

  7. Evolution of vertebrate IgM: complete amino acid sequence of the constant region of Ambystoma mexicanum mu chain deduced from cDNA sequence.

    Fellah, J S; Wiles, M V; Charlemagne, J; Schwager, J

    1992-10-01

    cDNA clones coding for the constant region of the Mexican axolotl (Ambystoma mexicanum) mu heavy immunoglobulin chain were selected from total spleen RNA, using a cDNA polymerase chain reaction technique. The specific 5'-end primer was an oligonucleotide homologous to the JH segment of Xenopus laevis mu chain. One of the clones, JHA/3, corresponded to the complete constant region of the axolotl mu chain, consisting of a 1362-nucleotide sequence coding for a polypeptide of 454 amino acids followed in the 3' direction by a 179-nucleotide untranslated region and a polyA+ tail. The axolotl C mu is divided into four typical domains (C mu 1-C mu 4) and can be aligned with the Xenopus C mu with an overall identity of 56% at the nucleotide level. Percent identities were particularly high between C mu 1 (59%) and C mu 4 (71%). The C-terminal 20-amino acid segment which constitutes the secretory part of the mu chain is strongly homologous to the equivalent sequences of chondrichthyans and of other tetrapods, including a conserved N-linked oligosaccharide, the penultimate cysteine and the C-terminal lysine. The four C mu domains of 13 vertebrate species ranging from chondrichthyans to mammals were aligned and compared at the amino acid level. The significant number of mu-specific residues which are conserved in each of the four C mu domains argues for a continuous line of evolution of the vertebrate mu chain. This notion was confirmed by the ability to reconstitute a consistent vertebrate evolution tree based on the phylogenic parsimony analysis of the C mu 4 sequences. PMID:1382992
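
    The sequence lengths quoted in this abstract can be checked with simple codon arithmetic, since each amino acid is encoded by three nucleotides. The sketch below is only that check; the helper name is hypothetical, and it assumes the 1362-nucleotide figure counts sense codons only.

```python
def residues_encoded(coding_nucleotides: int) -> int:
    """Number of amino acids encoded by a coding region of whole codons."""
    if coding_nucleotides % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    return coding_nucleotides // 3


coding_length = 1362  # nucleotides, from the abstract
utr_length = 179      # 3' untranslated region, from the abstract

if __name__ == "__main__":
    print(residues_encoded(coding_length))
    # Length of the clone excluding the polyA+ tail.
    print(coding_length + utr_length)
```

    The coding length divides evenly into the 454 residues stated for the constant region, so the two numbers in the abstract agree with each other.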

  8. Complete amino acid sequence of human plasma Zn-α2-glycoprotein and its homology to histocompatibility antigens

    In the present study the complete amino acid sequence of human plasma Zn-α2-glycoprotein was determined. This protein, whose biological function is unknown, consists of a single polypeptide chain of 276 amino acid residues including 8 tryptophan residues and has a pyroglutamyl residue at the amino terminus. The location of the two disulfide bonds in the polypeptide chain was also established. The three glycans, whose structure was elucidated with the aid of 500 MHz ¹H NMR spectroscopy, were sialylated N-biantennas. The molecular weight calculated from the polypeptide and carbohydrate structure is 38,478, which is close to the reported value of ≅ 41,000 based on physicochemical measurements. The predicted secondary structure appeared to comprise 23% α-helix, 27% β-sheet, and 22% β-turns. The three N-glycans were found to be located in β-turn regions. An unexpected finding was made by computer analysis of the sequence data; this revealed that Zn-α2-glycoprotein is closely related to antigens of the major histocompatibility complex in amino acid sequence and in domain structure. There was an unusually high degree of sequence homology with the α chains of class I histocompatibility antigens. Moreover, this plasma protein was shown to be a member of the immunoglobulin gene superfamily. Zn-α2-glycoprotein appears to be a truncated secretory major histocompatibility complex-related molecule, and it may have a role in the expression of the immune response.

  9. Investigation of Antifouling Properties of Surfaces Featuring Zwitterionic α-Aminophosphonic Acid Moieties.

    Wagner, Natalie; Zimmermann, Phyllis; Heisig, Peter; Klitsche, Franziska; Maison, Wolfgang; Theato, Patrick

    2015-12-01

    Zwitterionic thin films containing α-amino phosphonic acid moieties were successfully introduced on silicon surfaces and their antifouling properties were investigated. Initially, the substrates were modified with a hybrid polymer, composed of poly(methylsilsesquioxane) (PMSSQ) and poly(4-vinyl benzaldehyde) (PStCHO). Next, a Kabachnik-Fields post-polymerization modification (sur-KF-PMR) of the functionalized aldehyde surfaces was conducted with different amines and dialkyl phosphonates. After subsequent deprotection reaction of dialkyl phosphonates, the obtained zwitterionic surfaces were characterized by various techniques and we found excellent antifouling properties of the resulting films. PMID:26332285

  10. Human tyrosyl-tRNA synthetase shares amino acid sequence homology with a putative cytokine.

    Kleeman, T A; Wei, D; Simpson, K L; First, E A

    1997-05-30

    To test the hypothesis that tRNATyr recognition differs between bacterial and human tyrosyl-tRNA synthetases, we sequenced several clones identified as human tyrosyl-tRNA synthetase cDNAs by the Human Genome Project. We found that human tyrosyl-tRNA synthetase is composed of three domains: 1) an amino-terminal Rossmann fold domain that is responsible for formation of the activated E.Tyr-AMP intermediate and is conserved among bacteria, archaea, and eukaryotes; 2) a tRNA anticodon recognition domain that has not been conserved between bacteria and eukaryotes; and 3) a carboxyl-terminal domain that is unique to the human tyrosyl-tRNA synthetase and whose primary structure is 49% identical to the putative human cytokine endothelial monocyte-activating protein II, 50% identical to the carboxyl-terminal domain of methionyl-tRNA synthetase from Caenorhabditis elegans, and 43% identical to the carboxyl-terminal domain of Arc1p from Saccharomyces cerevisiae. The first two domains of the human tyrosyl-tRNA synthetase are 52, 36, and 16% identical to tyrosyl-tRNA synthetases from S. cerevisiae, Methanococcus jannaschii, and Bacillus stearothermophilus, respectively. Nine of fifteen amino acids known to be involved in the formation of the tyrosyl-adenylate complex in B. stearothermophilus are conserved across all of the organisms, whereas amino acids involved in the recognition of tRNATyr are not conserved. Kinetic analyses of recombinant human and B. stearothermophilus tyrosyl-tRNA synthetases expressed in Escherichia coli indicate that human tyrosyl-tRNA synthetase aminoacylates human but not B. stearothermophilus tRNATyr, and vice versa, supporting the original hypothesis. It is proposed that like endothelial monocyte-activating protein II and the carboxyl-terminal domain of Arc1p, the carboxyl-terminal domain of human tyrosyl-tRNA synthetase evolved from gene duplication of the carboxyl-terminal domain of methionyl-tRNA synthetase and may direct tRNA to the active site of