WorldWideScience

Sample records for bioinformatics analysis identifies

  1. Bioinformatics analysis identifies several intrinsically disordered human E3 ubiquitin-protein ligases

    DEFF Research Database (Denmark)

    Boomsma, Wouter Krogh; Nielsen, Sofie Vincents; Lindorff-Larsen, Kresten;

    2016-01-01

    conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology...

  2. Analysis of regulatory protease sequences identified through bioinformatic data mining of the Schistosoma mansoni genome

    Directory of Open Access Journals (Sweden)

    Minchella Dennis J

    2009-10-01

    Full Text Available Abstract Background: New chemotherapeutic agents against Schistosoma mansoni, an etiological agent of human schistosomiasis, are a priority due to emerging drug resistance and the inability of current drug treatments to prevent reinfection. Proteases have been under scrutiny as targets of immunological or chemotherapeutic anti-Schistosoma agents because of their vital role in many stages of the parasitic life cycle. Function has been established for only a handful of identified S. mansoni proteases, and the vast majority of these are digestive proteases; very few of the conserved classes of regulatory proteases have been identified from Schistosoma species, despite their vital role in numerous cellular processes. To that end, we identified protease protein-coding genes from the S. mansoni genome project and EST library. Results: We identified 255 protease sequences from five catalytic classes using predicted proteins of the S. mansoni genome. The vast majority of these show significant similarity to proteins in KEGG and the Conserved Domain Database. Proteases include calpains, caspases, cytosolic and mitochondrial signal peptidases, proteases that interact with ubiquitin and ubiquitin-like molecules, and proteases that perform regulated intramembrane proteolysis. Comparative analysis of classes of important regulatory proteases finds conserved active site domains and, where appropriate, signal peptides and transmembrane helices. Phylogenetic analysis provides support for inferring functional divergence among regulatory aspartic, cysteine, and serine proteases. Conclusion: Numerous proteases are identified for the first time in S. mansoni. We characterized important regulatory proteases and focused analysis on these proteases to complement the growing knowledge base of digestive proteases. This work provides a foundation for expanding knowledge of proteases in Schistosoma species and examining their diverse function and potential as targets

  3. Transcriptome bioinformatic analysis identifies potential therapeutic mechanism of pentylenetetrazole in Down syndrome

    Directory of Open Access Journals (Sweden)

    Sharma Abhay

    2010-10-01

    Full Text Available Abstract Background: Pentylenetetrazole (PTZ) has recently been found to ameliorate cognitive impairment in rodent models of Down syndrome (DS). The mechanism underlying PTZ's therapeutic effect in DS is, however, not clear. Microarray profiling has previously reported differential expression, both up- and down-regulation, of genes in DS. Given this, transcriptomic data related to PTZ treatment, if available, could be used to understand the drug's therapeutic mechanism in DS. No such mammalian data exist, however. Nevertheless, a Drosophila model inspired by PTZ-induced kindling plasticity in rodents has recently been described. Microarray profiling has shown PTZ's downregulatory effect on gene expression in the fly heads. Methods: In a comparative transcriptomics approach, I have analyzed the available microarray data in order to identify a potential therapeutic mechanism of PTZ in DS. In the analysis, summary data of up- and down-regulated genes reported in human DS studies and of down-regulated genes reported in the Drosophila model have been used. Results: I find that the transcriptomic correlate of chronic PTZ in Drosophila counteracts that of DS. Genes downregulated by PTZ significantly over-represent genes upregulated in DS and under-represent genes downregulated in DS. Further, the genes which are common in the downregulated and upregulated DS set show enrichment for the MAP kinase pathway. Conclusion: My analysis suggests that downregulation of the MAP kinase pathway may mediate the therapeutic effect of PTZ in DS. Existing evidence implicating the MAP kinase pathway in DS supports this observation.
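
The over-representation claim in the abstract above is the kind of result usually checked with a gene-set overlap test. The record does not say which test the author used, so the following is only a minimal sketch that assumes a one-sided hypergeometric test; the function name and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    universe: number of genes measured in total
    set_a:    size of the first gene set (e.g. genes upregulated in DS)
    set_b:    size of the second gene set (e.g. genes downregulated by PTZ)
    overlap:  observed number of genes shared by both sets
    """
    max_k = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Toy counts chosen only to exercise the function (hypothetical, not study data).
p_value = hypergeom_enrichment_p(universe=20, set_a=5, set_b=5, overlap=5)
print(f"P(overlap >= 5) = {p_value:.6g}")
```

A small p-value from such a test would indicate more shared genes than expected by chance; under-representation can be assessed with the complementary lower tail.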

  4. Hypothetical granulin-like molecule from Fasciola hepatica identified by bioinformatics analysis.

    Science.gov (United States)

    Machicado, Claudia; Marcos, Luis A; Zimic, Mirko

    2016-01-01

    Fasciola hepatica is considered an emergent human pathogen, causing liver fibrosis or cirrhosis, conditions that are known to be direct causes of cancer. Some parasites, such as Opisthorchis viverrini, a relative of F. hepatica, have been categorized by the WHO as carcinogenic agents. Although these two parasites are from the same class (Trematoda), the role of F. hepatica in carcinogenesis is unclear. We hypothesized that F. hepatica might share some features with O. viverrini and be responsible for inducing proliferation of host cells. We analyzed the recently released genome of F. hepatica looking for a gene coding for a granulin-like growth factor, a protein secreted by O. viverrini (Ov-GRN-1), which is a potent stimulator of proliferation of host cells. Using computational biology tools, we identified a granulin-like molecule in F. hepatica, here termed FhGLM, which has a high level of sequence identity to Ov-GRN-1 and human progranulin. We found evidence of an upstream promoter compatible with the expression of FhGLM. The FhGLM architecture was found to have five granulin domains; one of them, domain 3, was homologous to Ov-GRN-1 and human GRNC. The structure of FhGLM granulin domain 3 was found to have the overall fold of its homologue, human GRNC. Our findings show the presence of a homologue of a potent modulator of cell growth in F. hepatica that might have, like other granulins, a proliferative action on host cells during fascioliasis. Future experimental assays to demonstrate the presence of FhGLM in F. hepatica are needed to confirm our hypothesis.

  5. Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis.

    Directory of Open Access Journals (Sweden)

    Bernd Timmermann

    Full Text Available BACKGROUND: Colorectal cancer (CRC) is, with approximately 1 million cases, the third most common cancer worldwide. Extensive research is ongoing to decipher the underlying genetic patterns with the hope of improving early cancer diagnosis and treatment. In this direction, the recent progress in next generation sequencing technologies has revolutionized the field of cancer genomics. However, one caveat of these studies remains the large number of genetic variations identified and their interpretation. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the first work on whole exome NGS of primary colon cancers. We performed 454 whole exome pyrosequencing of tumor as well as adjacent unaffected normal colonic tissue from microsatellite stable (MSS) and microsatellite instable (MSI) colon cancer patients and identified more than 50,000 small nucleotide variations for each tissue. According to predictions based on MSS and MSI pathomechanisms, we identified eight times more somatic non-synonymous variations in MSI cancers than in MSS, and we were able to reproduce the result in four additional CRCs. Our bioinformatics filtering approach narrowed down the rate of most significant mutations to 359 for MSI and 45 for MSS CRCs with predicted altered protein functions. In both CRCs, MSI and MSS, we found somatic mutations in the intracellular kinase domain of bone morphogenetic protein receptor 1A, BMPR1A, a gene where so far germline mutations are associated with juvenile polyposis syndrome, and show that the mutations functionally impair the protein function. CONCLUSIONS/SIGNIFICANCE: We conclude that with deep sequencing of tumor exomes one may be able to predict the microsatellite status of CRC and in addition identify potentially clinically relevant mutations.

  6. Bioinformatics methods for identifying candidate disease genes

    Directory of Open Access Journals (Sweden)

    van Driel Marc A

    2006-06-01

    Full Text Available Abstract With the explosion in genomic and functional genomics information, methods for disease gene identification are rapidly evolving. Databases are now essential to the process of selecting candidate disease genes. Combining positional information with disease characteristics and functional information is the usual strategy by which candidate disease genes are selected. Enrichment for candidate disease genes, however, depends on the skills of the operating researcher. Over the past few years, a number of bioinformatics methods that enrich for the most likely candidate disease genes have been developed. Such in silico prioritisation methods may further improve by completion of datasets, by development of standardised ontologies across databases and species and, ultimately, by the integration of different strategies.

  7. Bioinformatics approaches for identifying new therapeutic bioactive peptides in food

    Directory of Open Access Journals (Sweden)

    Nora Khaldi

    2012-10-01

    Full Text Available ABSTRACT: The traditional methods for mining foods for bioactive peptides are tedious and long. Similar to the drug industry, the time needed to identify and deliver a commercial health ingredient that reduces disease symptoms can be anything between 5 and 10 years. Reducing this time and effort is crucial in order to create new commercially viable products with clear and important health benefits. In the past few years, bioinformatics, the science that brings together fast computational biology and efficient genome mining, has emerged as the long-awaited solution to this problem. By quickly mining food genomes for characteristics of certain food therapeutic ingredients, researchers can potentially find new ones in a matter of a few weeks. Yet, surprisingly, very little success has been achieved so far using bioinformatics in mining for food bioactives. The absence of food-specific bioinformatic mining tools, the slow integration of both experimental mining and bioinformatics, and the important difference between different experimental platforms are some of the reasons for the slow progress of bioinformatics in the field of functional food and more specifically in bioactive peptide discovery. In this paper I discuss some methods that could be easily translated, using a rational peptide bioinformatics design, to food bioactive peptide mining. I highlight the need for an integrated food peptide database. I also discuss how to better integrate experimental work with bioinformatics in order to improve the mining of food for bioactive peptides, thereby achieving a higher success rate.

  8. Bioinformatic analysis of neurotropic HIV envelope sequences identifies polymorphisms in the gp120 bridging sheet that increase macrophage-tropism through enhanced interactions with CCR5.

    Science.gov (United States)

    Mefford, Megan E; Kunstman, Kevin; Wolinsky, Steven M; Gabuzda, Dana

    2015-07-01

    Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 and T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120-CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues.

  9. Bioinformatic analysis of neurotropic HIV envelope sequences identifies polymorphisms in the gp120 bridging sheet that increase macrophage-tropism through enhanced interactions with CCR5

    Energy Technology Data Exchange (ETDEWEB)

    Mefford, Megan E., E-mail: megan_mefford@hms.harvard.edu [Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Boston, MA (United States); Kunstman, Kevin, E-mail: kunstman@northwestern.edu [Northwestern University Medical School, Chicago, IL (United States); Wolinsky, Steven M., E-mail: s-wolinsky@northwestern.edu [Northwestern University Medical School, Chicago, IL (United States); Gabuzda, Dana, E-mail: dana_gabuzda@dfci.harvard.edu [Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Boston, MA (United States); Department of Neurology (Microbiology and Immunobiology), Harvard Medical School, Boston, MA (United States)

    2015-07-15

    Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 and T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120–CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues. - Highlights: • We analyze HIV Env sequences and identify amino acids in beta 3 of the gp120 bridging sheet that enhance macrophage tropism. • These amino acids at positions 197 and 200 are present in brain of some patients with HIV-associated dementia. • D197 results in loss of a glycan near the HIV Env trimer apex, which may increase exposure of V3. • These variants may promote infection of macrophages in the brain by enhancing gp120–CCR5 interactions.

  10. Coronavirus Genomics and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Kwok-Yung Yuen

    2010-08-01

    Full Text Available The drastic increase in the number of coronaviruses discovered and coronavirus genomes being sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D, exist. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci revealed the time of the most recent common ancestor of human/civet SARS-related coronavirus to be 1999-2002, with an estimated substitution rate of 4 × 10^-4 to 2 × 10^-2 substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias in coronaviruses was observed, with HCoV-HKU1 showing the most extreme bias, and cytosine deamination and selection of CpG-suppressed clones are the two major independent biological forces that shape such codon usage bias in coronaviruses.
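
The G + C content quoted in the abstract above is a simple per-genome statistic. As a minimal sketch of how such a percentage can be computed (the function name and the short toy sequence are hypothetical and are not a real coronavirus genome):

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T, U) are counted, so gaps and
    ambiguity codes such as N do not distort the percentage.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Toy RNA fragment used only for illustration (hypothetical).
print(gc_content("AUGGCUAAUU"))
```

Applied to a complete genome read from a FASTA file, the same function would give the genome-wide value that the abstract reports as ranging from 32% to 43%.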

  11. Biopipe: a flexible framework for protocol-based bioinformatics analysis.

    Science.gov (United States)

    Hoon, Shawn; Ratnapu, Kiran Kumar; Chia, Jer-Ming; Kumarasamy, Balamurugan; Juguang, Xiao; Clamp, Michele; Stabenau, Arne; Potter, Simon; Clarke, Laura; Stupka, Elia

    2003-08-01

    We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.

  12. Bioinformatics

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren

    , and medicine will be particularly affected by the new results and the increased understanding of life at the molecular level. Bioinformatics is the development and application of computer methods for analysis, interpretation, and prediction, as well as for the design of experiments. It has emerged...... as a strategic frontier between biology and computer science. Machine learning approaches (e.g. neural networks, hidden Markov models, and belief networks) are ideally suited for areas in which there is a lot of data but little theory. The goal in machine learning is to extract useful information from a body...... of data by building good probabilistic models. The particular twist behind machine learning, however, is to automate the process as much as possible. In this book, the authors present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological...

  13. Mass spectrometry and bioinformatics analysis data

    Directory of Open Access Journals (Sweden)

    Mainak Dutta

    2015-03-01

    Full Text Available 2DE and 2D-DIGE based proteomics analysis of serum from women with endometriosis revealed several proteins to be dysregulated. A complete list of these proteins along with their mass spectrometry data and subsequent bioinformatics analysis are presented here. The data is related to “Investigation of serum proteome alterations in human endometriosis” by Dutta et al. [1].

  14. A Bioinformatics Filtering Strategy for Identifying Radiation Response Biomarker Candidates

    Science.gov (United States)

    Oh, Jung Hun; Wong, Harry P.; Wang, Xiaowei; Deasy, Joseph O.

    2012-01-01

    The number of biomarker candidates is often much larger than the number of clinical patient data points available, which motivates the use of a rational candidate variable filtering methodology. The goal of this paper is to apply such a bioinformatics filtering process to isolate a modest number (<10) of key interacting genes and their associated single nucleotide polymorphisms involved in radiation response, and to ultimately serve as a basis for using clinical datasets to identify new biomarkers. In step 1, we surveyed the literature on genetic and protein correlates to radiation response, in vivo or in vitro, across cellular, animal, and human studies. In step 2, we analyzed two publicly available microarray datasets and identified genes in which mRNA expression changed in response to radiation. Combining results from Step 1 and Step 2, we identified 20 genes that were common to all three sources. As a final step, a curated database of protein interactions was used to generate the most statistically reliable protein interaction network among any subset of the 20 genes resulting from Steps 1 and 2, resulting in identification of a small, tightly interacting network with 7 out of 20 input genes. We further ranked the genes in terms of likely importance, based on their location within the network using a graph-based scoring function. The resulting core interacting network provides an attractive set of genes likely to be important to radiation response. PMID:22768051
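
The filtering steps described in the abstract above (intersect candidate lists from several sources, then rank the survivors by their position in an interaction network) can be sketched as follows. The abstract does not specify the graph-based scoring function, so this sketch assumes plain node degree as a stand-in; the gene labels and interaction pairs are hypothetical placeholders, not the study's data.

```python
from collections import Counter


def common_candidates(*sources):
    """Keep only the genes that appear in every source list."""
    return set.intersection(*(set(source) for source in sources))


def rank_by_degree(candidates, interactions):
    """Rank candidate genes by how many other candidates they interact with.

    Degree is an assumed stand-in for the unspecified scoring function.
    """
    candidates = set(candidates)
    degree = Counter()
    seen = set()
    for a, b in interactions:
        edge = frozenset((a, b))
        # Skip self-loops, repeated pairs, and edges that leave the candidate set.
        if len(edge) == 2 and edge <= candidates and edge not in seen:
            seen.add(edge)
            degree[a] += 1
            degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Placeholder inputs (hypothetical gene labels).
literature = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
microarray_1 = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
microarray_2 = {"GENE_A", "GENE_B", "GENE_C", "GENE_F"}
interactions = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_E")]

shared = common_candidates(literature, microarray_1, microarray_2)
print(rank_by_degree(shared, interactions))
```

Genes with no interactions inside the candidate set drop out of the ranking, which mirrors the abstract's reduction from 20 input genes to a smaller, tightly interacting core.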

  15. A bioinformatics filtering strategy for identifying radiation response biomarker candidates.

    Directory of Open Access Journals (Sweden)

    Jung Hun Oh

    Full Text Available The number of biomarker candidates is often much larger than the number of clinical patient data points available, which motivates the use of a rational candidate variable filtering methodology. The goal of this paper is to apply such a bioinformatics filtering process to isolate a modest number (<10) of key interacting genes and their associated single nucleotide polymorphisms involved in radiation response, and to ultimately serve as a basis for using clinical datasets to identify new biomarkers. In step 1, we surveyed the literature on genetic and protein correlates to radiation response, in vivo or in vitro, across cellular, animal, and human studies. In step 2, we analyzed two publicly available microarray datasets and identified genes in which mRNA expression changed in response to radiation. Combining results from Step 1 and Step 2, we identified 20 genes that were common to all three sources. As a final step, a curated database of protein interactions was used to generate the most statistically reliable protein interaction network among any subset of the 20 genes resulting from Steps 1 and 2, resulting in identification of a small, tightly interacting network with 7 out of 20 input genes. We further ranked the genes in terms of likely importance, based on their location within the network using a graph-based scoring function. The resulting core interacting network provides an attractive set of genes likely to be important to radiation response.

  16. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  17. Bioinformatic analysis of proteomics data.

    Science.gov (United States)

    Schmidt, Andreas; Forne, Ignasi; Imhof, Axel

    2014-01-01

    Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore the identification, quantitation and characterization of all proteins in a cell are of utmost importance to understand the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements of analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of testable new hypotheses based on the enormous pre-existing amount of biological information. Here we discuss current strategies on how to gather, filter and analyze proteomic data sets using available software packages.

  18. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus

    Science.gov (United States)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to determine the gene encoding gelatinase. L. sphaericus was isolated from soil and is a bacterium with gelatinase activity that is species-specific to porcine and bovine gelatin. This bacterium offers the possibility of producing enzymes that are specific to each of the two species of meat. The main focus of this research is to identify the gelatinase-encoding gene within L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: first, gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) in size with a sequence of 520 amino acids; second, gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp in size with a sequence of 591 amino acids; and third, gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp in size with a sequence of 566 amino acids. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short sequences of cDNA by PCR. The amplicons reliably resulted in sizes of 1563 bp for candidate gene P1 and 1701 bp for candidate gene P3. Therefore, the bioinformatics analysis of L. sphaericus identified genes encoding gelatinase.

  19. High-throughput protein analysis integrating bioinformatics and experimental assays.

    Science.gov (United States)

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.

  20. Bioinformatics analysis of estrogen-responsive genes

    Science.gov (United States)

    Handel, Adam E.

    2016-01-01

    Estrogen is a steroid hormone that plays critical roles in a myriad of intracellular pathways. The expression of many genes is regulated through the steroid hormone receptors ESR1 and ESR2. These bind to DNA and modulate the expression of target genes. Identification of estrogen target genes is greatly facilitated by the use of transcriptomic methods, such as RNA-seq and expression microarrays, and chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq). Combining transcriptomic and ChIP-seq data enables a distinction to be drawn between direct and indirect estrogen target genes. This chapter will discuss some methods of identifying estrogen target genes that do not require any expertise in programming languages or complex bioinformatics. PMID:26585125
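
The chapter summarized above stresses methods that need no programming, but the logic of separating direct from indirect targets reduces to a set comparison and can be illustrated briefly. This is a minimal sketch under the assumption that "direct" means a gene is both estrogen-responsive in the transcriptomic data and has a nearby receptor binding site in the ChIP-seq data; the function name and gene labels are hypothetical.

```python
def classify_targets(responsive_genes, receptor_bound_genes):
    """Split estrogen-responsive genes into direct and indirect targets.

    responsive_genes:     genes whose expression changes with estrogen
                          (e.g. from RNA-seq or microarrays)
    receptor_bound_genes: genes with a nearby receptor binding site
                          (e.g. from ChIP-seq peaks)
    """
    responsive = set(responsive_genes)
    bound = set(receptor_bound_genes)
    return {
        "direct": responsive & bound,    # responsive and receptor-bound
        "indirect": responsive - bound,  # responsive without binding evidence
    }


# Placeholder gene labels (hypothetical).
targets = classify_targets(
    responsive_genes=["GENE_1", "GENE_2", "GENE_3"],
    receptor_bound_genes=["GENE_2", "GENE_3", "GENE_4"],
)
print(sorted(targets["direct"]), sorted(targets["indirect"]))
```

Genes that are bound but not responsive (GENE_4 in the toy input) fall into neither group, which is why both data types are needed to draw the distinction.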

  1. Bioinformatic analysis of patient-derived ASPS gene expressions and ASPL-TFE3 fusion transcript levels identify potential therapeutic targets.

    Directory of Open Access Journals (Sweden)

    David G Covell

    Full Text Available Gene expression data, collected from ASPS tumors of seven different patients and from one immortalized ASPS cell line (ASPS-1), was analyzed jointly with patient ASPL-TFE3 (t(X;17)(p11;q25)) fusion transcript data to identify disease-specific pathways and their component genes. Data analysis of the pooled patient and ASPS-1 gene expression data, using conventional clustering methods, revealed a relatively small set of pathways and genes characterizing the biology of ASPS. These results could be largely recapitulated using only the gene expression data collected from patient tumor samples. The concordance between expression measures derived from ASPS-1 and both pooled and individual patient tumor data provided a rationale for extending the analysis to include patient ASPL-TFE3 fusion transcript data. A novel linear model was exploited to link gene expressions to fusion transcript data and used to identify a small set of ASPS-specific pathways and their gene expression. Cellular pathways that appear aberrantly regulated in response to the t(X;17)(p11;q25) translocation include the cell cycle and cell adhesion. The identification of pathways and gene subsets characteristic of ASPS supports current therapeutic strategies that target FLT1 and MET, while also proposing additional targeting of genes found in pathways involved in the cell cycle (CHK1), cell adhesion (ARHGD1A), cell division (CDC6), control of meiosis (RAD51L3) and mitosis (BIRC5), and chemokine-related protein tyrosine kinase activity (CCL4).

  2. Bioinformatics analysis of metastasis-related proteins in hepatocellular carcinoma

    Institute of Scientific and Technical Information of China (English)

    Pei-Ming Song; Yang Zhang; Yu-Fei He; Hui-Min Bao; Jian-Hua Luo; Yin-Kun Liu; Peng-Yuan Yang; Xian Chen

    2008-01-01

    AIM: To analyze the metastasis-related proteins in hepatocellular carcinoma (HCC) and discover biomarker candidates for diagnosis and therapeutic intervention of HCC metastasis with bioinformatics tools. METHODS: Metastasis-related proteins were determined by stable isotope labeling and MS analysis and analyzed with bioinformatics resources, including Phobius, Kyoto Encyclopedia of Genes and Genomes (KEGG), Online Mendelian Inheritance in Man (OMIM) and Human Protein Reference Database (HPRD). RESULTS: All the metastasis-related proteins were linked to 83 pathways in KEGG, including MAPK and p53 signal pathways. Protein-protein interaction network analysis showed that all the metastasis-related proteins were categorized into 19 function groups, including cell cycle, apoptosis and signal transduction. OMIM analysis linked these proteins to 186 OMIM entries. CONCLUSION: Metastasis-related proteins provide HCC cells with biological advantages in cell proliferation, migration and angiogenesis, and facilitate metastasis of HCC cells. The bird's eye view can reveal a global characteristic of metastasis-related proteins, and many differentially expressed proteins can be identified as candidates for diagnosis and treatment of HCC.

  3. A complementary bioinformatics approach to identify potential plant cell wall glycosyltransferase-encoding genes

    DEFF Research Database (Denmark)

    Egelund, Jack; Skjøt, Michael; Geshi, Naomi;

    2004-01-01

    . Although much is known with regard to composition and fine structures of the plant CW, only a handful of CW biosynthetic GT genes-all classified in the CAZy system-have been characterized. In an effort to identify CW GTs that have not yet been classified in the CAZy database, a simple bioinformatics...

  4. In-depth analysis of the adipocyte proteome by mass spectrometry and bioinformatics

    DEFF Research Database (Denmark)

    Adachi, Jun; Kumar, Chanchal; Zhang, Yanling;

    2007-01-01

    , mitochondria, membrane, and cytosol of 3T3-L1 adipocytes. We identified 3,287 proteins while essentially eliminating false positives, making this one of the largest high confidence proteomes reported to date. Comprehensive bioinformatics analysis revealed that the adipocyte proteome, despite its specialized...

  5. Bioinformatics Resources for In Silico Proteome Analysis

    Directory of Open Access Journals (Sweden)

    Pruess Manuela

    2003-01-01

    In the growing field of proteomics, tools for the in silico analysis of proteins and even of whole proteomes are of crucial importance to make best use of the accumulating amount of data. To utilise this data for healthcare and drug development, first the characteristics of the proteomes of entire species (mainly human) have to be understood, before differentiation between individuals can be surveyed. Specialised databases about nucleic acid sequences, protein sequences, protein tertiary structure, genome analysis, and proteome analysis represent useful resources for analysis, characterisation, and classification of protein sequences. Unlike most proteomics tools, which each focus on a single task such as similarity searches, structure analysis and prediction, detection of specific regions, alignments, data mining, 2D PAGE analysis, or protein modelling, comprehensive databases like the proteome analysis database benefit from the information stored in different databases and make use of different protein analysis tools to provide computational analysis of whole proteomes.

  6. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    Science.gov (United States)

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-01-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula. PMID:27049397

  8. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Wei Li

    2016-04-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula.

  9. Whale song analyses using bioinformatics sequence analysis approaches

    Science.gov (United States)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, masters thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
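    Two of the metrics named above can be sketched in a few lines: a basic Smith-Waterman local alignment score and the angle between word-frequency vectors. This is an illustration only. The scoring values (match 2, mismatch -1, gap -1), the word length k = 2, and the unit labels are assumptions, and the "repeated procedure" and the standardized Euclidean distance used in the study are not reproduced here.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of two unit sequences (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def word_counts(seq, k=2):
    """Count overlapping words (k consecutive units) in a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def angle_between(seq_a, seq_b, k=2):
    """Angle in radians between the word-frequency vectors of two sequences."""
    ca, cb = word_counts(seq_a, k), word_counts(seq_b, k)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    if norm == 0:
        raise ValueError("each sequence must contain at least k units")
    return math.acos(min(1.0, dot / norm))


if __name__ == "__main__":
    theme_1 = "ABCD"  # hypothetical unit labels
    theme_2 = "XBCY"
    print(smith_waterman_score(theme_1, theme_2))
    print(angle_between(theme_1, theme_2))
```

    A matrix of such pairwise values over all themes could then be passed to a hierarchical clustering routine to draw dendrograms like those the abstract mentions.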

  10. Biochip microsystem for bioinformatics recognition and analysis

    Science.gov (United States)

    Lue, Jaw-Chyng (Inventor); Fang, Wai-Chi (Inventor)

    2011-01-01

    A system with applications in pattern recognition, or classification, of DNA assay samples. Because DNA reference and sample material in wells of an assay may be caused to fluoresce depending upon dye added to the material, the resulting light may be imaged onto an embodiment comprising an array of photodetectors and an adaptive neural network, with applications to DNA analysis. Other embodiments are described and claimed.

  11. Bioinformatics Analysis of Zinc Transporter from Baoding Alfalfa

    Institute of Scientific and Technical Information of China (English)

    Haibo WANG; Junyun GUO

    2012-01-01

    [Objective] This study aimed to perform the bioinformatics analysis of Zinc transporter (ZnT) from Baoding Alfalfa. [Method] Based on the amino acid sequence, the physical and chemical properties, hydrophilicity/hydrophobicity, and secondary structure of ZnT from Baoding alfalfa were predicted by a series of bioinformatics software. The transmembrane domains were predicted by using different online tools. [Result] ZnT is a hydrophobic protein containing 408 amino acids with a theoretical pI of 5.94, and it has 7 potential transmembrane hydrophobic regions. In the secondary structure, α-helix (Hh) accounted for 48.04%, extended strand (Ee) for 9.56%, and random coil (Cc) for 42.40%, which accords with the characteristics of a transmembrane protein. [Conclusion] mZnT is a member of the CDF family, responsible for transporting Zn²⁺ out of the cell membrane to reduce the concentration and toxicity of Zn²⁺.
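    The secondary-structure percentages above are per-residue fractions of the 408-residue sequence. As a consistency check, the sketch below computes such fractions from a per-residue state string. The residue counts 196, 39 and 173 are inferred here from the reported percentages and length; they are not stated in the abstract, and the state string is synthetic.

```python
from collections import Counter


def state_percentages(states):
    """Percentage of residues in each predicted state, rounded to 2 decimals."""
    if not states:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    total = len(states)
    return {state: round(100 * n / total, 2) for state, n in counts.items()}


if __name__ == "__main__":
    # Synthetic string: h = alpha-helix, e = extended strand, c = random coil.
    # Counts are inferred from the abstract's percentages (assumption).
    predicted = "h" * 196 + "e" * 39 + "c" * 173
    print(len(predicted))
    print(state_percentages(predicted))
```

    With these inferred counts the three fractions reproduce the reported 48.04%, 9.56% and 42.40% and sum to the full 408 residues.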

  12. Integrated Bioinformatics, Environmental Epidemiologic and Genomic Approaches to Identify Environmental and Molecular Links between Endometriosis and Breast Cancer

    Directory of Open Access Journals (Sweden)

    Deodutta Roy

    2015-10-01

    We present a combined environmental epidemiologic, genomic, and bioinformatics approach to identify: exposure to environmental chemicals with estrogenic activity; epidemiologic association between endocrine disrupting chemicals (EDCs) and health effects, such as breast cancer or endometriosis; and gene-EDC interactions and disease associations. Human exposure measurement and modeling confirmed estrogenic activity of three selected classes of environmental chemicals: polychlorinated biphenyls (PCBs), bisphenols (BPs), and phthalates. Meta-analysis showed that PCB exposure, but not Bisphenol A (BPA) or phthalate exposure, increased the summary odds ratio for breast cancer and endometriosis. Bioinformatics analysis of gene-EDC interactions and disease associations identified several hundred genes that were altered by exposure to PCBs, phthalate or BPA. EDC-modified genes in breast neoplasms and endometriosis are part of steroid hormone signaling and inflammation pathways. All three EDCs (PCB 153, phthalates, and BPA) influenced five common genes (CYP19A1, EGFR, ESR2, FOS, and IGF1) in breast cancer as well as in endometriosis. These genes are environmentally and estrogen responsive, altered in human breast and uterine tumors and endometriosis lesions, and part of Mitogen Activated Protein Kinase (MAPK) signaling pathways in cancer. Our findings suggest that breast cancer and endometriosis share some common environmental and molecular risk factors.

  13. Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom

    Science.gov (United States)

    Liu, Xiao; Wolfe, Richard; Welch, Lonnie R.; Domozych, David S.; Popper, Zoë A.; Showalter, Allan M.

    2016-01-01

    Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are implicated to play important roles in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five prolines (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linkings. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on the EXTs and understand evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) Eudicots have the greatest number of classical EXTs and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in

  14. ISEV position paper: extracellular vesicle RNA analysis and bioinformatics

    Directory of Open Access Journals (Sweden)

    Andrew F. Hill

    2013-12-01

    Extracellular vesicles (EVs) are the collective term for the various vesicles that are released by cells into the extracellular space. Such vesicles include exosomes and microvesicles, which vary by their size and/or protein and genetic cargo. The discovery that EVs contain genetic material in the form of RNA (evRNA) has increased interest in these vesicles for their potential use as sources of disease biomarkers and potential therapeutic agents. Rapid developments in the availability of deep sequencing technologies have enabled the study of EV-related RNA in detail. In October 2012, the International Society for Extracellular Vesicles (ISEV) held a workshop on “evRNA analysis and bioinformatics.” Here, we report the conclusions of one of the roundtable discussions where we discussed evRNA analysis technologies and provide some guidelines to researchers in the field to consider when performing such analysis.

  15. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  16. The Expression and Bioinformatic Analysis of a Novel Gene C20orf14 Associated with Lymphoma

    Institute of Scientific and Technical Information of China (English)

    Liangping SU; Deng CHEN; Jianming ZHANG; Ximing LI; Guihong PAN; Xiangyang BAI; Yunping LU; Jianfeng ZHOU; Shuang LI

    2008-01-01

    The aim of the present study was to explore the differentially expressed genes in the blood vessel endothelial cells (BVECs) between diffuse large B-cell lymphoma (DLBCL) and reactive lymph node hyperplasia (RLNH), and to perform an initial bioinformatics analysis on a novel gene, C20orf14, which is highly expressed in lymph node of lymphoma. The mRNA from the BVECs of DLBCL and RLNH tissues was labeled with biotin and hybridized with an expression profile microarray, and the differentially expressed genes were obtained. Initial bioinformatics analysis was performed on a novel gene named C20orf14. Its gene structure, genomic localization, the physical and chemical characteristics of the putative protein, subcellular localization, functional domain etc. were predicted, and a phylogenetic analysis was performed on the similar proteins among several species. By using the expression profile microarray, many differentially expressed genes were uncovered. The bioinformatics analysis indicated that C20orf14 is a nuclear protein and may be involved in the post-transcriptional modification of mRNA. Therefore, microarray is an efficient and high-throughput strategy for the detection of differentially expressed genes, and C20orf14 is thought to be a potential target for tumor metastasis research on the basis of bioinformatics analysis.

  17. The MPI Bioinformatics Toolkit for protein sequence analysis.

    Science.gov (United States)

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  18. Bioinformatics approaches to single-cell analysis in developmental biology.

    Science.gov (United States)

    Yalcin, Dicle; Hakguder, Zeynep M; Otu, Hasan H

    2016-03-01

    Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology.

  19. Bioinformatic Analysis of BBTV Satellite DNA in Hainan

    Institute of Scientific and Technical Information of China (English)

    Nai-tong Yu; Tuan-cheng Feng; Yu-liang Zhang; Jian-hua Wang; Zhi-xin Liu

    2011-01-01

    Banana bunchy top virus (BBTV), family Nanoviridae, genus Babuvirus, is a single-stranded DNA (ssDNA) virus that causes banana bunchy top disease (BBTD) in banana plants. It is the most common and most destructive of all viruses in these plants and is widespread throughout the Asia-Pacific region. In this study we isolated, cloned and sequenced a BBTV sample from Hainan Island, China. The results from sequencing and bioinformatics analysis indicate this isolate represents a satellite DNA component with 12 DNA sequence motifs. We also predicted the physical and chemical properties, structure, signal peptide, phosphorylation, secondary structure, tertiary structure and functional domains of its encoded protein, and compared them with the corresponding properties of the replication initiation protein of BBTV DNA1.

  20. An Integrated Bioinformatics and Computational Biology Approach Identifies New BH3-Only Protein Candidates.

    Science.gov (United States)

    Hawley, Robert G; Chen, Yuzhong; Riz, Irene; Zeng, Chen

    2012-05-04

    In this study, we utilized an integrated bioinformatics and computational biology approach in search of new BH3-only proteins belonging to the BCL2 family of apoptotic regulators. The BH3 (BCL2 homology 3) domain mediates specific binding interactions among various BCL2 family members. It is composed of an amphipathic α-helical region of approximately 13 residues that has only a few amino acids that are highly conserved across all members. Using a generalized motif, we performed a genome-wide search for novel BH3-containing proteins in the NCBI Consensus Coding Sequence (CCDS) database. In addition to known pro-apoptotic BH3-only proteins, 197 proteins were recovered that satisfied the search criteria. These were categorized according to α-helical content and predictive binding to BCL-xL (encoded by BCL2L1) and MCL-1, two representative anti-apoptotic BCL2 family members, using position-specific scoring matrix models. Notably, the list is enriched for proteins associated with autophagy as well as a broad spectrum of cellular stress responses such as endoplasmic reticulum stress, oxidative stress, antiviral defense, and the DNA damage response. Several potential novel BH3-containing proteins are highlighted. In particular, the analysis strongly suggests that the apoptosis inhibitor and DNA damage response regulator, AVEN, which was originally isolated as a BCL-xL-interacting protein, is a functional BH3-only protein representing a distinct subclass of BCL2 family members.
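    A genome-wide motif search of the kind described above can be pictured as a pattern scan over every protein sequence in a database. The sketch below is illustrative only: the abstract does not give the study's generalized motif, so the pattern used here (L, any three residues, then G and D) is a stand-in assumption, and the protein names and sequences are made up.

```python
import re

# Stand-in pattern (assumption), wrapped in a lookahead so that
# overlapping occurrences are all reported.
TOY_MOTIF = re.compile(r"(?=(L.{3}GD))")


def scan(proteins, pattern=TOY_MOTIF):
    """Return (name, 1-based start, matched segment) for every motif hit."""
    hits = []
    for name, seq in proteins.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits


if __name__ == "__main__":
    # Made-up sequences, not database entries.
    toy_db = {"protein_1": "MKLAAAGDQ", "protein_2": "MKKKK"}
    for hit in scan(toy_db):
        print(hit)
```

    In the study, candidates recovered by the motif search were further categorized by α-helical content and scored with position-specific scoring matrices; neither step is modeled here.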

  1. Mutational and Bioinformatic Analysis of Haloarchaeal Lipobox-Containing Proteins

    Directory of Open Access Journals (Sweden)

    Stefanie Storf

    2010-01-01

    A conserved lipid-modified cysteine found in a protein motif commonly referred to as a lipobox mediates the membrane anchoring of a subset of proteins transported across the bacterial cytoplasmic membrane via the Sec pathway. Sequenced haloarchaeal genomes encode many putative lipoproteins and recent studies have confirmed the importance of the conserved lipobox cysteine for signal peptide processing of three lipobox-containing proteins in the model archaeon Haloferax volcanii. We have extended these in vivo analyses to additional Hfx. volcanii substrates, supporting our previous in silico predictions and confirming the diversity of predicted Hfx. volcanii lipoproteins. Moreover, using extensive comparative secretome analyses, we identified genes encoding putative lipoproteins across a wide range of archaeal species. While our in silico analyses, supported by in vivo data, indicate that most haloarchaeal lipoproteins are Tat substrates, these analyses also predict that many crenarchaeal species lack lipoproteins altogether and that other archaea, such as nonhalophilic euryarchaeal species, transport lipoproteins via the Sec pathway. To facilitate the identification of genes that encode potential haloarchaeal Tat-lipoproteins, we have developed TatLipo, a bioinformatic tool designed to detect lipoboxes in haloarchaeal Tat signal peptides. Our results provide a strong foundation for future studies aimed at identifying components of the archaeal lipoprotein biogenesis pathway.

  2. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis.

    Science.gov (United States)

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/.

  3. Somatic populations of PGT135-137 HIV-1-neutralizing antibodies identified by 454 pyrosequencing and bioinformatics

    Directory of Open Access Journals (Sweden)

    Jiang Zhu

    2012-09-01

    Select HIV-1-infected individuals develop sera capable of neutralizing diverse viral strains. The molecular basis of this neutralization is currently being deciphered by the isolation of HIV-1-neutralizing antibodies. In one infected donor, three neutralizing antibodies, PGT135-137, were identified by assessment of neutralization from individually sorted B cells and found to recognize an epitope containing an N-linked glycan at residue 332 on HIV-1 gp120. Here we use deep sequencing and bioinformatics methods to interrogate the B cell record of this donor to gain a more complete understanding of the humoral immune response. PGT135-137-gene family-specific primers were used to amplify heavy and light chain-variable domain sequences. 454 pyrosequencing produced 141,298 heavy-chain sequences of IGHV4-39 origin and 87,229 light-chain sequences of IGKV3-15 origin. A number of heavy and light chain sequences of ~90% identity to PGT137, several to PGT136, and none of high identity to PGT135 were identified. After expansion of these sequences to include close phylogenetic relatives, a total of 202 heavy-chain sequences and 72 light-chain sequences were identified. These sequences were clustered into populations of 95% identity comprising 15 for heavy chain and 10 for light chain, and a select sequence from each population was synthesized and reconstituted with a PGT137-partner chain. Reconstituted antibodies showed varied neutralization phenotypes for HIV-1 clade A and D isolates. Sequence diversity of the antibody population represented by these tested sequences was notably higher than observed with a 454 pyrosequencing-control analysis on 10 antibodies of defined sequence, suggesting that this diversity results primarily from somatic maturation. Our results thus provide an example of how pathogens like HIV-1 are opposed by a varied humoral immune response, derived from intrinsic mechanisms of antibody development, and embodied by somatic populations
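    Grouping sequences into populations at a 95% identity cutoff, as described above, can be illustrated with a simple greedy scheme in which each sequence joins the first cluster whose representative it matches at or above the threshold. This is a sketch under stated assumptions, not the authors' method: the abstract does not name the clustering procedure, and the identity function here assumes pre-aligned sequences of equal length.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.95):
    """Assign each sequence to the first cluster whose representative is
    at least `threshold` identical; otherwise start a new cluster.
    The first member of each cluster serves as its representative."""
    clusters = []
    for s in seqs:
        for members in clusters:
            if identity(s, members[0]) >= threshold:
                members.append(s)
                break
        else:
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    # Toy 20-residue strings, not antibody sequences.
    reads = ["A" * 20, "A" * 19 + "C", "C" * 20]
    for members in greedy_cluster(reads):
        print(members[0], len(members))
```

    Picking one member per cluster, as the study did when choosing a select sequence from each population for synthesis, then corresponds to taking an element from each returned list.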

  4. Bioinformatics analysis of two-component regulatory systems in Staphylococcus epidermidis

    Institute of Scientific and Technical Information of China (English)

    QIN Zhiqiang; ZHONG Yang; ZHANG Jian; HE Youyu; WU Yang; JIANG Juan; CHEN Jiemin; LUO Xiaomin; QU Di

    2004-01-01

    Using bioinformatics analysis, sixteen pairs of two-component regulatory systems were identified in the genome of Staphylococcus epidermidis strain ATCC12228, which was newly sequenced by our laboratory for Medical Molecular Virology and the Chinese National Human Genome Center at Shanghai. Comparative analysis of the two-component regulatory systems in S. epidermidis and those of S. aureus and Bacillus subtilis shows that these systems may regulate some important biological functions, e.g. growth, biofilm formation, and expression of virulence factors in S. epidermidis. Two conserved domains, i.e. the HATPase_c and REC domains, are found in all 16 pairs of two-component proteins. Homologous modelling analysis indicates that there are 4 similar HATPase_c domain structures of histidine kinases and 13 similar REC domain structures of response regulators, and that there is one AMP-PNP binding pocket in the HATPase_c domain and three active aspartate residues in the REC domain. A preliminary experiment reveals that bioinformatics analysis of the conserved domain structures in the two-component regulatory systems of S. epidermidis may provide useful information for the discovery of potential drug targets.

  5. A bioinformatics search pipeline, RNA2DSearch, identifies RNA localization elements in Drosophila retrotransposons.

    Science.gov (United States)

    Hamilton, Russell S; Hartswood, Eve; Vendra, Georgia; Jones, Cheryl; Van De Bor, Veronique; Finnegan, David; Davis, Ilan

    2009-02-01

    mRNA localization is a widespread mode of delivering proteins to their site of function. The embryonic axes in Drosophila are determined in the oocyte, through Dynein-dependent transport of gurken/TGF-alpha mRNA, containing a small localization signal that assigns its destination. A signal with a similar secondary structure, but lacking significant sequence similarity, is present in the I factor retrotransposon mRNA, also transported by Dynein. It is currently unclear whether other mRNAs exist that are localized to the same site using similar signals. Moreover, searches for other genes containing similar elements have not been possible due to a lack of suitable bioinformatics methods for searches of secondary structure elements and the difficulty of experimentally testing all the possible candidates. We have developed a bioinformatics approach for searching across the genome for small RNA elements that are similar to the secondary structures of particular localization signals. We have uncovered 48 candidates, of which we were able to test 22 for their localization potential using injection assays for Dynein mediated RNA localization. We found that G2 and Jockey transposons each contain a gurken/I factor-like RNA stem-loop required for Dynein-dependent localization to the anterior and dorso-anterior corner of the oocyte. We conclude that I factor, G2, and Jockey are members of a "family" of transposable elements sharing a gurken-like mRNA localization signal and Dynein-dependent mechanism of transport. The bioinformatics pipeline we have developed will have broader utility in fields where small RNA signals play important roles.

  6. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    Science.gov (United States)

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics.

  7. Bioinformatic science and devices for computer analysis and visualization of macromolecules

    Directory of Open Access Journals (Sweden)

    Yu.B. Porozov

    2010-06-01

    Full Text Available The goals and objectives of bioinformatic science are presented in the article. The main methods and approaches used in computational biology are highlighted. Areas in which bioinformatic science can greatly facilitate and speed up the work of practicing biologists and pharmacologists are identified. The features of both the basic packages and the software tools for complete, thorough analysis of macromolecules and for the development and modeling of ligands and binding centers are described.

  8. Bioinformatics Analysis of the Duck Enteritis Virus UL54 Gene

    Directory of Open Access Journals (Sweden)

    Chaoyue Liu

    2014-04-01

    Full Text Available In this study, we analyze the Duck Enteritis Virus (DEV) UL54 gene, which was isolated and identified in our lab (GenBank accession no. EU071033), to support further research on DEV. DNA sequence analysis showed that the identified ORF, composed of 1377 bp, encodes 458 amino acids with a predicted Mr of 51.75 kDa. Multiple sequence alignment suggested that the UL54 gene is highly conserved in Alphaherpesvirinae and is similar to other herpesviral UL54 genes. Phylogenetic analysis of the DEV UL54 gene revealed that DEV has a close evolutionary relationship with Gallid Herpesvirus 2 (GaHV-2), Gallid Herpesvirus 3 (GaHV-3), and Meleagrid Herpesvirus 1 (MeHV-1), and should belong to a single cluster within the Alphaherpesvirinae subfamily.
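
    The ORF figures in the record above can be cross-checked with simple arithmetic. The sketch below assumes (this is not stated in the abstract) that the 1377 bp count includes a terminal stop codon; the function name is illustrative only.

```python
# Illustrative check of the ORF arithmetic quoted in the abstract.
# Assumption: the 1377 bp ORF length includes one terminal stop codon.
ORF_LENGTH_BP = 1377          # from the abstract
PREDICTED_MASS_DA = 51.75e3   # 51.75 kDa, from the abstract


def orf_to_residues(orf_bp: int, has_stop_codon: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and encodes no residue.
    return codons - 1 if has_stop_codon else codons


residues = orf_to_residues(ORF_LENGTH_BP)
# Average mass per residue implied by the predicted molecular mass.
avg_residue_mass_da = PREDICTED_MASS_DA / residues
print(residues, round(avg_residue_mass_da, 1))
```

    Under that assumption, 459 codons minus one stop codon gives the 458 residues reported, and the implied average residue mass of roughly 113 Da is in the range typical for proteins.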

  9. Cloning, expression, purification and bioinformatic analysis of 2-methylcitrate synthase from Mycobacterium tuberculosis

    Institute of Scientific and Technical Information of China (English)

    Kandasamy Eniyan; Urmi Bajpai

    2015-01-01

    Objective: To clone, express and purify the 2-methylcitrate synthase (Rv1131) gene of Mycobacterium tuberculosis (M. tuberculosis) and to study its structural characteristics using various bioinformatics tools. Methods: The Rv1131 gene was amplified by polymerase chain reaction using M. tuberculosis H37Rv genomic DNA, cloned into pGEM-T easy vector and sequenced. The gene was sub-cloned in pET28c vector and expressed in Escherichia coli BL21 (E. coli BL21) (DE3) cells, and the recombinant protein was identified by Western blotting. The protein was purified using nickel affinity chromatography, and structural characteristics such as sub-cellular localization, presence of transmembrane helices and secondary structure of the protein were predicted by bioinformatics tools. Tertiary structure of the protein and phylogenetic analysis were also established by in silico analysis. Results: The expression of the recombinant protein (Rv1131) was confirmed by Western blotting using anti-HIS antibodies, and the protein was purified from the soluble fraction. In silico analysis showed that the protein contains no signal peptide and transmembrane helices. Active site prediction showed that the protein has histidine and aspartic acid residues at positions 242, 281 and 332, respectively. Phylogenetic analysis showed 100% homology with major mycobacterial species. Secondary structure prediction indicates that 2-methylcitrate synthase contains 51.9% alpha-helix, 8.7% extended strand and 39.4% random coil. Tertiary structure of the protein was also established. Conclusions: The enzyme 2-methylcitrate synthase from M. tuberculosis H37Rv has been successfully expressed and purified. The purified protein will further be utilized to develop assay methods for screening new inhibitors.

  10. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    Science.gov (United States)

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availability: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: da1@sanger.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469
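
    The abstract above does not say how the four callers' outputs are combined. One common way to integrate several callers is a simple vote that keeps variants reported by at least a minimum number of them; the sketch below illustrates that general idea only and is not Cake's actual method. The function name, the variant tuple layout and the toy call sets are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Iterable[Set[Variant]], min_callers: int = 2) -> Set[Variant]:
    """Keep variants reported by at least min_callers of the supplied call sets."""
    votes: Counter = Counter()
    for calls in call_sets:
        # Each call set is a set, so one caller gives at most one vote per variant.
        votes.update(calls)
    return {variant for variant, n in votes.items() if n >= min_callers}


# Toy output of four hypothetical callers.
calls_by_caller = [
    {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    {("chr1", 100, "A", "T")},
    {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    {("chr2", 200, "G", "C")},
]
consensus = consensus_calls(calls_by_caller, min_callers=2)
print(sorted(consensus))
```

    Raising `min_callers` trades sensitivity for specificity: a variant seen by a single caller is dropped at the default threshold of 2.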

  11. Bioinformatics analysis of the gene expression profile of hepatocellular carcinoma: preliminary results

    Science.gov (United States)

    Li, Jia

    2016-01-01

    Aim of the study To analyse the expression profile of hepatocellular carcinoma compared with normal liver by using bioinformatics methods. Material and methods In this study, we analysed the microarray expression data of HCC and adjacent normal liver samples from the Gene Expression Omnibus (GEO) database to screen for differentially expressed genes. Then, functional analyses were performed using GenCLiP analysis, Gene Ontology categories, and aberrant pathway identification. In addition, we used the CMap database to identify small molecules that can induce HCC. Results Overall, 2721 differentially expressed genes (DEGs) were identified. We found 180 metastasis-related genes and constructed co-occurrence networks. Several significant pathways, including the transforming growth factor β (TGF-β) signalling pathway, were identified as closely related to these DEGs. Some candidate small molecules (such as betahistine) were identified that might provide a basis for developing HCC treatments in the future. Conclusions Although we functionally analysed the differences in the gene expression profiles of HCC and normal liver tissues, our study is essentially preliminary, and it may be premature to apply our results to clinical trials. Further research and experimental testing are required in future studies. PMID:27095935

  12. Expression and bioinformatic analysis of lymphoma-associated novel gene KIAA0372

    Institute of Scientific and Technical Information of China (English)

    BAI Xiangyang; TANG Duozhuang; ZHU Tao; SUN Lishi; YAN Lingling; LU Yunping; ZHOU Jianfeng; MA Ding

    2007-01-01

    The purpose of this study was to explore the differentially expressed genes in lymph-node cells (LNC) of lymphomas and reactive lymph node hyperplasia, and to perform an initial bioinformatic analysis on a novel gene, KIAA0372, which is highly expressed in the LNC of lymphomas. mRNA extracted from LNC of lymphomas and of reactive lymph node hyperplasia was labeled with biotin and hybridized with Gene Expression Chips to reveal differentially expressed genes. Initial bioinformatic analysis was then performed on a novel gene named KIAA0372, whose function has not yet been explored. Its structure and genomic location, and its product's physical and chemical properties, subcellular localization and functional domains, were also predicted. Further, a systematic evolution analysis was performed on similar proteins from several species. Using Gene Expression Chips, many differentially expressed genes were uncovered. Bioinformatic analysis indicates that KIAA0372 is an extracellular protein which may be involved in TGF-β signaling. Microarray is an efficient and high-throughput strategy for detection of differentially expressed genes, and bioinformatic analysis suggests that KIAA0372 is a potential target for tumor research.

  13. The Revolution in Viral Genomics as Exemplified by the Bioinformatic Analysis of Human Adenoviruses

    Directory of Open Access Journals (Sweden)

    Sarah Torres

    2010-06-01

    Full Text Available Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods; initially with the use of restriction enzymes and more currently with the use of the GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984 when nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV), HAdV-C2 and HAdV-C5, were compared and analyzed. It was determined that there were three different zones (1-393, 394-1410, 1411-2910) within the hexon gene, of which HAdV-C2 and HAdV-C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA, in order to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, that was typed and named by the use of whole genome sequencing and phylogenetics as opposed to traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, how novel HAdVs are typed has changed. Bioinformatic analysis has become the revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics.
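
    The per-zone comparison described in the record above can be sketched as follows. The zone coordinates are taken from the abstract and treated here as 1-based inclusive positions in a pairwise alignment, which is an assumption; the function names and the toy sequences are hypothetical and are not the actual hexon sequences.

```python
from typing import List, Sequence, Tuple

# Zone boundaries from the abstract, assumed to be 1-based inclusive positions.
ZONES: List[Tuple[int, int]] = [(1, 393), (394, 1410), (1411, 2910)]


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length aligned sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def zone_identities(seq_a: str, seq_b: str,
                    zones: Sequence[Tuple[int, int]] = ZONES) -> List[float]:
    """Percent identity of two aligned sequences within each zone."""
    # Convert 1-based inclusive coordinates to Python slices.
    return [percent_identity(seq_a[start - 1:end], seq_b[start - 1:end])
            for start, end in zones]


# Toy aligned sequences: identical in zones 1 and 3, fully divergent in zone 2.
toy_a = "A" * 2910
toy_b = "A" * 393 + "C" * 1017 + "A" * 1500
print(zone_identities(toy_a, toy_b))
```

    Real sequences would first need a pairwise alignment so that zone coordinates refer to the same columns in both sequences; gap handling is deliberately left out of this sketch.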

  14. Bioinformatic Analysis of Putative Gene Products Encoded in SARS-HCoV Genome

    Institute of Scientific and Technical Information of China (English)

    赵心刚; 韩敬东; 宁元亨; 孟安明; 陈晔光

    2003-01-01

    The cause of severe acute respiratory syndrome (SARS) has been identified as a new coronavirus named SARS-HCoV. Using bioinformatic methods, we have performed a detailed domain search. In addition to the viral structural proteins, we have found that several putative polypeptides share sequence similarity with known domains or proteins. This study may provide a basis for future studies on the infection and replication process of this notorious virus.

  15. Proteomic and bioinformatic analysis of epithelial tight junction reveals an unexpected cluster of synaptic molecules

    Directory of Open Access Journals (Sweden)

    Tang Vivian W

    2006-12-01

    Full Text Available Abstract Background Zonula occludens, also known as the tight junction, is a specialized cell-cell interaction characterized by membrane "kisses" between epithelial cells. A cytoplasmic plaque of ~100 nm corresponding to a meshwork of densely packed proteins underlies the tight junction membrane domain. Due to its enormous size and difficulties in obtaining a biochemically pure fraction, the molecular composition of the tight junction remains largely unknown. Results A novel biochemical purification protocol has been developed to isolate tight junction protein complexes from cultured human epithelial cells. After identification of proteins by mass spectroscopy and fingerprint analysis, candidate proteins are scored and assessed individually. A simple algorithm has been devised to incorporate transmembrane domains and protein modification sites for scoring membrane proteins. Using this new scoring system, a total of 912 proteins have been identified. These 912 hits are analyzed using a bioinformatics approach to bin the hits in 4 categories: configuration, molecular function, cellular function, and specialized process. Prominent clusters of proteins related to the cytoskeleton, cell adhesion, and vesicular traffic have been identified. Weaker clusters of proteins associated with cell growth, cell migration, translation, and transcription are also found. However, the strongest clusters belong to synaptic proteins and signaling molecules. Localization studies of key components of synaptic transmission have confirmed the presence of both presynaptic and postsynaptic proteins at the tight junction domain. To correlate proteomics data with structure, the tight junction has been examined using electron microscopy. This has revealed many novel structures including end-on cytoskeletal attachments, vesicles fusing/budding at the tight junction membrane domain, secreted substances encased between the tight junction kisses, endocytosis of tight junction

  16. Bioinformatics analysis of differentially expressed proteins in prostate cancer based on proteomics data

    Directory of Open Access Journals (Sweden)

    Chen C

    2016-03-01

    Full Text Available Chen Chen,1 Li-Guo Zhang,1 Jian Liu,1 Hui Han,1 Ning Chen,1 An-Liang Yao,1 Shao-San Kang,1 Wei-Xing Gao,1 Hong Shen,2 Long-Jun Zhang,1 Ya-Peng Li,1 Feng-Hong Cao,1 Zhi-Guo Li3 1Department of Urology, North China University of Science and Technology Affiliated Hospital, 2Department of Modern Technology and Education Center, 3Department of Medical Research Center, International Science and Technology Cooperation Base of Geriatric Medicine, North China University of Science and Technology, Tangshan, People’s Republic of China Abstract: We mined the literature for proteomics data to examine the occurrence and metastasis of prostate cancer (PCa) through a bioinformatics analysis. We divided the differentially expressed proteins (DEPs) into two groups: the group consisting of PCa and benign tissues (P&b) and the group presenting both high and low PCa metastatic tendencies (H&L). In the P&b group, we found 320 DEPs, 20 of which were reported more than three times, and DES was the most commonly reported. Among these DEPs, the expression levels of FGG, GSN, SERPINC1, TPM1, and TUBB4B have not yet been correlated with PCa. In the H&L group, we identified 353 DEPs, 13 of which were reported more than three times. Among these DEPs, MDH2 and MYH9 have not yet been correlated with PCa metastasis. We further confirmed that DES was differentially expressed between 30 cancer and 30 benign tissues. In addition, DEPs associated with protein transport, regulation of actin cytoskeleton, and the extracellular matrix (ECM)–receptor interaction pathway were prevalent in the H&L group and have not yet been studied in detail in this context. Proteins related to homeostasis, the wound-healing response, focal adhesions, and the complement and coagulation pathways were overrepresented in both groups. Our findings suggest that the repeatedly reported DEPs in the two groups may function as potential biomarkers for detecting PCa and predicting its aggressiveness. Furthermore

  17. Microarray-bioinformatics analysis of altered genomic expression profiles between human fetal and infant myocardium

    Institute of Scientific and Technical Information of China (English)

    KONG Bo; LIU Ying-long; LÜ Xiao-dong

    2008-01-01

    Background The physiological differences between fetal and postnatal heart have been well characterized at the cellular level. However, the genetic mechanisms governing and regulating these differences have only been partially elucidated. The profile of genes differentially expressed before and after birth has never been systematically proposed and analyzed. Methods Human oligonucleotide microarray and bioinformatics analysis approaches were applied to isolate and classify the differentially expressed genes between fetal and infant cardiac tissue samples. Quantitative real-time PCR was used to confirm the results from the microarray. Results Two hundred and forty-two differentially expressed genes were discovered and classified into 13 categories, including genes related to energy metabolism, myocyte hyperplasia, development, muscle contraction, protein synthesis and degradation, extracellular matrix components, transcription factors, apoptosis, signal pathway molecules, organelle organization and several other biological processes. Moreover, 95 genes were identified which had not previously been reported to be expressed in the heart. Conclusions The study systematically analyzed the alteration of the gene expression profile between the human fetal and infant myocardium. A number of genes were discovered which had not been reported to be expressed in the heart. The data provided insight into the physical development mechanisms of the heart before and after birth. KONG Bo and LU Xiao-dong contributed equally to this study.

  18. R/parallel - speeding up bioinformatics analysis with R

    NARCIS (Netherlands)

    Vera, Gonzalo; Jansen, Ritsert C.; Suppi, Remo L.

    2008-01-01

    Background: R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are ge

  19. Bioinformatics and biomarker discovery "Omic" data analysis for personalized medicine

    CERN Document Server

    Azuaje, Francisco

    2010-01-01

    This book is designed to introduce biologists, clinicians and computational researchers to fundamental data analysis principles, techniques and tools for supporting the discovery of biomarkers and the implementation of diagnostic/prognostic systems. The focus of the book is on how fundamental statistical and data mining approaches can support biomarker discovery and evaluation, emphasising applications based on different types of "omic" data. The book also discusses design factors, requirements and techniques for disease screening, diagnostic and prognostic applications. Readers are provided w

  20. Bioinformatics analysis and genetic diversity of the poliovirus.

    Science.gov (United States)

    Liu, Yanhan; Ma, Tengfei; Liu, Jianzhu; Zhao, Xiaona; Cheng, Ziqiang; Guo, Huijun; Wang, Shujing; Xu, Ruixue

    2014-12-01

    Poliomyelitis, a disease which can manifest as muscle paralysis, is caused by the poliovirus, which is a human enterovirus and member of the family Picornaviridae that usually transmits by the faecal-oral route. The viruses of the OPV (oral poliovirus attenuated-live vaccine) strains can mutate in the human intestine during replication and some of these mutations can lead to the recovery of serious neurovirulence. Informatics research of the poliovirus genome can be used to explain further the characteristics of this virus. In this study, sequences from 100 poliovirus isolates were acquired from GenBank. To determine the evolutionary relationship between the strains, we compared and analysed the sequences of the complete poliovirus genome and the VP1 region. The reconstructed phylogenetic trees for the complete sequences and the VP1 sequences were both divided into two branches, indicating that the genetic relationships of the whole poliovirus genome and the VP1 sequences are very similar. This branching indicates that the virulence and pathogenicity of poliomyelitis may be associated with the VP1 region. Sequence alignment of the VP1 region revealed numerous mutation sites in which mutation rates of >30 % were detected. In a group of strains recorded in the USA, mutation sites and mutation types were the same and this may be associated with their distribution in the evolutionary tree and their genetic relationship. In conclusion, the genetic evolutionary relationships of poliovirus isolate sequences are determined to a great extent by the VP1 protein, and poliovirus strains located on the same branch of the phylogenetic tree contain the same mutation spots and mutation types. Hence, the genetic characteristics of the VP1 region in the poliovirus genome should be analysed to identify the transmission route of poliovirus and provide the basis of viral immunity development.
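
    The abstract above reports alignment sites with mutation rates above 30 % but does not define how the rate was computed. A minimal sketch, assuming the rate at a site is the fraction of sequences that differ from the most common residue in that alignment column; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import List


def column_mutation_rates(aligned: List[str]) -> List[float]:
    """Fraction of sequences differing from the majority residue, per alignment column."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    rates = []
    for column in zip(*aligned):
        majority_count = Counter(column).most_common(1)[0][1]
        rates.append(1.0 - majority_count / len(column))
    return rates


def high_mutation_sites(aligned: List[str], threshold: float = 0.30) -> List[int]:
    """1-based positions whose mutation rate exceeds the threshold."""
    return [i + 1 for i, rate in enumerate(column_mutation_rates(aligned))
            if rate > threshold]


# Toy alignment of five short sequences (not real VP1 data).
toy_alignment = ["ATGC", "ATGA", "ACGA", "ATGT", "ACGC"]
print(high_mutation_sites(toy_alignment))
```

    Other definitions are possible, for example counting differences from a chosen reference strain instead of the column majority, and they would give different site lists.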

  1. Bioinformatic analysis of Entamoeba histolytica SINE1 elements

    Directory of Open Access Journals (Sweden)

    Butcher Sarah A

    2010-05-01

    Full Text Available Abstract Background Invasive amoebiasis, caused by infection with the human parasite Entamoeba histolytica, remains a major cause of morbidity and mortality in some less-developed countries. Genetically E. histolytica exhibits a number of unusual features including having approximately 20% of its genome comprised of repetitive elements. These include a number of families of SINEs - non-autonomous elements which can, however, move with the help of partner LINEs. In many eukaryotes SINE mobility has had a profound effect on gene expression; in this study we concentrated on one such element - EhSINE1, looking in particular for evidence of recent transposition. Results EhSINE1s were detected in the newly reassembled E. histolytica genome by searching with a Hidden Markov Model developed to encapsulate the key features of this element; 393 were detected. Examination of their sequences revealed that some had an internal structure showing one to four 26-27 nt repeats. Members of the different classes differ in a number of ways and in particular those with two internal repeats show the properties expected of fairly recently transposed SINEs - they are the most homogeneous in length and sequence, they have the longest (i.e. the least decayed) target site duplications and are the most likely to show evidence (in a cDNA library) of active transcription. Furthermore we were able to identify 15 EhSINE1s (6 pairs and one triplet) which appeared to be identical or very nearly so but inserted into different sites in the genome; these provide good evidence that if mobility has now ceased it has only done so very recently. Conclusions Of the many families of repetitive elements present in the genome of E. histolytica we have examined in detail just one - EhSINE1. We have shown that there is evidence for waves of transposition at different points in the past and no evidence that mobility has entirely ceased.
There are many aspects of the biology of this parasite which

  2. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    Science.gov (United States)

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  3. Secretome Analysis of Lipid-Induced Insulin Resistance in Skeletal Muscle Cells by a Combined Experimental and Bioinformatics Workflow.

    Science.gov (United States)

    Deshmukh, Atul S; Cox, Juergen; Jensen, Lars Juhl; Meissner, Felix; Mann, Matthias

    2015-11-01

    Skeletal muscle has emerged as an important secretory organ that produces so-called myokines, regulating energy metabolism via autocrine, paracrine, and endocrine actions; however, the nature and extent of the muscle secretome has not been fully elucidated. Mass spectrometry (MS)-based proteomics, in principle, allows an unbiased and comprehensive analysis of cellular secretomes; however, the distinction of bona fide secreted proteins from proteins released upon lysis of a small fraction of dying cells remains challenging. Here we applied highly sensitive MS and streamlined bioinformatics to analyze the secretome of lipid-induced insulin-resistant skeletal muscle cells. Our workflow identified 1073 putative secreted proteins including 32 growth factors, 25 cytokines, and 29 metalloproteinases. In addition to previously reported proteins, we report hundreds of novel ones. Intriguingly, ∼40% of the secreted proteins were regulated under insulin-resistant conditions, including a protein family with signal peptide and EGF-like domain structure that had not yet been associated with insulin resistance. Finally, we report that secretion of IGF and IGF-binding proteins was down-regulated under insulin-resistant conditions. Our study demonstrates an efficient combined experimental and bioinformatics workflow to identify putative secreted proteins from insulin-resistant skeletal muscle cells, which could easily be adapted to other cellular models.

  4. GProX, a User-Friendly Platform for Bioinformatics Analysis and Visualization of Quantitative Proteomics Data

    DEFF Research Database (Denmark)

    Rigbolt, Kristoffer T G; Vanselow, Jens T; Blagoev, Blagoy

    2011-01-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiment. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options...

  5. Bioinformatics analysis of differentially expressed proteins in prostate cancer based on proteomics data

    Science.gov (United States)

    Chen, Chen; Zhang, Li-Guo; Liu, Jian; Han, Hui; Chen, Ning; Yao, An-Liang; Kang, Shao-San; Gao, Wei-Xing; Shen, Hong; Zhang, Long-Jun; Li, Ya-Peng; Cao, Feng-Hong; Li, Zhi-Guo

    2016-01-01

    We mined the literature for proteomics data to examine the occurrence and metastasis of prostate cancer (PCa) through a bioinformatics analysis. We divided the differentially expressed proteins (DEPs) into two groups: the group consisting of PCa and benign tissues (P&b) and the group presenting both high and low PCa metastatic tendencies (H&L). In the P&b group, we found 320 DEPs, 20 of which were reported more than three times, and DES was the most commonly reported. Among these DEPs, the expression levels of FGG, GSN, SERPINC1, TPM1, and TUBB4B have not yet been correlated with PCa. In the H&L group, we identified 353 DEPs, 13 of which were reported more than three times. Among these DEPs, MDH2 and MYH9 have not yet been correlated with PCa metastasis. We further confirmed that DES was differentially expressed between 30 cancer and 30 benign tissues. In addition, DEPs associated with protein transport, regulation of actin cytoskeleton, and the extracellular matrix (ECM)–receptor interaction pathway were prevalent in the H&L group and have not yet been studied in detail in this context. Proteins related to homeostasis, the wound-healing response, focal adhesions, and the complement and coagulation pathways were overrepresented in both groups. Our findings suggest that the repeatedly reported DEPs in the two groups may function as potential biomarkers for detecting PCa and predicting its aggressiveness. Furthermore, the implicated biological processes and signaling pathways may help elucidate the molecular mechanisms of PCa carcinogenesis and metastasis and provide new targets for clinical treatment. PMID:27051295

  6. Bio-informatics Research Progress in the Post-genome Era Based on the Quantitative Analysis of SCIE

    Institute of Scientific and Technical Information of China (English)

    Yongqin ZHAN; Min YU

    2013-01-01

    SCIE paper output can reflect the status quo and trends of research in a discipline. A total of 7,038 scientific articles concerning bioinformatics were retrieved from the SCIE database for the years 2008 to 2012. Quantitative analysis of paper output and citation frequency was conducted by nation, institution, publication, research direction, and hot articles. The results help bioinformatics researchers understand the present state of the subject, carry out cooperative studies, and display scientific research achievements.

  7. Bioinformatics Identification of Modules of Transcription Factor Binding Sites in Alzheimer's Disease-Related Genes by In Silico Promoter Analysis and Microarrays

    Directory of Open Access Journals (Sweden)

    Regina Augustin

    2011-01-01

    Full Text Available The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases.

  8. Expression and Bioinformatics Analysis of SPACA4 in Human and Mice

    Institute of Scientific and Technical Information of China (English)

    Ai-fa TANG; Zhen-dong YU; Yao-ting GUI; Xin GUO; Xian-xin LI; Wei-xiang LIU; Hui ZHU; Zhi-ming CAI

    2008-01-01

    Objective To analyze the expression of SPACA4 in human and mice. Methods Testis cRNA samples from Balb/c mice of different postnatal ages were hybridized to a mouse Affymetrix chip to screen the expression of SPACA4 in mice. Sub-quantitative RT-PCR and bioinformatic tools were used to describe the expression profile of SPACA4 in mice and human. Results The gene chip analysis indicated that expression of mSPACA4 in mouse testis began after postnatal d 35. Sub-quantitative RT-PCR showed that the SPACA4 gene was expressed exclusively in mouse and human testis, and that mouse mSPACA4 was expressed in testis after postnatal d 35, which was consistent with the gene chip results. By bioinformatics analysis, mSPACA4 was predicted to be located in the cell membrane (34.8%) or plasma membrane (34.8%), with a signal peptide cleavage site between amino acids 19 and 20 and transmembrane regions at amino acids 2-20 and 101-126. Conclusion mSPACA4 and hSPACA4 are testis-specific genes, and expression of mSPACA4 in mouse testis begins after postnatal d 35. SPACA4 is a candidate target for a sperm-based contraceptive vaccine.

  9. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    Science.gov (United States)

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  10. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
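
    The third module described above (finding sequences that contain microsatellites matching user-specified parameters) can be illustrated with a minimal sketch. This is not SSR_pipeline's actual code or algorithm; the function name `find_microsatellites` and its parameters are hypothetical, and only simple (not compound) repeats are handled.

```python
import re

def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for simple tandem repeats.

    Hypothetical helper for illustration only; not part of SSR_pipeline.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # Capture one k-mer, then require at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" is "AT" x 2),
            # so each repeat is reported once, under its shortest motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    demo = "GGC" + "AT" * 6 + "TTC" + "CAG" * 5 + "T"
    for start, motif, count in find_microsatellites(demo):
        print(start, motif, count)
```

    Raising `max_motif` to 25 would cover the motif range quoted in the abstract, at the cost of one regular-expression pass per motif length.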

  11. Bioinformatics Analysis for Coding SNPs of the HLA-DQA1 Gene Involved in Susceptibility to Cervical Cancer

    Institute of Scientific and Technical Information of China (English)

    Yanyun Li; Jun Xing; Linsheng Zhao; Yanni Li; Yuchuan Wang; Weiming Zhang

    2006-01-01

    OBJECTIVE To analyze coding SNPs of the HLA-DQA1 gene involved in susceptibility to cervical cancer by a bioinformatics approach, and to choose some SNPs that may have an association with cervical cancer. METHODS Using the SNPper tool, we extracted SNPs from a public database (dbSNP), exporting them in FASTA format suitable for subsequent use. Then we used PARSESNP as a tool for the analysis of the cSNPs. RESULTS Among the cSNPs of the HLA-DQA1 gene, we found that rs9272693 and rs9272703 are missense mutations, which convert a codon for one amino acid into a codon for a different amino acid. We chose a PSSM Difference >10 as the lower threshold for scores of changes predicted to be deleterious. CONCLUSION We used a bioinformatics approach for cSNP analysis of the HLA-DQA1 gene. This method can select the variants in a conserved region and give a PSSM Difference score, but the results need to be verified in cervical cancer patients and a control population.

  12. SweetNET: A Bioinformatics Workflow for Glycopeptide MS/MS Spectral Analysis.

    Science.gov (United States)

    Nasir, Waqas; Toledo, Alejandro Gomez; Noborn, Fredrik; Nilsson, Jonas; Wang, Mingxun; Bandeira, Nuno; Larson, Göran

    2016-08-01

    Glycoproteomics has rapidly become an independent analytical platform bridging the fields of glycomics and proteomics to address site-specific protein glycosylation and its impact in biology. Current glycopeptide characterization relies on time-consuming manual interpretations and demands high levels of personal expertise. Efficient data interpretation constitutes one of the major challenges to be overcome before true high-throughput glycopeptide analysis can be achieved. The development of new glyco-related bioinformatics tools is thus of crucial importance to fulfill this goal. Here we present SweetNET: a data-oriented bioinformatics workflow for efficient analysis of hundreds of thousands of glycopeptide MS/MS-spectra. We have analyzed MS data sets from two separate glycopeptide enrichment protocols targeting sialylated glycopeptides and chondroitin sulfate linkage region glycopeptides, respectively. Molecular networking was performed to organize the glycopeptide MS/MS data based on spectral similarities. The combination of spectral clustering, oxonium ion intensity profiles, and precursor ion m/z shift distributions provided typical signatures for the initial assignment of different N-, O- and CS-glycopeptide classes and their respective glycoforms. These signatures were further used to guide database searches leading to the identification and validation of a large number of glycopeptide variants including novel deoxyhexose (fucose) modifications in the linkage region of chondroitin sulfate proteoglycans.

  13. BioGPS descriptors for rational engineering of enzyme promiscuity and structure based bioinformatic analysis.

    Directory of Open Access Journals (Sweden)

    Valerio Ferrario

    Full Text Available A new bioinformatic methodology was developed founded on the Unsupervised Pattern Cognition Analysis of GRID-based BioGPS descriptors (Global Positioning System in Biological Space). The procedure relies entirely on three-dimensional structure analysis of enzymes and does not stem from sequence or structure alignment. The BioGPS descriptors account for chemical, geometrical and physical-chemical features of enzymes and are able to describe comprehensively the active site of enzymes in terms of a "pre-organized environment" able to stabilize the transition state of a given reaction. The efficiency of this new bioinformatic strategy was demonstrated by the consistent clustering of four different Ser hydrolase classes, which are characterized by the same active site organization but are able to catalyze different reactions. The method was validated by considering, as a case study, the engineering of amidase activity into the scaffold of a lipase. The BioGPS tool predicted correctly the properties of lipase variants, as demonstrated by the projection of mutants inside the BioGPS "roadmap".

  14. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    Science.gov (United States)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private). These databases contain information on genes, RNA, ORFs, proteins, and intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, together with the constant creation of supercomputing units built to withstand the heavy workload of sequence analysis. However, innovation is still needed on platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  15. [Bioinformatic analysis of adenoma-normal mucosa SSH library of colon].

    Science.gov (United States)

    Lü, Bing-Jian; Cui, Jing; Xu, Jing; Zhang, Hao; Luo, Min-Jie; Zhu, Yi-Min; Lai, Mao-De

    2006-04-01

    We established a colonic adenoma-normal mucosa suppressive subtraction hybridization (SSH) library in 1999. In this study, we wanted to explore the expression profile of all candidate genes in this library. We developed an EST pipeline which contained two in-house software packages, nucleic acid analytical software and GetUni. The nucleic acid analytical software, an integrator of universal bioinformatics tools including phred, phd2fasta, cross_match, repeatmasker and blast2.0, can blast sequences of differential clones against the downloaded non-redundant nucleotide (NR) database. GetUni can cluster these NR sequences into UniGene entries by matching them against the downloaded Homo sapiens UniGene database. Sixty-two candidate genes in the A-N library were obtained via the high throughput automatic gene expression bioinformatics pipeline. Gene Ontology online analysis revealed that ribosome genes and immunity-regulating genes were the two most common categories in the KEGG or Biocarta Pathway. We also detected the expression of the 2 genes with the highest hits, Reg4 and FAM46A, by semi-quantitative RT-PCR. The two genes were up-regulated in 10 and 9 out of 10 adenomas, respectively, in comparison with the paired normal mucosa. The candidate genes in the A-N library would be of great significance in disclosing the molecular mechanism underlying colonic adenoma initiation and progression.

  16. Predicting the Nuclear Localization Signals of 107 Types of HPV L1 Proteins by Bioinformatic Analysis

    Institute of Scientific and Technical Information of China (English)

    Jun Yang; Yi-Li Wang; Lü-Sheng Si

    2006-01-01

    In this study, 107 types of human papillomavirus (HPV) L1 protein sequences were obtained from available databases, and the nuclear localization signals (NLSs) of these HPV L1 proteins were analyzed and predicted by bioinformatic analysis. Out of the 107 types, the NLSs of 39 types were predicted by PredictNLS software (35 types of bipartite NLSs and 4 types of monopartite NLSs). The NLSs of the remaining HPV types were predicted according to the characteristics and the homology of the already predicted NLSs as well as the general rules of NLSs. According to the results, the NLSs of 107 types of HPV L1 proteins were classified into 15 categories. The different types of HPV L1 proteins in the same NLS category could share a similar or the same nucleocytoplasmic transport pathway. They might be used as the same target to prevent and treat different types of HPV infection. The results also showed that bioinformatic technology could be used to analyze and predict NLSs of proteins.
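
    To make the monopartite/bipartite distinction concrete, the sketch below scans a protein sequence with two simplified consensus patterns. These patterns are assumptions chosen for illustration (monopartite: K-[K/R]-X-[K/R]; bipartite: two basic residues, a 10-12 residue spacer, then at least three basic residues among the next five). They are not the rules used by PredictNLS, and the function names are hypothetical.

```python
import re

BASIC = "KR"

def find_monopartite(seq):
    """Return (start, match) for the simplified consensus K-[KR]-X-[KR]."""
    return [(m.start(), m.group(0)) for m in re.finditer(r"K[KR].[KR]", seq)]

def find_bipartite(seq):
    """Return (start, match) for: 2 basic residues, 10-12 spacer, >=3 basic of next 5."""
    hits = []
    # Lookahead so that overlapping pairs of basic residues are all considered.
    for m in re.finditer(r"(?=[KR]{2})", seq):
        start = m.start()
        for spacer in (10, 11, 12):
            tail_start = start + 2 + spacer
            tail = seq[tail_start:tail_start + 5]
            if len(tail) == 5 and sum(c in BASIC for c in tail) >= 3:
                hits.append((start, seq[start:tail_start + 5]))
                break  # report the shortest spacer that satisfies the pattern
    return hits

if __name__ == "__main__":
    print(find_monopartite("PKKKRKV"))            # SV40 large T antigen-like signal
    print(find_bipartite("KRPAATKKAGQAKKKKLD"))   # nucleoplasmin-like signal
```

    Such pattern matching only flags candidates; as the abstract notes, signals for most HPV types had to be assigned by homology to already predicted NLSs.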

  17. Bioinformatic Analysis of Differential Protein Expression in Calu-3 Cells Exposed to Carbon Nanotubes

    Directory of Open Access Journals (Sweden)

    Pin Li

    2013-10-01

    Full Text Available Carbon nanomaterials are widely produced and used in industry, medicine and scientific research. To examine the impact of exposure to nanoparticles on human health, the human airway epithelial cell line, Calu-3, was used to evaluate changes in the cellular proteome that could account for alterations in cellular function of airway epithelia after 24 h exposure to 10 μg/mL and 100 ng/mL of two common carbon nanoparticles, single- and multi-wall carbon nanotubes (SWCNT, MWCNT). After exposure to the nanoparticles, label-free quantitative mass spectrometry (LFQMS) was used to study the differential protein expression. Ingenuity Pathway Analysis (IPA) was used to conduct a bioinformatic analysis of proteins identified in LFQMS. Interestingly, after exposure to a high concentration (10 μg/mL; 0.4 μg/cm2) of MWCNT or SWCNT, only 8 and 13 proteins, respectively, exhibited changes in abundance. In contrast, the abundance of hundreds of proteins was altered in response to a low concentration (100 ng/mL; 4 ng/cm2) of either CNT. Of the 281 and 282 proteins that were significantly altered in response to MWCNT or SWCNT respectively, 231 proteins were the same. Bioinformatic analyses found that the proteins in common to both nanotubes occurred within the cellular functions of cell death and survival, cell-to-cell signaling and interaction, cellular assembly and organization, cellular growth and proliferation, infectious disease, molecular transport and protein synthesis. The majority of the protein changes represent a decrease in amount, suggesting a general stress response to protect cells. The STRING database was used to analyze the various functional protein networks. Interestingly, some proteins like cadherin 1 (CDH1), signal transducer and activator of transcription 1 (STAT1), junction plakoglobin (JUP), and apoptosis-associated speck-like protein containing a CARD (PYCARD), appear in several functional categories and tend to be in the center of the networks. This

  18. Bioinformatics analysis of breast cancer bone metastasis related gene-CXCR4

    Institute of Scientific and Technical Information of China (English)

    Heng-Wei Zhang; Xian-Fu Sun; Ya-Ning He; Jun-Tao Li; Xu-Hui Guo; Hui Liu

    2013-01-01

    Objective: To analyze the breast cancer bone metastasis related gene CXCR4. Methods: This research screened breast cancer bone metastasis related genes by high-flux gene chip. Results: The expression of 396 genes was found to differ, including 165 up-regulated and 231 down-regulated genes. The expression of chemokine receptor CXCR4 was obviously up-regulated in tissue with breast cancer bone metastasis. The difference from tissue without bone metastasis was significant, which indicated that CXCR4 played a vital role in breast cancer bone metastasis. Conclusions: The bioinformatics analysis of CXCR4 can provide a certain basis for understanding the occurrence and diagnosis of breast cancer bone metastasis, targeted gene therapy, and evaluation of prognosis.

  19. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    Science.gov (United States)

    Ju, Feng; Zhang, Tong

    2015-11-01

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.

  20. Bioinformatic analysis of human nuclear receptor nr5a2 (hb1f) genomic sequence

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    We previously cloned the cDNA of the human nuclear receptor nr5a2 (hb1f) gene and obtained its whole genomic sequence. In this work we carried out an in-depth bioinformatic analysis of the genomic sequence of the nr5a2 (hb1f) gene. Sequence comparison and prediction algorithms suggested that there might be additional coding regions in the 210 kb genomic sequence besides the known exons, especially in the two largest introns. Comparison of the structures of nr5a loci in different species revealed distinguishable conservation and apparent gene duplication during evolution. The remarkable conservation among the promoters of the zebrafish, mouse and human nr5a2 genes suggested that they would be regulated by the same transcription factors.

  1. Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE

    Institute of Scientific and Technical Information of China (English)

    HUANG Xin; LI Hao-ming

    2009-01-01

    Background Lovastatin is an effective drug for treatment of hyperlipidemia. This study aimed to clone the lovastatin biosynthesis regulatory gene lovE and analyze the structure and function of its encoded protein. Methods According to the lovastatin synthase gene sequence from GenBank, primers were designed to amplify and clone the lovastatin biosynthesis regulatory gene lovE from Aspergillus terreus genomic DNA. Bioinformatic analysis of lovE and its encoded amino acid sequence was performed using internet resources and software such as DNAMAN. Results The target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terreus genomic DNA, and the secondary and three-dimensional structures of the LovE protein were predicted. Conclusion In the lovastatin biosynthesis process, lovE is a regulatory gene and the LovE protein is a GAL4-like transcriptional factor.

  2. Flow cytometry bioinformatics.

    Directory of Open Access Journals (Sweden)

    Kieran O'Neill

    Full Text Available Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods.
Open standards, data
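
    Two of the preprocessing steps named in this record, compensating for spectral overlap and transforming data onto an analysis-friendly scale, can be sketched for a two-detector case as follows. This is a minimal illustration rather than any particular package's implementation; the function names, the example spillover values, and the arcsinh cofactor of 150 are all assumptions.

```python
import math

def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def compensate(events, spillover):
    """Undo spectral overlap: observed = true x spillover, so true = observed x inverse.

    spillover[i][j] is the fraction of fluorochrome i's signal seen in detector j.
    """
    inv = invert_2x2(spillover)
    return [
        [e[0] * inv[0][0] + e[1] * inv[1][0],
         e[0] * inv[0][1] + e[1] * inv[1][1]]
        for e in events
    ]

def arcsinh_transform(events, cofactor=150.0):
    """Scale each value by the cofactor and apply arcsinh (linear near 0, log-like above)."""
    return [[math.asinh(v / cofactor) for v in e] for e in events]

if __name__ == "__main__":
    spill = [[1.0, 0.1], [0.05, 1.0]]      # assumed example spillover values
    observed = [[1010.0, 300.0]]           # what the detectors recorded
    print(arcsinh_transform(compensate(observed, spill)))
```

    With more than two detectors the same idea applies with a general matrix inverse; the two-channel case is used here only to keep the sketch dependency-free.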

  3. Nano-LC-ESI MS/MS analysis of proteins in dried sea dragon Solenognathus hardwickii and bioinformatic analysis of its protein expression profiling.

    Science.gov (United States)

    Zhang, Dong-Mei; Feng, Li-Xing; Li, Lu; Liu, Miao; Jiang, Bao-Hong; Yang, Min; Li, Guo-Qiang; Wu, Wan-Ying; Guo, De-An; Liu, Xuan

    2016-09-01

    The sea dragon Solenognathus hardwickii has long been used as a traditional Chinese medicine for the treatment of various diseases, such as male impotency. To gain a comprehensive insight into the protein components of the sea dragon, shotgun proteomic analysis of its protein expression profile was conducted in the present study. Proteins were extracted from dried sea dragon using a trichloroacetic acid/acetone precipitation method and then separated by SDS-PAGE. The protein bands were cut from the gel and digested with trypsin to generate a peptide mixture. The peptide fragments were then analyzed using nano liquid chromatography tandem mass spectrometry (nano-LC-ESI MS/MS). A total of 810 proteins and 1 577 peptides were identified in the dried sea dragon. The identified proteins exhibited molecular weight values ranging from 1 900 to 3 516 900 Da and pI values from 3.8 to 12.18. Bioinformatic analysis was conducted using the DAVID Bioinformatics Resources 6.7 Gene Ontology (GO) analysis tool to explore possible functions of the identified proteins. Ascribed functions of the proteins mainly included intracellular non-membrane-bound organelle, non-membrane-bounded organelle, cytoskeleton, structural molecule activity, and calcium ion binding. Furthermore, possible signal networks of the identified proteins were predicted using the STRING (Search Tool for the Retrieval of Interacting Genes) database. Ribosomal protein synthesis was found to play an important role in the signal network. The results of this study are, to the best of our knowledge, the first to provide a reference proteome profile for the sea dragon, and would aid in the understanding of the expression and functions of the identified proteins.

  4. Bioinformatics analysis suggests base modifications of tRNAs and miRNAs in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Jin Hailing

    2009-04-01

    Full Text Available Abstract Background Modifications of RNA bases have been found in some mRNAs and non-coding RNAs including rRNAs, tRNAs, and snRNAs, where modified bases are important for RNA function. Little is known about RNA base modifications in Arabidopsis thaliana. Results In the current work, we carried out a bioinformatics analysis of RNA base modifications in tRNAs and miRNAs using large numbers of cDNA sequences of small RNAs (sRNAs) generated with the 454 technology and the massively parallel signature sequencing (MPSS) method. We looked for sRNAs that map to the genome sequence with one-base mismatch (OMM), which indicate candidate modified nucleotides. We obtained 1,187 sites with possible RNA base modifications supported by both 454 and MPSS sequences. Seven hundred and three of these sites were within tRNA loci. Nucleotide substitutions were frequently located in the T arm (substitutions from A to U or G), upstream of the D arm (from G to C, U, or A), and downstream of the D arm (from G to U). The positions of major substitution sites corresponded with the following known RNA base modifications in tRNAs: N1-methyladenosine (m1A), N2-methylguanosine (m2G), and N2-N2-methylguanosine (m22G). Conclusion These results indicate that our bioinformatics method successfully detected modified nucleotides in tRNAs. Using this method, we also found 147 substitution sites in miRNA loci. As with tRNAs, substitutions from A to U or G and from G to C, U, or A were common, suggesting that base modifications might be similar in tRNAs and miRNAs. We suggest that miRNAs contain modified bases and such modifications might be important for miRNA maturation and/or function.
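
    The core idea of this record, flagging positions where reads align to the reference with exactly one mismatch, can be sketched with a brute-force scan. This is an illustrative toy, not the authors' pipeline: the function name `one_mismatch_sites` is hypothetical, and a real analysis would use an indexed aligner and work on genome-scale data.

```python
def one_mismatch_sites(reference, reads):
    """Tally candidate substitution sites from reads that align with exactly one mismatch.

    Returns a dict mapping (reference_position, reference_base, read_base) to the
    number of supporting alignments. Perfect matches contribute nothing.
    """
    sites = {}
    for read in reads:
        n = len(read)
        # Slide the read along the reference; no gaps are considered.
        for start in range(len(reference) - n + 1):
            window = reference[start:start + n]
            diffs = [i for i in range(n) if window[i] != read[i]]
            if len(diffs) == 1:
                pos = start + diffs[0]
                key = (pos, reference[pos], read[diffs[0]])
                sites[key] = sites.get(key, 0) + 1
    return sites

if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTA"
    reads = ["TTGTAAGG", "TTGTAAGG", "TTGCAAGG"]
    print(one_mismatch_sites(ref, reads))
```

    Requiring support from two independent platforms, as the abstract does for 454 and MPSS, would correspond to running the tally once per read set and keeping only the keys present in both results.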

  5. Small envelope protein E of SARS: cloning, expression, purification, CD determination, and bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    SHEN Xu; XUE Jian-Hua; YU Chang-Ying; LUO Hai-Bin; QIN Lei; YU Xiao-Jing; CHEN Jing; CHEN Li-Li; XIONG Bin; YUE Li-Duo; CAI Jian-Hua; SHEN Jian-Hua; LUO Xiao-Min; CHEN Kai-Xian; SHI Tie-Liu; LI Yi-Xue; HU Geng-Xi; JIANG Hua-Liang

    2003-01-01

    AIM: To obtain a pure sample of SARS small envelope E protein (SARS E protein), study its properties and analyze its possible functions. METHODS: The plasmid of SARS E protein was constructed by the polymerase chain reaction (PCR), and the protein was expressed in an E. coli strain. The secondary structure of the protein was determined by the circular dichroism (CD) technique. The possible functions of this protein were annotated by bioinformatics methods, and a possible three-dimensional model was constructed by molecular modeling. RESULTS: A pure sample of SARS E protein was obtained. The secondary structure derived from CD determination is similar to that from secondary structure prediction. Bioinformatics analysis indicated that the key residues of SARS E protein are highly conserved compared to the E proteins of other coronaviruses. In particular, the primary amino acid sequence of SARS E protein is much more similar to those of murine hepatitis virus (MHV) and other mammalian coronaviruses. The transmembrane (TM) segment of the SARS E protein is more conserved than other regions of the protein. CONCLUSION: The successful expression of the SARS E protein is a good starting point for investigating the structure and functions of this protein and of SARS coronavirus itself. The SARS E protein may fold in water solution in a similar way as it does in a membrane-water mixed environment. It is possible that β-sheet I of the SARS E protein interacts with the membrane surface via hydrogen bonding, and that this β-sheet may uncoil to a random structure in water solution.

  6. Identification of probable genomic packaging signal sequence from SARS-CoV genome by bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    QIN Lei; XIONG Bin; LUO Cheng; GUO Zong-Ming; HAO Pei; SU Jiong; NAN Peng; FENG Ying; SHI Yi-Xiang; YU Xiao-Jing; LUO Xiao-Min; CHEN Kai-Xian; SHEN Xu; SHEN Jian-Hua; ZOU Jian-Ping; ZHAO Guo-Ping; SHI Tie-Liu; HE Wei-Zhong; ZHONG Yang; JIANG Hua-Liang; LI Yi-Xue

    2003-01-01

    AIM: To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interference (RNAi) drugs for treating SARS. METHODS: Based on studies of the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS-CoV was analyzed using bioinformatic tools. Multiple alignment of the genomic sequences was performed among SARS-CoV, MHV, BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted for the identification of the possible genomic packaging signals. Meanwhile, the N and M proteins of all five viruses were analyzed to study their evolutionary relationship with the genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV is located at the 3′ end of ORF1b, near that of MHV and BCoV, in the most variable region of this gene. The RNA secondary structure of the SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained in studying the genomic packaging signals of PEDV and HCoV 229E. Furthermore, the multiple alignment of genomic sequences indicated that the locations of the packaging signals of SARS-CoV, PEDV, and HCoV overlapped each other. The mutation rate of the packaging signal sequences appears to be much higher than that of the N protein, while the M protein shows only subtle variations. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding RNA secondary structure located in a similar region of ORF1b. The positions where genomic packaging signals exist have undergone rounds of mutation, which may consequently influence the primary structures of the N and M proteins.

  7. Protectome analysis: a new selective bioinformatics tool for bacterial vaccine candidate discovery.

    Science.gov (United States)

    Altindis, Emrah; Cozzi, Roberta; Di Palo, Benedetta; Necchi, Francesca; Mishra, Ravi P; Fontana, Maria Rita; Soriani, Marco; Bagnoli, Fabio; Maione, Domenico; Grandi, Guido; Liberatori, Sabrina

    2015-02-01

    New generation vaccines are in demand to include only the key antigens sufficient to confer protective immunity among the plethora of pathogen molecules. In the last decade, large-scale genomics-based technologies have emerged. Among them, the Reverse Vaccinology approach was successfully applied to the development of an innovative vaccine against Neisseria meningitidis serogroup B, now available on the market with the commercial name BEXSERO® (Novartis Vaccines). The limiting step of such approaches is the number of antigens to be tested in in vivo models. Several laboratories have been trying to refine the original approach in order to identify the relevant antigens straight from the genome. Here we report a new bioinformatics tool that takes a first step in this direction. The tool has been developed by identifying structural/functional features recurring in known bacterial protective antigens, the so-called "Protectome space," and using such "protective signatures" for protective antigen discovery. In particular, we applied this new approach to Staphylococcus aureus and Group B Streptococcus and we show that not only were already known protective antigens re-discovered, but two new protective antigens were also identified.

  8. Bioinformatics and moonlighting proteins

    Directory of Open Access Journals (Sweden)

    Sergio Hernández

    2015-06-01

    Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyse and describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are: (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein-protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (i.e., algorithms such as PISITE), and (e) mutation correlation analysis between amino acids by algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail to detect the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations (it requires the existence of multiply aligned family protein sequences) but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses.

  9. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center

    Science.gov (United States)

    Wattam, Alice R.; Davis, James J.; Assaf, Rida; Boisvert, Sébastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M.; Disz, Terry; Gabbard, Joseph L.; Gerdes, Svetlana; Henry, Christopher S.; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olsen, Gary J.; Murphy-Olson, Daniel E.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L.

    2017-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics. PMID:27899627

  10. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    Science.gov (United States)

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-01-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide, and TCM-based new drug development, especially for the treatment of complex diseases, is promising. However, owing to TCM’s diverse ingredients and their complex interactions with the human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined it with subsequent experimental validation to reveal, for the first time, the functions of the renin-angiotensin system responsible for QSYQ’s cardioprotective effects. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404

  11. Cloning, bioinformatics analysis, and expression of the dust mite allergen Der f 5 of Dermatophagoides farinae

    Directory of Open Access Journals (Sweden)

    Yubao Cui

    2012-08-01

    Crude extracts of house dust mites are used clinically for diagnosis and immunotherapy of allergic diseases, including bronchial asthma, perennial rhinitis, and atopic dermatitis. However, crude extracts are complexes with non-allergenic antigens and lack effective concentrations of important allergens, resulting in several side effects. Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) is one of the predominant sources of dust mite allergens, with more than 30 groups of allergens. The cDNA coding for the group 5 allergen of D. farinae from China was cloned, sequenced and expressed. According to alignment using the VECTOR NTI 9.0 software, there were eight mismatched nucleotides in five cDNA clones resulting in seven incompatible amino acid residues, suggesting that the Der f 5 allergen might have sequence polymorphism. Bioinformatics analysis revealed that the mature Der f 5 allergen has a molecular mass of 13604.03 Da, a theoretical pI of 5.43 and is probably hydrophobic and cytoplasmic. Similarities in amino acid sequences between Der f 5 and allergens of other domestic mite species, viz. Der p 5, Blo t 5, Sui m 5, and Lep d 5, were 79, 48, 53, and 37%, respectively. Phylogenetic analysis indicated that Der f 5 and Der p 5 clustered together. Blo t 5 and Ale o 5 also clustered together, although Blomia tropicalis and Aleuroglyphus ovatus belong to different mite families, viz. Echimyopodidae and Acaridae, respectively.

  12. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    Science.gov (United States)

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide, and TCM-based new drug development, especially for the treatment of complex diseases, is promising. However, owing to TCM’s diverse ingredients and their complex interactions with the human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined it with subsequent experimental validation to reveal, for the first time, the functions of the renin-angiotensin system responsible for QSYQ’s cardioprotective effects. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  13. Bioinformatic and molecular analysis of hydroxymethylbutenyl diphosphate synthase (GCPE) gene expression during carotenoid accumulation in ripening tomato fruit.

    Science.gov (United States)

    Rodríguez-Concepción, Manuel; Querol, Jordi; Lois, Luisa María; Imperial, Santiago; Boronat, Albert

    2003-07-01

    Carotenoids are plastidic isoprenoid pigments of great biological and biotechnological interest. The precursors for carotenoid production are synthesized through the recently elucidated methylerythritol phosphate (MEP) pathway. Here we have identified a tomato (Lycopersicon esculentum Mill.) cDNA sequence encoding a full-length protein with homology to the MEP pathway enzyme hydroxymethylbutenyl 4-diphosphate synthase (HDS, also called GCPE). Comparison with other plant and bacterial HDS sequences showed that the plant enzymes contain a plastid-targeting N-terminal sequence and two highly conserved plant-specific domains in the mature protein with no homology to any other sequence in the databases. The ubiquitous distribution of HDS-encoding expressed sequence tags (ESTs) in the tomato collections suggests that the corresponding gene is likely expressed throughout the plant. The role of HDS in controlling the supply of precursors for carotenoid biosynthesis was estimated from the bioinformatic and molecular analysis of transcript abundance in different stages of fruit development. No significant changes in HDS gene expression were deduced from the statistical analysis of EST distribution during fruit ripening, when an active MEP pathway is required to support a massive accumulation of carotenoids. RNA blot experiments confirmed that similar transcript levels were present in both the wild-type and carotenoid-depleted yellow ripe (r) mutant fruit independent of the stage of development and the carotenoid composition of the fruit. Together, our results are consistent with a non-limiting role for HDS in carotenoid biosynthesis during tomato fruit ripening.

  14. String Mining in Bioinformatics

    Science.gov (United States)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of linear structure. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term "data mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
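
The two tasks named in this abstract, finding repeated segments within one sequence and common segments between sequences, can be illustrated with a minimal k-mer sketch. This is not the method of the cited chapter; the function names and the fixed segment length `k` are assumptions made for illustration, and practical tools typically rely on index structures such as suffix trees or suffix arrays rather than a plain dictionary.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_segments(seq, k):
    """Intra-molecular similarity: k-mers that occur more than once in seq."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items() if len(pos) > 1}


def common_segments(seq_a, seq_b, k):
    """Inter-molecular similarity: k-mers present in both sequences."""
    return set(kmer_positions(seq_a, k)) & set(kmer_positions(seq_b, k))


if __name__ == "__main__":
    print(repeated_segments("ACGTACGT", 4))
    print(common_segments("ACGTT", "GGACG", 3))
```

A dictionary of k-mers finds only exact matches of one fixed length, which keeps the sketch short; maximal repeats of arbitrary length would need a different data structure.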

  15. Identification and characterization of microRNAs in Eucheuma denticulatum by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Gao, Fan; Nan, Fangru; Feng, Jia; Lv, Junping; Liu, Qi; Xie, Shulian

    2016-01-01

    Eucheuma denticulatum, an economically and industrially important red alga, is a valuable marine resource. Although microRNAs (miRNAs) play an essential role in gene post-transcriptional regulation, no research has been conducted to identify and characterize miRNAs in E. denticulatum. In this study, we identified 134 miRNAs (133 conserved miRNAs and one novel miRNA) from 2,997,135 small-RNA reads by high-throughput sequencing combined with bioinformatics analysis. BLAST searching against miRBase uncovered 126 potential miRNA families. A conservation and diversity analysis of predicted miRNA families in different plant species was performed by comparative alignment and homology searching. A total of 4 and 13 randomly selected miRNAs were validated by northern blotting and stem-loop reverse transcription PCR, respectively, thereby demonstrating the reliability of the miRNA sequencing data. Altogether, 871 potential target genes were predicted using psRobot and TargetFinder. Target gene classification and enrichment analyses were conducted based on Gene Ontology analysis. The functions of target gene products and associated metabolic pathways were predicted by Kyoto Encyclopedia of Genes and Genomes pathway analysis. A Cytoscape network was constructed to explore the interrelationships of miRNAs, miRNA-target genes and target genes. A large number of miRNAs with diverse target genes will play important roles in further understanding some essential biological processes in E. denticulatum. The uncovered information can serve as an important reference for the protection and utilization of this unique red alga in the future.

  16. Bioinformatics analysis of a non-specific nuclease from Yersinia enterocolitica subsp. palearctica.

    Science.gov (United States)

    Li, Zhen-Hua; Tang, Zhen-Xing; Fang, Xiu-Juan; Zhang, Zhi-Liang; Shi, Lu-E

    2013-12-01

    In this paper, the physical and chemical characteristics, biological structure and function of a non-specific nuclease from Yersinia enterocolitica subsp. palearctica (Y. NSN) found in our group were studied using multiple bioinformatics approaches. The results showed that Y. NSN had 283 amino acids, a weight of 30,692.5 ku and a certain hydrophilic property. Y. NSN had a signal peptide, no transmembrane domains and disulphide bonds. Cleavage site in Y. NSN was between pos. 23 and 24. The prediction result of the secondary structure showed Y. NSN was a coil structure-based protein. The ratio of α-helix, β-folded and random coil were 18.73%, 16.96% and 64.31%, respectively. Active sites were pos. 124, 125, 127, 157, 165 and 169. Mg(2+) binding site was pos. 157. Substrate binding sites were pos. 124, 125 and 169. The analysis of multisequencing alignment and phylogenetic tree indicated that Y. NSN shared high similarity with the nuclease from Y. enterocolitica subsp. enterocolitica 8081. The enzyme activity results showed that Y. NSN was a nuclease with good thermostability.

  17. Prokaryotic Expression of Rice Ospgip1 Gene and Bioinformatic Analysis of Encoded Product

    Institute of Scientific and Technical Information of China (English)

    CHEN Xi-jun; LIU Xiao-wei; ZUO Si-min; MA Yu-yin; TONG Yun-hui; PAN Xue-biao; XU Jing-you

    2011-01-01

    Using the reference sequences of pgip genes in GenBank, a fragment of 930 bp covering the open reading frame (ORF) of rice Ospgip1 (Oryza sativa polygalacturonase-inhibiting protein 1) was amplified. The prokaryotic expression product of the gene inhibited the growth of Rhizoctonia solani, the causal agent of rice sheath blight, and reduced its polygalacturonase activity. Bioinformatic analysis showed that OsPGIP1 is a hydrophobic protein with a molecular weight of 32.8 kDa and an isoelectric point (pI) of 7.26. The protein is mainly located in the cell wall of rice, and its signal peptide cleavage site is located between the 17th and 18th amino acids. There are four cysteines in both the N- and C-termini of the deduced protein, which can form three disulfide bonds (between the 56th and 63rd, the 278th and 298th, and the 300th and 308th amino acids). The protein has a typical leucine-rich repeat (LRR) domain, and its secondary structure comprises α-helices, β-sheets and irregular coils. Compared with polygalacturonase-inhibiting proteins (PGIPs) from other plants, the 7th LRR is absent in OsPGIP1. The nine LRRs could form a cleft that might associate with proteins from pathogenic fungi, such as polygalacturonase.

  18. Effect of Wnt3a on Keratinocytes Utilizing in Vitro and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Ju-Suk Nam

    2014-03-01

    Wingless-type (Wnt) signaling proteins participate in various cell developmental processes. A suppressive role of Wnt5a on keratinocyte growth has already been observed. However, the role of other Wnt proteins in proliferation and differentiation of keratinocytes remains unknown. Here, we investigated the effects of the Wnt ligand, Wnt3a, on proliferation and differentiation of keratinocytes. Keratinocytes from normal human skin were cultured and treated with recombinant Wnt3a alone or in combination with the inflammatory cytokine, tumor necrosis factor α (TNFα). Furthermore, using bioinformatics, we analyzed the biochemical parameters, molecular evolution, and protein–protein interaction network for the Wnt family. Application of recombinant Wnt3a showed an anti-proliferative effect on keratinocytes in a dose-dependent manner. After treatment with TNFα, Wnt3a still demonstrated an anti-proliferative effect on human keratinocytes. Exogenous treatment with Wnt3a was unable to alter mRNA expression of differentiation markers of keratinocytes, whereas an altered expression was observed in TNFα-stimulated keratinocytes. In silico phylogenetic, biochemical, and protein–protein interaction analysis showed several close relationships among the members of the Wnt family. Moreover, a close phylogenetic and biochemical similarity was observed between Wnt3a and Wnt5a. Finally, we proposed a hypothetical mechanism to illustrate how the Wnt3a protein may inhibit the process of proliferation in keratinocytes, which would be useful for future researchers.

  19. Bioinformatic analysis for structure and function of TCTP from Spirometra mansoni

    Institute of Scientific and Technical Information of China (English)

    Ya-Jun Lu; Gang Lu; Da-Zhong Shi; Li-Hua Li; Sai-Feng Zhong

    2013-01-01

    Objective: To predict the structure and function of translationally controlled tumor protein (TCTP) from Spirometra mansoni by bioinformatics technology, and to provide a theoretical basis for further study. Methods: The open reading frame (ORF) of an EST sequence from Spirometra mansoni was obtained with ORF finder and translated into an amino acid sequence with DNAclub. The structural domain was analyzed by Blast. The structure and function of the protein were predicted and analyzed using online analysis tools: Protparam, InterProScan, protscale, SignalP-3.0, PSORT II, BepiPred, TMHMM, VectorNTI Suite 9 packages and Phyre2. Results: The EST sequence was Sm TCTP, with 173 amino acid residues and a theoretical molecular weight of 19,872.0 Da. The protein is evolutionarily closest to those of Clonorchis sinensis, Schistosoma mansoni, and Schistosoma japonicum. It had no signal peptide site or transmembrane domain. The secondary structure of TCTP contained two α-helices and eight β-strands. Conclusions: Sm TCTP is a protein with a variety of biological functions that may be used as a vaccine candidate molecule and drug target.
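
The first step of the Methods (locating an ORF in a nucleotide sequence and translating it) can be sketched in a few lines. This is an illustrative simplification, not the behaviour of ORF finder or DNAclub: it scans only the three forward reading frames, assumes the standard genetic code, and the function names and the toy input sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def longest_orf(dna):
    """Return the longest ATG...stop ORF in the three forward frames (stop included)."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


def translate(orf):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # hypothetical sequence, not from the study
    orf = longest_orf(toy)
    print(orf, translate(orf))
```

A fuller implementation would also scan the reverse complement and handle ambiguous bases; those are left out here to keep the sketch short.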

  20. Bioinformatics approaches for structural and functional analysis of proteins in secondary metabolism in Withania somnifera.

    Science.gov (United States)

    Sanchita; Singh, Swati; Sharma, Ashok

    2014-11-01

    Withania somnifera (Ashwagandha) is a rich storehouse of a large number of pharmacologically active secondary metabolites known as withanolides. These secondary metabolites are produced by the withanolide biosynthetic pathway. Very little information is available on the structural and functional aspects of the enzymes involved in the withanolide biosynthetic pathways of Withania somnifera. We therefore performed a bioinformatics analysis to examine the functional and structural properties of these important enzymes. The pathway enzymes taken for this study were 3-Hydroxy-3-methylglutaryl coenzyme A reductase, 1-Deoxy-D-xylulose-5-phosphate synthase, 1-Deoxy-D-xylulose-5-phosphate reductase, farnesyl pyrophosphate synthase, squalene synthase, squalene epoxidase, and cycloartenol synthase. Secondary structure was predicted to obtain basic structural information. Three-dimensional structures for these enzymes were predicted. The physico-chemical properties such as pI, AI, GRAVY and instability index were also studied. The current information will provide a platform for understanding the structural attributes responsible for the function of these proteins until experimental structures become available.

  1. Molecular cloning and bioinformatic analysis of the Streptococcus agalactiae neuA gene isolated from tilapia.

    Science.gov (United States)

    Wang, E L; Wang, K Y; Chen, D F; Geng, Y; Huang, L Y; Wang, J; He, Y

    2015-06-01

    Cytidine monophosphate (CMP) N-acetylneuraminic acid (NeuNAc) synthetase, which is encoded by the neuA gene, can catalyze the activation of sialic acid with CMP, and plays an important role in Streptococcus agalactiae infection pathogenesis. To study the structure and function of the S. agalactiae neuA gene, we isolated it from diseased tilapia, amplified it using polymerase chain reaction (PCR) with specific primers, and cloned it into a pMD19-T vector. The recombinant plasmid was confirmed by PCR and restriction enzyme digestion, and identified by sequencing. Molecular characterization analyses of the neuA nucleotide and amino acid sequences were performed using bioinformatic tools and an online server. The results showed that the neuA nucleotide sequence contained a complete coding region, which comprised 1242 bp, encoding 413 amino acids (aa). The aa sequence was highly conserved and contained a Glyco_tranf_GTA_type superfamily and an SGNH_hydrolase superfamily conserved domain, which are related to sialic acid activation catalysis. The NeuA protein possessed many important sites related to post-translational modification, including 28 potential phosphorylation sites and 2 potential N-glycosylation sites, had no signal peptides or transmembrane regions, and was predicted to reside in the cytoplasm. Moreover, the protein had some B-cell epitopes, which suggests its potential in the development of a vaccine against S. agalactiae infection. The codon usage frequency of neuA differed greatly from that of Escherichia coli and Homo sapiens genes, and neuA may be more efficiently expressed in eukaryotes (yeast). S. agalactiae neuA from tilapia maintains high structural homology and sequence identity with CMP-NeuNAc synthetases from other bacteria.
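
The reported sizes are consistent with each other: 1242 bp is 414 codons, and 413 amino acids remain once one codon is set aside as the stop codon. The short check below makes that arithmetic explicit; the function name is hypothetical, and the assumption that the 1242 bp figure includes the stop codon is ours, inferred from the two numbers in the abstract.

```python
def residues_from_cds_length(cds_bp, includes_stop=True):
    """Number of encoded amino acids for a coding region of cds_bp base pairs.

    Assumes one codon is three bases and, when includes_stop is True,
    that the final codon is a stop codon that encodes no residue.
    """
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("a complete coding region must be a positive multiple of 3 bp")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(residues_from_cds_length(1242))
```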

  2. Bioinformatics data supporting revelatory diversity of cultivable thermophiles isolated and identified from two terrestrial hot springs, Unkeshwar, India

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-06-01

    A total of 21 thermophilic bacteria were isolated and identified using the 16S rRNA gene sequencing method. Sequences were submitted to the NCBI website. Short DNA sequences of strains JN392966–JN392972, KC120909–KC120919, KM998072–KM998074 and KP053645 were downloaded from the NCBI BioSample database. The ENDMEMO GC calculating tool was used to calculate the maximum, minimum and average GC percentage and to represent GC content graphically. The data generated indicate that 20 short DNA sequences have a maximum GC content ranging from 60% to 100%, with an average GC content of 52.5–59.8%. Bacillus sp. W7, Escherichia coli strain NW1 and Geobacillus thermoleovorans strain rekadwadsis showed a maximum GC content of up to 70%; Actinobacterium EF_NAK1-7 up to 85.7%; while Bacillus megaterium and E. coli strain NW2 showed a maximum GC content of up to 100%. Digital data on thermophilic bacteria isolated from the Unkeshwar hot springs would be useful for interpreting biodiversity, in addition to phenotypic and physiological characteristics and data generated through 16S rRNA gene sequencing technology.
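
A maximum GC content of 100% alongside an average near 55% is what one would expect if GC percentage is computed over short windows along each sequence rather than once per sequence. The sketch below shows that kind of calculation; it is an assumption about the general approach, not a description of the ENDMEMO tool, and the window size, function names and toy sequence are hypothetical.

```python
def gc_percent(seq):
    """GC percentage of a DNA sequence (G and C bases over total length)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window):
    """Return (max, min, mean) GC percentage over all sliding windows of a given size."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [gc_percent(seq[i:i + window]) for i in range(len(seq) - window + 1)]
    return max(values), min(values), sum(values) / len(values)


if __name__ == "__main__":
    toy = "AAGGCC"  # hypothetical sequence, not one of the deposited strains
    print(gc_percent(toy))
    print(windowed_gc(toy, 2))
```

With very short windows, a run of consecutive G or C bases is enough to reach 100% in one window even when the sequence as a whole is close to 50% GC.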

  3. Bioinformatics data supporting revelatory diversity of cultivable thermophiles isolated and identified from two terrestrial hot springs, Unkeshwar, India.

    Science.gov (United States)

    Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

    2016-06-01

    A total of 21 thermophilic bacteria were isolated and identified using the 16S rRNA gene sequencing method. Sequences were submitted to the NCBI website. Short DNA sequences of strains JN392966-JN392972, KC120909-KC120919, KM998072-KM998074 and KP053645 were downloaded from the NCBI BioSample database. The ENDMEMO GC calculating tool was used to calculate the maximum, minimum and average GC percentage and to represent GC content graphically. The data generated indicate that 20 short DNA sequences have a maximum GC content ranging from 60% to 100%, with an average GC content of 52.5-59.8%. Bacillus sp. W7, Escherichia coli strain NW1 and Geobacillus thermoleovorans strain rekadwadsis showed a maximum GC content of up to 70%; Actinobacterium EF_NAK1-7 up to 85.7%; while Bacillus megaterium and E. coli strain NW2 showed a maximum GC content of up to 100%. Digital data on thermophilic bacteria isolated from the Unkeshwar hot springs would be useful for interpreting biodiversity, in addition to phenotypic and physiological characteristics and data generated through 16S rRNA gene sequencing technology.

  4. [Phylogenetic and Bioinformatics Analysis of Replicase Gene Sequence of Cucumber Green Mottle Mosaic Virus].

    Science.gov (United States)

    Liang, Chaoqiong; Meng, Yan; Luo, Laixin; Liu, Pengfei; Li, Jianqiang

    2015-11-01

    kD proteins of tested CGMMV isolates. The current results show that there was no significant difference between the replicase gene sequences; they were stable and conserved within species and clearly different between species. CGMMV-No. 1, CGMMV-No. 3, CGMMV-No. 4 and CGMMV-No. 5 had a close genetic relationship with the Shandong and Liaoning isolates (Accession No. KJ754195 and EF611826) and potentially originate from the same source. CGMMV-No. 2 was closer to the Korea isolate. Tested samples with high sequence similarity clustered into one group in the phylogenetic tree. The bioinformatics analysis results for the 129 kD and 57 kD proteins of the tested CGMMV isolates showed no regular pattern. There was no corresponding relationship between the molecular phylogeny and the bioinformatics analysis of the tested CGMMV isolates.

  5. Analysis of metagenomics next generation sequence data for fungal ITS barcoding: Do you need advance bioinformatics experience?

    Directory of Open Access Journals (Sweden)

    Abdalla Osman Abdalla Ahmed

    2016-07-01

    During the last few decades, most microbiology laboratories have become familiar with analyzing Sanger sequence data for ITS barcoding. However, with the availability of next-generation sequencing platforms in many centers, it has become important for medical mycologists to know how to make sense of the massive sequence data generated by these new sequencing technologies. In many reference laboratories, the analysis of such data is not a big deal, since suitable IT infrastructure and well-trained bioinformatics scientists are always available. However, in small research laboratories and clinical microbiology laboratories such resources are always lacking. In this report, a simple and user-friendly bioinformatics workflow is suggested for fast and reproducible ITS barcoding of fungi.

  6. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    Science.gov (United States)

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…

  7. Bioinformatic analysis of functional differences between the immunoproteasome and the constitutive proteasome

    DEFF Research Database (Denmark)

    Kesmir, Can; van Noort, V.; de Boer, R.J.;

    2003-01-01

    not yet been quantified how different the specificity of two forms of the proteasome are. The main question, which still lacks direct evidence, is whether the immunoproteasome generates more MHC ligands. Here we use bioinformatics tools to quantify these differences and show that the immunoproteasome...

  8. BIOINFORMATICS AND BIOSYNTHESIS ANALYSIS OF CELLULOSE SYNTHASE OPERON IN ZYMOMONAS MOBILIS ZM4

    Directory of Open Access Journals (Sweden)

    Sheik Abdul Kader Sheik Asraf, K. Narayanan Rajnish, and Paramasamy Gunasekaran

    2011-03-01

    confirmed by the Acetic-Nitric (Updegraff) Cellulose assay. The Bioinformatics and biosynthetic analysis confirm the biosynthesis of cellulose in Z. mobilis.

  9. Identification of Immunoreactive Leishmania infantum Protein Antigens to Asymptomatic Dog Sera through Combined Immunoproteomics and Bioinformatics Analysis

    Science.gov (United States)

    Samiotaki, Martina; Panayotou, George; Karagouni, Evdokia

    2016-01-01

    Leishmania infantum is the etiologic agent of zoonotic visceral leishmaniasis (VL) in countries in the Mediterranean basin, where dogs are the domestic reservoirs and represent important elements in the transmission of the disease. Since the major focal areas of human VL exhibit a high prevalence of seropositive dogs, the control of canine VL could reduce the infection rate in humans. Efforts toward this have focused on the improvement of diagnostic tools, as well as on vaccine development. The identification of parasite antigens including suitable major histocompatibility complex (MHC) class I- and/or II-restricted epitopes is very important since disease protection is characterized by strong and long-lasting CD8+ T and CD4+ Th1 cell-dominated immunity. In the present study, total protein extract from late-log phase L. infantum promastigotes was analyzed by two-dimensional western blots and probed with sera from asymptomatic and symptomatic dogs. A total of 42 protein spots were found to differentially react with IgG from asymptomatic dogs, while 17 of these identified by Coomassie stain were extracted and analyzed. Of these, 21 proteins were identified by mass spectrometry; they were mainly involved in metabolism and stress responses. An in silico analysis predicted that the chaperonin HSP60, dihydrolipoamide dehydrogenase, enolase, cyclophilin 2, cyclophilin 40, and one hypothetical protein contain promiscuous MHCI and/or MHCII epitopes. Our results suggest that the combination of immunoproteomics and bioinformatics analyses is a promising method for the identification of novel candidate antigens for vaccine development or with potential use in the development of sensitive diagnostic tests. PMID:26906226

  10. Identification and bioinformatics analysis of lactate dehydrogenase genes from Echinococcus granulosus

    Institute of Scientific and Technical Information of China (English)

    Gang Lu; Yajun Lu; Lihua Li; Lixian Wu; Zhigang Fan; Dazhong Shi; Hu Wang; Xiumin Han

    2010-01-01

    Objective: To identify the full-length cDNA sequence of lactate dehydrogenase (LDH) from adult Echinococcus granulosus (E. granulosus) and to predict the structure and function of its encoded protein using bioinformatics methods. Methods: With the help of NCBI, EMBI, Expasy and other online sites, the open reading frame (ORF), conserved domain, physical and chemical parameters, signal peptide, epitopes, and topological structures of the protein sequence were predicted and a homology tertiary structure model was created; VectorNTI software was used for sequence alignment, phylogenetic tree construction and tertiary structure prediction. Results: The target sequence was 1 233 bp in length, with a largest ORF of 996 bp encoding a 331-amino-acid protein with a typical L-LDH conserved domain. It was confirmed as the full-length cDNA of LDH from E. granulosus and named EgLDH (GenBank accession number: HM748917). The predicted molecular weight and isoelectric point of the deduced protein were 35 516.2 Da and 6.32, respectively. Compared with LDHs from Taenia solium, Taenia saginata asiatica, Spirometra erinaceieuropaei, Schistosoma japonicum, Clonorchis sinensis and human, it showed similarity of 86%, 85%, 55%, 58%, 58% and 53%, respectively. EgLDH contained 3 putative transmembrane regions and 4 major epitopes (54aa-59aa, 81aa-87aa, 97aa-102aa, 307aa-313aa); the latter were significantly different from the corresponding regions of human LDH. In addition, some NAD and substrate binding sites were located on epitopes 54aa-59aa and 97aa-102aa, respectively. Tertiary structure prediction showed that 3 key catalytic residues, 105R, 165D and 192H, form a catalytic center near the epitope 97aa-102aa, with most NAD and substrate binding sites located around the center. Conclusions: The full-length cDNA sequence of EgLDH was identified. It encodes a putative transmembrane protein which might be an ideal target molecule for vaccines and drugs.
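
    The relationship between the reported ORF length and protein length can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper; it assumes the quoted ORF length includes the terminating stop codon, and the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp: int) -> int:
    """Number of residues encoded by an ORF.

    Assumption: orf_bp counts the stop codon, which encodes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    # Total codons minus the single stop codon.
    return orf_bp // 3 - 1


# The 996 bp ORF reported for EgLDH.
print(protein_length_from_orf(996))
```

    Under that assumption, 996 bp corresponds to 332 codons, one of which is the stop codon, which agrees with the 331 amino acids stated in the abstract.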

  11. Identifying glioblastoma gene networks based on hypergeometric test analysis.

    Directory of Open Access Journals (Sweden)

    Vasileios Stathias

    Full Text Available Patient-specific therapy is emerging as an important possibility for many cancer patients. However, to identify such therapies it is essential to determine the genomic and transcriptional alterations present in one tumor relative to control samples. This presents a challenge since use of a single sample precludes many standard statistical analysis techniques. We reasoned that one means of addressing this issue is by comparing transcriptional changes in one tumor with those observed in a large cohort of patients analyzed by The Cancer Genome Atlas (TCGA). To test this directly, we devised a bioinformatics pipeline to identify differentially expressed genes in tumors resected from patients suffering from the most common malignant adult brain tumor, glioblastoma (GBM). We performed RNA sequencing on tumors from individual GBM patients and filtered the results through the TCGA database in order to identify possible gene networks that are overrepresented in GBM samples relative to controls. Importantly, we demonstrate that hypergeometric-based analysis of gene pairs identifies gene networks that validate experimentally. These studies identify a putative workflow for uncovering differentially expressed patient-specific genes and gene networks for GBM and other cancers.
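
    As a rough illustration of what a hypergeometric over-representation test computes, the sketch below evaluates the upper-tail probability of seeing at least k hits when N genes are drawn from a population of M genes containing n genes of interest. It is a generic textbook formulation, not the authors' gene-pair pipeline; the function name and the example numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n marked items, N draws)."""
    total = comb(M, N)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


# Hypothetical numbers: 20000 genes in total, 200 belong to a network,
# 500 are differentially expressed, and 15 of those fall in the network.
p_value = hypergeom_upper_tail(15, 20000, 200, 500)
print(f"{p_value:.3e}")
```

    A small tail probability suggests the overlap is larger than expected by chance; in practice such p-values would still need correction for multiple testing.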

  12. SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells.

    Science.gov (United States)

    Pantano, Lorena; Estivill, Xavier; Martí, Eulàlia

    2010-03-01

    High-throughput sequencing technologies enable direct approaches to catalog and analyze snapshots of the total small RNA content of living cells. Characterization of high-throughput sequencing data requires bioinformatic tools offering a wide perspective of the small RNA transcriptome. Here we present SeqBuster, a highly versatile and reliable web-based toolkit to process and analyze large-scale small RNA datasets. The high flexibility of this tool is illustrated by the multiple choices offered in the pre-analysis for mapping purposes and in the different analysis modules for data manipulation. To overcome the storage capacity limitations of the web-based tool, SeqBuster offers a stand-alone version that permits the annotation against any custom database. SeqBuster integrates multiple analysis modules in a unique platform and constitutes the first bioinformatic tool offering a deep characterization of miRNA variants (isomiRs). The application of SeqBuster to small-RNA datasets of human embryonic stem cells revealed that most miRNAs present different types of isomiRs, some of them being associated with stem cell differentiation. The exhaustive description of the isomiRs provided by SeqBuster could help to identify miRNA variants that are relevant in physiological and pathological processes. SeqBuster is available at http://estivill_lab.crg.es/seqbuster.

  13. Bioinformatic analysis of CaBP/calneuron proteins reveals a family of highly conserved vertebrate Ca2+-binding proteins

    Directory of Open Access Journals (Sweden)

    Burgoyne Robert D

    2010-04-01

    Full Text Available Abstract Background Ca2+-binding proteins are important for the transduction of Ca2+ signals into physiological outcomes. As in calmodulin, many of the Ca2+-binding proteins bind Ca2+ through EF-hand motifs. Amongst the large number of EF-hand containing Ca2+-binding proteins is a subfamily expressed in neurons and retinal photoreceptors known as the CaBPs and the related calneuron proteins. These were suggested to be vertebrate specific but exactly which family members are expressed outside of mammalian species had not been examined. Findings We have carried out a bioinformatic analysis to determine when members of this family arose and the conserved aspects of the protein family. Sequences of human members of the family obtained from GenBank were used in Blast searches to identify corresponding proteins encoded in other species using searches of non-redundant proteins, genome sequences and mRNA sequences. Sequences were aligned and compared using ClustalW. Some families of Ca2+-binding proteins are known to show a progressive expansion in gene number as organisms increase in complexity. In contrast, the results for CaBPs and calneurons showed that a full complement of CaBPs and calneurons is present in the teleost fish Danio rerio and possibly in cartilaginous fish. These findings suggest that the entire family of genes may have arisen at the same time during vertebrate evolution. Certain members of the family (for example the short form of CaBP1 and calneuron 1) are highly conserved, suggesting essential functional roles. Conclusions The findings support the designation of the calneurons as a distinct sub-family. While the gene number for CaBPs/calneurons does not increase, a distinctive evolutionary change in these proteins in vertebrates has been an increase in the number of splice variants present in mammals.

  14. Bioinformatics analysis of the factors controlling type I IFN gene expression in autoimmune disease and virus-induced immunity

    Directory of Open Access Journals (Sweden)

    Di Feng

    2013-09-01

    Full Text Available Patients with systemic lupus erythematosus (SLE) and Sjögren's syndrome (SS) display increased levels of type I IFN-induced genes. Plasmacytoid dendritic cells (PDCs) are natural interferon producing cells and considered to be a primary source of IFN-α in these two diseases. Differential expression patterns of type I IFN inducible transcripts can be found in different immune cell subsets and in patients with both active and inactive autoimmune disease. A type I IFN gene signature generally consists of three groups of IFN-induced genes: those regulated in response to virus-induced type I IFN, those regulated by the IFN-induced mitogen-activated protein kinase/extracellular-regulated kinase (MAPK/ERK) pathway, and those regulated by the IFN-induced phosphoinositide-3 kinase (PI-3K) pathway. These three groups of type I IFN-regulated genes control important cellular processes such as apoptosis, survival, adhesion, and chemotaxis, that when dysregulated, contribute to autoimmunity. With the recent generation of large datasets in the public domain from next-generation sequencing and DNA microarray experiments, one can perform detailed analyses of cell type-specific gene signatures as well as identify distinct transcription factors that differentially regulate these gene signatures. We have performed bioinformatics analysis of data in the public domain and experimental data from our lab to gain insight into the regulation of type I IFN gene expression. We have found that the genetic landscape of the IFNA and IFNB genes is occupied by transcription factors, such as insulators CTCF and cohesin, that negatively regulate transcription, as well as IRF5 and IRF7, that positively and distinctly regulate IFNA subtypes. A detailed understanding of the factors controlling type I IFN gene transcription will significantly aid in the identification and development of new therapeutic strategies targeting the IFN pathway in autoimmune disease.

  15. Cloning and Bioinformatics Analysis of ZmERECTA-LIKE1 and Construction of Plant Expression Vector

    Institute of Scientific and Technical Information of China (English)

    Yihong JI; Jinbao PAN; Min LU; Jun HAN; Zhangjie NAN; Qingpeng SUN

    2016-01-01

    [Objective] This study was conducted to clone and analyze the ERECTA-LIKE1 gene in Zea mays by PCR and bioinformatics methods and to construct the plant expression vector pCambia3301-zmERECTA-LIKE1. [Method] The zmERECTA-LIKE1 (zmERL1) gene was obtained using RT-PCR, and its physical-chemical properties were analyzed by bioinformatics methods, including domains, transmembrane regions, potential N-glycosylation sites, phosphorylation sites, etc. [Result] Bioinformatics results showed that the zmERL1 gene was 2 169 bp, encoding a protein consisting of 722 amino acids with 11 potential N-glycosylation sites and 42 kinase-specific phosphorylation sites. According to CDD2.23 and TMHMM Server v. 2.0 software, there were leucine-rich repeats, a PKC domain and a transmembrane region in this protein. The theoretical pI and molecular weight of the zmERL1-encoded protein were 6.20 and 79 184.8, respectively, using the Compute pI/Mw tool. Furthermore, we constructed the plant expression vector pCambia3301-zmERECTA-LIKE1 by subcloning the zmERL1 gene into pCambia3301 in place of GUS. [Conclusion] The results provide a theoretical basis for the application of the zmERL1 gene in future studies.

  16. A Comprehensive Bioinformatics Analysis of the Nudix Superfamily in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    D. Gunawardana

    2009-01-01

    Full Text Available Nudix enzymes are a superfamily with a conserved common reaction mechanism that provides the capacity for the hydrolysis of a broad spectrum of metabolites. We used hidden Markov models based on Nudix sequences from the PFAM and PROSITE databases to identify Nudix hydrolases encoded by the Arabidopsis genome. 25 Nudix hydrolases were identified and classified into 11 individual families by pairwise sequence alignments. Intron phases were strikingly conserved in each family. Phylogenetic analysis showed that all multimember families formed monophyletic clusters. Conserved familial sequence motifs were identified with the MEME motif analysis algorithm. One motif (motif 4) was found in three diverse families. All proteins containing motif 4 demonstrated a degree of preference for substrates containing an ADP moiety. We conclude that HMM model-based genome scanning and MEME motif analysis, respectively, can significantly improve the identification and assignment of function of new members of this mechanistically-diverse protein superfamily.

  17. Towards understanding the lifespan extension by reduced insulin signaling: bioinformatics analysis of DAF-16/FOXO direct targets in Caenorhabditis elegans

    Science.gov (United States)

    Li, Yan-Hui; Zhang, Gai-Gai

    2016-01-01

    DAF-16, the C. elegans FOXO transcription factor, is an important determinant in aging and longevity. In this work, we manually curated FOXODB http://lyh.pkmu.cn/foxodb/, a database of FOXO direct targets. It now covers 208 genes. Bioinformatics analysis on 109 DAF-16 direct targets in C. elegans found interesting results. (i) DAF-16 and transcription factor PQM-1 co-regulate some targets. (ii) Seventeen targets directly regulate lifespan. (iii) Four targets are involved in lifespan extension induced by dietary restriction. And (iv) DAF-16 direct targets might play global roles in lifespan regulation. PMID:27027346

  18. Genome Exploitation and Bioinformatics Tools

    Science.gov (United States)

    de Jong, Anne; van Heel, Auke J.; Kuipers, Oscar P.

    Bioinformatic tools can greatly improve the efficiency of bacteriocin screening efforts by limiting the number of strains. Different classes of bacteriocins can be detected in genomes by looking at different features. Finding small bacteriocins can be especially challenging due to low homology and because small open reading frames (ORFs) are often omitted from annotations. In this chapter, several bioinformatic tools/strategies to identify bacteriocins in genomes are discussed.

  19. Bioinformatics for Exploration

    Science.gov (United States)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  20. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for tasks at hand and we demonstrate this with some real-world data.
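
    The abstract does not say which feature selection techniques were used, so the following is only a minimal sketch of one common family, a univariate filter. It ranks features by a signal-to-noise score between two classes; the function name, the scoring choice, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative.

    X is a list of samples (each a list of feature values) and y holds the
    class label (0 or 1) of each sample. The score is |mean0 - mean1|
    divided by the sum of the per-class standard deviations.
    """
    scored = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        gap = abs(mean(class0) - mean(class1))
        spread = pstdev(class0) + pstdev(class1)
        if spread > 0:
            score = gap / spread
        else:
            # No within-class variation: perfect separation if the means differ.
            score = float("inf") if gap > 0 else 0.0
        scored.append((score, j))
    return [j for _, j in sorted(scored, reverse=True)]


# Toy expression matrix: 4 samples, 3 features (hypothetical values).
X = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [3.0, 5.5, 0.3], [3.1, 4.5, 0.2]]
y = [0, 0, 1, 1]
print(rank_features(X, y))
```

    Keeping only the top few indices from such a ranking is one simple way to arrive at a small feature subset, although filters of this kind ignore interactions between features.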

  1. CDH1/E-cadherin and solid tumors. An updated gene-disease association analysis using bioinformatics tools.

    Science.gov (United States)

    Abascal, María Florencia; Besso, María José; Rosso, Marina; Mencucci, María Victoria; Aparicio, Evangelina; Szapiro, Gala; Furlong, Laura Inés; Vazquez-Levin, Mónica Hebe

    2016-02-01

    Cancer is a group of diseases that causes millions of deaths worldwide. Among cancers, Solid Tumors (ST) stand out due to their high incidence and mortality rates. Disruption of cell-cell adhesion is highly relevant during tumor progression. Epithelial-cadherin (protein: E-cadherin, gene: CDH1) is a key molecule in cell-cell adhesion; its abnormal expression and/or function contributes to tumor progression and is altered in ST. A systematic study was carried out to gather and summarize current knowledge on CDH1/E-cadherin and ST using bioinformatics resources. The DisGeNET database was exploited to survey CDH1-associated diseases. Reported mutations in specific ST were obtained by interrogating COSMIC and IntOGen tools. CDH1 Single Nucleotide Polymorphisms (SNP) were retrieved from the dbSNP database. DisGeNET analysis identified 609 genes annotated to ST, among which CDH1 was listed. Using CDH1 as query term, 26 disease concepts were found, 21 of which were neoplasms-related terms. Using DisGeNET ALL Databases, 172 disease concepts were identified. Of those, 80 ST disease-related terms were subjected to manual curation and 75/80 (93.75%) associations were validated. On selected ST, 489 CDH1 somatic mutations were listed in COSMIC and IntOGen databases. Breast neoplasms had the highest CDH1-mutation rate. CDH1 was positioned among the 20 genes with highest mutation frequency and was confirmed as driver gene in breast cancer. Over 14,000 SNP for CDH1 were found in the dbSNP database. This report used DisGeNET to gather/compile current knowledge on gene-disease association for CDH1/E-cadherin and ST; data curation expanded the number of terms that relate them. An updated list of CDH1 somatic mutations was obtained with COSMIC and IntOGen databases and of SNP from dbSNP. This information can be used to further understand the role of CDH1/E-cadherin in health and disease.

  2. Design and bioinformatics analysis of novel biomimetic peptides as nanocarriers for gene transfer

    Directory of Open Access Journals (Sweden)

    Asia Majidi

    2015-01-01

    Full Text Available Objective(s: The introduction of nucleic acids into cells for therapeutic objectives is significantly hindered by the size and charge of these molecules and therefore requires efficient vectors that assist cellular uptake. For several years great efforts have been devoted to the study of development of recombinant vectors based on biological domains with potential applications in gene therapy. Such vectors have been synthesized in genetically engineered approach, resulting in biomacromolecules with new properties that are not present in nature. Materials and Methods: In this study, we have designed new peptides using homology modeling with the purpose of overcoming the cell barriers for successful gene delivery through Bioinformatics tools. Three different carriers were designed and one of those with better score through Bioinformatics tools was cloned, expressed and its affinity for pDNA was monitored. Results: The resultszz demonstrated that the vector can effectively condense pDNAinto nanoparticles with the average sizes about 100 nm. Conclusion: We hope these peptides can overcome the biological barriers associated with gene transfer, and mediate efficient gene delivery.

  3. Secretome Analysis of Lipid-Induced Insulin Resistance in Skeletal Muscle Cells by a Combined Experimental and Bioinformatics Workflow

    DEFF Research Database (Denmark)

    Deshmukh, Atul S; Cox, Juergen; Jensen, Lars Juhl

    2015-01-01

    the secretome of lipid-induced insulin-resistant skeletal muscle cells. Our workflow identified 1073 putative secreted proteins including 32 growth factors, 25 cytokines, and 29 metalloproteinases. In addition to previously reported proteins, we report hundreds of novel ones. Intriguingly, ∼40% of the secreted...... proteins were regulated under insulin-resistant conditions, including a protein family with signal peptide and EGF-like domain structure that had not yet been associated with insulin resistance. Finally, we report that secretion of IGF and IGF-binding proteins was down-regulated under insulin-resistant...... conditions. Our study demonstrates an efficient combined experimental and bioinformatics workflow to identify putative secreted proteins from insulin-resistant skeletal muscle cells, which could easily be adapted to other cellular models....

  4. Bioinformatics Approaches for Human Gut Microbiome Research

    Directory of Open Access Journals (Sweden)

    Zhijun Zheng

    2016-07-01

    Full Text Available The human microbiome has received much attention because many studies have reported that the human gut microbiome is associated with several diseases. The very large datasets that are produced by these kinds of studies mean that bioinformatics approaches are crucial for their analysis. Here, we systematically reviewed bioinformatics tools that are commonly used in microbiome research, including a typical pipeline and software for sequence alignment, abundance profiling, enterotype determination, taxonomic diversity, identifying differentially abundant species/genes, gene cataloging, and functional analyses. We also summarized the algorithms and methods used to define metagenomic species and co-abundance gene groups to expand our understanding of unclassified and poorly understood gut microbes that are undocumented in the current genome databases. Additionally, we examined the methods used to identify metagenomic biomarkers based on the gut microbiome, which might help to expand the knowledge and approaches for disease detection and monitoring.

  5. Phylogenetic trees in bioinformatics

    Energy Technology Data Exchange (ETDEWEB)

    Burr, Tom L [Los Alamos National Laboratory

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
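
    To make the remark about the huge number of possible trees concrete, the sketch below uses the standard counting result for strictly bifurcating, labeled trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n OTUs. The formulas are textbook results rather than something stated in the abstract, and the function names are chosen here for illustration.

```python
def double_factorial(m: int) -> int:
    """Product m * (m - 2) * (m - 4) * ... down to 1 (or 2)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_tree_count(n_otus: int) -> int:
    """Number of unrooted bifurcating topologies for n labeled OTUs (n >= 3)."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return double_factorial(2 * n_otus - 5)


def rooted_tree_count(n_otus: int) -> int:
    """Number of rooted bifurcating topologies for n labeled OTUs (n >= 2)."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return double_factorial(2 * n_otus - 3)


for n in (4, 10, 20):
    print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

    Ten OTUs already give more than two million unrooted topologies, which is why exhaustive search is impractical and heuristic tree search is the norm.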

  6. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    Science.gov (United States)

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  7. Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

    Science.gov (United States)

    Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

    2015-05-01

    Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was hypothesized. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in the absence of wet-lab validations.

  8. Clustering Techniques in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Muhammad Ali Masood

    2015-01-01

    Full Text Available Dealing with data means grouping information into a set of categories, either to learn new artifacts or to understand new domains. For this purpose, researchers have always looked for hidden patterns in data that can be defined and compared with other known notions based on the similarity or dissimilarity of their attributes according to well-defined rules. Data mining, with its tools of data classification and data clustering, is one of the most powerful techniques for handling data in a way that helps researchers identify the required information. To address this challenge, experts have used clustering techniques as a means of exploring hidden structure and patterns in the underlying data. By improving the stability, robustness and accuracy of unsupervised data classification in many fields, including pattern recognition, machine learning, information retrieval, image analysis and bioinformatics, clustering has proven itself a reliable tool. To identify the clusters in a dataset, algorithms are used to partition the data set into several groups based on the similarity within a group. There is no single clustering algorithm; various algorithms are used depending on the domain of the data that constitutes a cluster and the level of efficiency required. Clustering techniques are categorized according to different approaches. This paper is a survey of a few of the many clustering techniques in data mining. Five of the most common clustering techniques are discussed: K-medoids, K-means, Fuzzy C-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Self-Organizing Map (SOM) clustering.
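
    Of the five techniques listed, K-means is the simplest to sketch. The following is a minimal version of the usual assign-then-update iteration (Lloyd's algorithm), written for illustration and not taken from the surveyed paper; the starting centroids are supplied by the caller, and the toy points are hypothetical.

```python
def kmeans(points, centroids, max_iterations=100):
    """Partition points (tuples of floats) around the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

    The result depends on the starting centroids, which is one reason practical implementations usually run several random restarts and keep the best partition.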

  9. Network Analysis Identifies Disease-Specific Pathways for Parkinson's Disease.

    Science.gov (United States)

    Monti, Chiara; Colugnat, Ilaria; Lopiano, Leonardo; Chiò, Adriano; Alberio, Tiziana

    2016-12-21

    Neurodegenerative diseases are characterized by the progressive loss of specific neurons in selected regions of the central nervous system. The main clinical manifestation (movement disorders, cognitive impairment, and/or psychiatric disturbances) depends on the neuron population being primarily affected. Parkinson's disease is a common movement disorder, whose etiology remains mostly unknown. Progressive loss of dopaminergic neurons in the substantia nigra causes an impairment of the motor control. Some of the pathogenetic mechanisms causing the progressive deterioration of these neurons are not specific for Parkinson's disease but are shared by other neurodegenerative diseases, like Alzheimer's disease and amyotrophic lateral sclerosis. Here, we performed a meta-analysis of the literature of all the quantitative proteomic investigations of neuronal alterations in different models of Parkinson's disease, Alzheimer's disease, and amyotrophic lateral sclerosis to distinguish between general and Parkinson's disease-specific pattern of neurodegeneration. Then, we merged proteomics data with genetics information from the DisGeNET database. The comparison of gene and protein information allowed us to identify 25 proteins involved uniquely in Parkinson's disease and we verified the alteration of one of them, i.e., transaldolase 1 (TALDO1), in the substantia nigra of 5 patients. By using open-source bioinformatics tools, we identified the biological processes specifically affected in Parkinson's disease, i.e., proteolysis, mitochondrion organization, and mitophagy. Eventually, we highlighted four cellular component complexes mostly involved in the pathogenesis: the proteasome complex, the protein phosphatase 2A, the chaperonins CCT complex, and the complex III of the respiratory chain.

  10. Novel C16orf57 mutations in patients with Poikiloderma with Neutropenia: bioinformatic analysis of the protein and predicted effects of all reported mutations

    Directory of Open Access Journals (Sweden)

    Colombo Elisa A

    2012-01-01

    Full Text Available Abstract Background Poikiloderma with Neutropenia (PN) is a rare autosomal recessive genodermatosis caused by C16orf57 mutations. To date 17 mutations have been identified in 31 PN patients. Results We characterize six PN patients, expanding the clinical phenotype of the syndrome and the mutational repertoire of the gene. We detect two novel C16orf57 mutations, c.232C>T and c.265+2T>G, as well as the already reported c.179delC, c.531delA and c.693+1G>T mutations. cDNA analysis shows the presence of aberrant transcripts, and bioinformatic prediction of the C16orf57 protein structure gauges the mutations' effects on the folded protein chain. Computational analysis of the C16orf57 protein shows two conserved H-X-S/T-X tetrapeptide motifs marking the active site of a two-fold pseudosymmetric structure recalling the 2H phosphoesterase superfamily. Based on this model, C16orf57 is likely a 2H-active site enzyme functioning in RNA processing, as a presumptive RNA ligase. According to bioinformatic prediction, all known C16orf57 mutations, including the novel mutations described here, impair the protein structure either by removing one or both tetrapeptide motifs or by destroying the symmetry of the native fold. Finally, we analyse the geographical distribution of the recurrent mutations, which shows clusters consistent with a founder effect. Conclusions In cohorts of patients clinically affected by genodermatoses with overlapping symptoms, molecular screening of the C16orf57 gene seems the proper way to reach the correct diagnosis of PN, enabling syndrome-specific oncosurveillance. The bioinformatic prediction of the C16orf57 protein structure denotes a very basic enzymatic function consistent with a housekeeping function. Detection of aberrant transcripts, also in cells from PN patients carrying early truncated mutations, suggests they might be translatable. Tissue-specific sensitivity to the lack of functionally correct protein accounts for the

  11. Entropy-based analysis and bioinformatics-inspired integration of global economic information transfer.

    Science.gov (United States)

    Kim, Jinkyu; Kim, Gunn; An, Sungbae; Kwon, Young-Kyun; Yoon, Sungroh

    2013-01-01

    The assessment of information transfer in the global economic network helps to understand the current environment and the outlook of an economy. Most approaches on global networks extract information transfer based mainly on a single variable. This paper establishes an entirely new bioinformatics-inspired approach to integrating information transfer derived from multiple variables and develops an international economic network accordingly. In the proposed methodology, we first construct the transfer entropies (TEs) between various intra- and inter-country pairs of economic time series variables, test their significances, and then use a weighted sum approach to aggregate information captured in each TE. Through a simulation study, the new method is shown to deliver better information integration compared to existing integration methods in that it can be applied even when intra-country variables are correlated. Empirical investigation with the real world data reveals that Western countries are more influential in the global economic network and that Japan has become less influential following the Asian currency crisis.
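
    As a rough illustration of the transfer entropy (TE) quantity mentioned in this abstract, the sketch below computes a plug-in estimate of TE between two already-discretized series with a history length of one step. The function name, the discretization, and the history length are assumptions made for illustration; the abstract does not describe the authors' estimator, their significance tests, or the weights used in their aggregation step, so none of those are reproduced here.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits.

    Assumes both series are sequences of discrete symbols of equal length
    and uses a history length of 1 (an illustrative choice, not the paper's).
    """
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    n = len(target) - 1
    if n < 1:
        raise ValueError("need at least two observations")

    triples = Counter()   # counts of (x_next, x_now, y_now)
    pairs_xy = Counter()  # counts of (x_now, y_now)
    pairs_xx = Counter()  # counts of (x_next, x_now)
    singles = Counter()   # counts of x_now
    for t in range(n):
        x_next, x_now, y_now = target[t + 1], target[t], source[t]
        triples[(x_next, x_now, y_now)] += 1
        pairs_xy[(x_now, y_now)] += 1
        pairs_xx[(x_next, x_now)] += 1
        singles[x_now] += 1

    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x_now, y_now)]
        p_given_self = pairs_xx[(x_next, x_now)] / singles[x_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Hypothetical toy data: the target repeats the source with a one-step delay.
driver = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
follower = [0] + driver[:-1]
te_forward = transfer_entropy(driver, follower)
```

    In this toy case the delayed copy is fully predicted by the driver, so the forward TE is positive, while a constant source yields zero. Real economic series would first need to be discretized or handled with a continuous estimator.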

  12. Entropy-based analysis and bioinformatics-inspired integration of global economic information transfer.

    Directory of Open Access Journals (Sweden)

    Jinkyu Kim

    Full Text Available The assessment of information transfer in the global economic network helps to understand the current environment and the outlook of an economy. Most approaches on global networks extract information transfer based mainly on a single variable. This paper establishes an entirely new bioinformatics-inspired approach to integrating information transfer derived from multiple variables and develops an international economic network accordingly. In the proposed methodology, we first construct the transfer entropies (TEs) between various intra- and inter-country pairs of economic time series variables, test their significances, and then use a weighted sum approach to aggregate information captured in each TE. Through a simulation study, the new method is shown to deliver better information integration compared to existing integration methods in that it can be applied even when intra-country variables are correlated. Empirical investigation with the real world data reveals that Western countries are more influential in the global economic network and that Japan has become less influential following the Asian currency crisis.

  13. [Cloning and bioinformatics analysis of SLA-DR genes in Hunan Shaziling pigs].

    Science.gov (United States)

    Tang, Yi-Ya; Xing, Xiao-Wei; Xue, Li-Qun; Huang, Sheng-Qiang; Wang, Wei

    2007-12-01

    The aims were to clone the class II DRA and DRB genes of swine leukocyte antigen (SLA) in Hunan Shaziling pigs, to analyze their characteristics and polymorphism, and to provide basic immunological parameters for xenotransplantation from pigs to humans. SLA-DRA and SLA-DRB genes in two Shaziling pigs lacking porcine endogenous retrovirus (PERV) env-c were amplified by RT-PCR, cloned into PUCm-T vectors, sequenced and analyzed through BLAST at NCBI and related software in ExPASy. The obtained SLA-DRA and SLA-DRB genes of Shaziling pigs were 1,177 and 909 nucleotides in length, with GenBank accession numbers EF143987 and EF143988. Bioinformatics analyses showed that both contain an open reading frame (ORF), encoding 252 and 266 amino acids respectively. Comparing the ORF and protein sequences of the Shaziling SLA-DRA and SLA-DRB genes with their human counterparts, the nucleotide sequence homologies were 83% and 83%, and the amino acid sequence homologies were 83% and 79%, respectively. Further comparison with SLA sequences published in GenBank indicated that the SLA-DRB gene found in Shaziling pigs is polymorphic, while the homology of the SLA-DRA gene is up to 100%.

  14. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    Science.gov (United States)

    Wang, Fen; Ye, Bin

    2016-09-01

    Cystic echinococcosis, caused by the metacestode larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. Treatment of cystic echinococcosis remains difficult, since surgery cannot meet the needs of all patients and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structural properties of E. granulosus aquaporins (EgAQPs) and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The number of transmembrane regions ranged from three to six, which indicated that EgAQPs contain multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and researching the biological function of E. granulosus.

  15. Cloning and bioinformatic analysis of HSPC016 gene in dermal papilla cells

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Objective: To clone the full-length cDNA sequence of the HSPC016 gene, an aggregative growth related gene in dermal papilla cells (DPC), to analyze its characteristics and to predict its biological function. Methods: Rapid amplification of cDNA ends (RACE) technology was used to amplify the 5' and 3' sequences of HSPC016. The amplified fragments were TA-cloned, sequenced and spliced together to obtain the full-length cDNA. Its chromosome localization, domain and possible function were analyzed by bioinformatic methods. Results: Two isoforms, 400 bp and 493 bp, were obtained. The gene was mapped to chromosome 3q21.31 and was evolutionarily conserved. HSPC016, a 64-aa protein, belongs to the PD053992 protein family, and its functional domain was homologous to the T2FA gene. Conclusion: HSPC016 may be related to transcriptional regulation, and its protein product may act as a subunit of a transcriptional complex and play a role in DPC growth and differentiation by facilitating or suppressing the transcription of other genes within the nucleus.

  16. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications.

    Science.gov (United States)

    Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén

    2016-08-11

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure-Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron-Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.

  17. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications

    Directory of Open Access Journals (Sweden)

    Lucas Antón Pastur-Romay

    2016-08-01

    Full Text Available Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.

  18. Towards bioinformatics assisted infectious disease control

    Directory of Open Access Journals (Sweden)

    Gallego Blanca

    2009-02-01

    Full Text Available Abstract Background This paper proposes a novel framework for bioinformatics-assisted biosurveillance and early warning to address the inefficiencies in traditional surveillance as well as the need for more timely and comprehensive infection monitoring and control. It leverages breakthroughs in rapid, high-throughput molecular profiling of microorganisms and text mining. Results This framework combines the genetic and geographic data of a pathogen to reconstruct its history and to identify the migration routes through which the strains spread regionally and internationally. A pilot study of Salmonella typhimurium genotype clustering and temporospatial outbreak analysis demonstrated better discrimination power than traditional phage typing. Half of the outbreaks were detected in the first half of their duration. Conclusion The microbial profiling and biosurveillance-focused text mining tools can enable integrated infectious disease outbreak detection and response environments based upon bioinformatics knowledge models and measured by outcomes including the accuracy and timeliness of outbreak detection.

  19. Bioinformatic prediction, deep sequencing of microRNAs and expression analysis during phenotypic plasticity in the pea aphid, Acyrthosiphon pisum

    Directory of Open Access Journals (Sweden)

    Leterme Nathalie

    2010-05-01

    Full Text Available Abstract Background Post-transcriptional regulation in eukaryotes can operate through microRNA (miRNA)-mediated gene silencing. MiRNAs are small (18-25 nucleotides) non-coding RNAs that play a crucial role in the regulation of gene expression in eukaryotes. In insects, miRNAs have been shown to be involved in multiple mechanisms such as embryonic development, tissue differentiation, metamorphosis or circadian rhythm. Insect miRNAs have been identified in different species belonging to five orders: Coleoptera, Diptera, Hymenoptera, Lepidoptera and Orthoptera. Results We developed high throughput Solexa sequencing and bioinformatic analyses of the genome of the pea aphid Acyrthosiphon pisum in order to identify the first miRNAs from a hemipteran insect. By combining these methods we identified 149 miRNAs including 55 conserved and 94 new miRNAs. Moreover, we investigated the regulation of these miRNAs in different alternative morphs of the pea aphid by analysing the expression of miRNAs across the switch of reproduction mode. Pea aphid microRNA sequences have been posted to miRBase: http://microrna.sanger.ac.uk/sequences/ Conclusions Our study has identified candidates as putative regulators involved in reproductive polyphenism in aphids and opens new avenues for further functional analyses.

  20. A bioinformatic strategy for the detection, classification and analysis of bacterial autotransporters.

    Directory of Open Access Journals (Sweden)

    Nermin Celik

    Full Text Available Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel domain, a characteristic feature of autotransporters, was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and the barrel domain, indicating that it is an important feature in the assembly of autotransporters.

  1. Bioinformatics analysis of the structural and evolutionary characteristics for toll-like receptor 15

    Directory of Open Access Journals (Sweden)

    Jinlan Wang

    2016-05-01

    Full Text Available Toll-like receptors (TLRs) play an important role in the innate immune system. TLR15 is reported to have a unique role in defense against pathogens, but its structural and evolutionary characteristics are still poorly understood. In this study, we identified 57 complete TLR15 genes from avian and reptilian genomes. TLR15 clustered into an individual clade and was closely related to family 1 on the phylogenetic tree. Unlike the TLRs in family 1, which have broken asparagine ladders in the middle, the TLR15 ectodomain had an intact asparagine ladder that is critical for maintaining the overall shape of the ectodomain. The conservation analysis found that the TLR15 ectodomain had a highly evolutionarily conserved region on the convex surface of the LRR11 module, which is probably involved in the TLR15 activation process. Furthermore, the protein–protein docking analysis indicated that TLR15 TIR domains have the potential to form homodimers; the predicted interaction interface of the TIR dimer was formed mainly by residues from the BB-loops and αC-helices. Although TLR15 mainly underwent purifying selection, we detected 27 sites under positive selection for TLR15, 24 of which are located on its ectodomain. Our observations suggest structural features of TLR15 that may be relevant to its function but that require further experimental validation.

  2. Global secretome analysis identifies novel mediators of bone metastasis

    Institute of Scientific and Technical Information of China (English)

    Mario Andres Blanco; Gary LeRoy; Zia Khan; Maša Alečković; Barry M Zee; Benjamin A Garcia; Yibin Kang

    2012-01-01

    Bone is one of the most common sites of distant metastasis of solid tumors. Secreted proteins are known to influence pathological interactions between metastatic cancer cells and the bone stroma. To comprehensively profile secreted proteins associated with bone metastasis, we used quantitative and non-quantitative mass spectrometry to globally analyze the secretomes of nine cell lines of varying bone metastatic ability from multiple species and cancer types. By comparing the secretomes of parental cells and their bone metastatic derivatives, we identified the secreted proteins that were uniquely associated with bone metastasis in these cell lines. We then incorporated bioinformatic analyses of large clinical metastasis datasets to obtain a list of candidate novel bone metastasis proteins of several functional classes that were strongly associated with both clinical and experimental bone metastasis. Functional validation of selected proteins indicated that in vivo bone metastasis can be promoted by high expression of (1) the salivary cystatins CST1, CST2, and CST4; (2) the plasminogen activators PLAT and PLAU; or (3) the collagen functionality proteins PLOD2 and COL6A1. Overall, our study has uncovered several new secreted mediators of bone metastasis and therefore demonstrated that secretome analysis is a powerful method for identification of novel biomarkers and candidate therapeutic targets.

  3. Rapid cloning and bioinformatic analysis of spinach Y chromosome-specific EST sequences

    Indian Academy of Sciences (India)

    Chuan-Liang Deng; Wei-Li Zhang; Ying Cao; Shao-Jing Wang; Shu-Fen Li; Wu-Jun Gao; Long-Dou Lu

    2015-12-01

    The genome of the spinach single chromosome complement is about 1000 Mbp, and spinach is a model material for studying the molecular mechanisms of plant sex differentiation. Cytological studies showed that the largest spinach chromosome (chromosome 1) is the sex chromosome. It had three alleles of sex-related , m and . Many researchers have tried to clone the sex-determining genes and to investigate the molecular mechanism of spinach sex differentiation. However, there are no reports of successful cloning of these genes. A new technology combining chromosome microdissection with hybridization-specific amplification (HSA) was adopted. The spinach Y chromosome degenerate oligonucleotide primed-PCR (DOP-PCR) products were hybridized with cDNA of male spinach flowers in florescence. The female spinach genome was used as a blocker, and a cDNA library specifically expressed from the Y chromosome was constructed. Moreover, expressed sequence tag (EST) sequences in the cDNA library were cloned, sequenced and analysed bioinformatically. There were 63 valid EST sequences obtained in this study. The fragment size was between 53 and 486 bp. BLASTn homology alignment indicated that 12 EST sequences had homologous nucleic acid sequences; the rest were new sequences. BLASTx homology alignment indicated that 16 EST sequences had homologous protein-encoding nucleic acid sequences. The spinach Y chromosome-specific EST sequences lay the foundation for cloning the functional genes specifically expressed from the spinach Y chromosome. Meanwhile, the technology system established in this research provides a reference for rapid cloning of sex chromosome-specific EST sequences in other organisms.

  4. Bioinformatics analysis for structure and function of CPR of Plasmodium falciparum

    Institute of Scientific and Technical Information of China (English)

    Zhigang Fan; Lingmin Zhang; Guogang Yan; Qiang Wu; Xiufeng Gan; Saifeng Zhong; Guifen Lin

    2011-01-01

    Objective: To analyse the structure and function of NADPH-cytochrome p450 reductase (CYPOR or CPR) from Plasmodium falciparum (Pf), and to predict its drug target and vaccine target. Methods: The structure, function, drug target and vaccine target of CPR from Plasmodium falciparum were analyzed and predicted by bioinformatics methods. Results: PfCPR, which was an older CPR, had a close relationship with the CPR from other Plasmodium species, but it was distant from its hosts, such as Homo sapiens and Anopheles. PfCPR was located in the cellular nucleus of Plasmodium falciparum. Residues 335aa-352aa and 591aa-608aa were inserted into the interior side of the nuclear membrane, while 151aa-265aa was located in the nucleolus organizer regions. PfCPR had 40 function sites and 44 protein-protein binding sites in its amino acid sequence. The tertiary structure of 1aa-700aa was forceps-shaped with wings. Fifteen segments of PfCPR had no homology with Homo sapiens CPR, and most were exposed on the surface of the protein. These segments had 25 protein-protein binding sites, while 13 other segments all possessed function sites. Conclusions: The evolution or genesis of Plasmodium falciparum is earlier than that of Homo sapiens. PfCPR is a possible resistance site of antimalarial drugs and may be involved in immune evasion, which is associated with the parasitism of sporozoites in hepatocytes. PfCPR is unsuitable as a vaccine target, but it has at least 13 ideal drug targets.

  5. An Introduction to Bioinformatics

    Institute of Scientific and Technical Information of China (English)

    SHENG Qi-zheng; De Moor Bart

    2004-01-01

    As a newborn interdisciplinary field, bioinformatics is receiving increasing attention from biologists, computer scientists, statisticians, mathematicians and engineers. This paper briefly introduces the birth, importance, and extensive applications of bioinformatics in the different fields of biological research. A major challenge in bioinformatics - the unraveling of gene regulation - is discussed in detail.

  6. Hepatocellular carcinoma associated microRNA expression signature: integrated bioinformatics analysis, experimental validation and clinical significance.

    Science.gov (United States)

    Shi, Ke-Qing; Lin, Zhuo; Chen, Xiang-Jian; Song, Mei; Wang, Yu-Qun; Cai, Yi-Jing; Yang, Nai-Bing; Zheng, Ming-Hua; Dong, Jin-Zhong; Zhang, Lei; Chen, Yong-Ping

    2015-09-22

    microRNA (miRNA) expression profiles varied greatly among current studies due to different technological platforms and small sample sizes. A systematic and integrative analysis of published datasets that compared the miRNA expression profiles between hepatocellular carcinoma (HCC) tissue and paired adjacent noncancerous liver tissue was performed to determine candidate HCC-associated miRNAs. Moreover, we further validated the confirmed miRNAs in a clinical setting using qRT-PCR and The Cancer Genome Atlas (TCGA) dataset. A miRNA integrated signature of 5 upregulated and 8 downregulated miRNAs was identified from 26 published datasets in HCC using the robust rank aggregation method. qRT-PCR demonstrated that miR-93-5p, miR-224-5p, miR-221-3p and miR-21-5p were increased, whereas the expression of miR-214-3p, miR-199a-3p, miR-195-5p, miR-150-5p and miR-145-5p was decreased in the HCC tissues, which was also validated on the TCGA dataset. A miRNA-based score using a LASSO regression model provided high accuracy for identifying HCC tissue (AUC = 0.982): HCC risk score = 0.180 × E_miR-221 + 0.0262 × E_miR-21 - 0.007 × E_miR-223 - 0.185 × E_miR-130a, where E_miR-n = log2(expression of microRNA n). Furthermore, expression of 5 miRNAs (miR-222, miR-221, miR-21, miR-214 and miR-130a) correlated with pathological tumor grade. Cox regression analysis showed that miR-21 was related to 3-year survival (hazard ratio [HR]: 1.509, 95% CI: 1.079-2.112, P = 0.016) and 5-year survival (HR: 1.416, 95% CI: 1.057-1.897, P = 0.020). However, none of the deregulated miRNAs was related to microscopic vascular invasion. This study provides a basis for further clinical application of miRNAs in HCC.
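
    The risk score stated in this abstract is a linear combination of log2-transformed expression values, which can be sketched as follows. The function and variable names are hypothetical, and the expression units, normalization, and any decision threshold are not given in the abstract, so the input values used here are purely illustrative.

```python
from math import log2

# Coefficients copied from the formula quoted in the abstract.
COEFFICIENTS = {
    "miR-221": 0.180,
    "miR-21": 0.0262,
    "miR-223": -0.007,
    "miR-130a": -0.185,
}


def hcc_risk_score(expression):
    """Return the sum of coefficient * log2(expression) over the four miRNAs.

    `expression` maps miRNA name to a positive expression value; the scale of
    these values is an assumption, since the abstract does not specify it.
    """
    return sum(coef * log2(expression[name]) for name, coef in COEFFICIENTS.items())


# Illustrative inputs only: every miRNA at expression 2, so each log2 term is 1.
example_score = hcc_risk_score({name: 2.0 for name in COEFFICIENTS})
```

    With every expression value set to 2, each log2 term equals 1 and the score reduces to the sum of the four coefficients, which makes the arithmetic easy to check by hand.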

  7. Bioinformatics analysis of SARS-Cov M protein provides information for vaccine development

    Institute of Scientific and Technical Information of China (English)

    LIU Wanli; LU Yun; CHEN Yinghua

    2003-01-01

    The pathogen causing severe acute respiratory syndrome (SARS) has been identified as SARS-Cov. More knowledge of SARS-Cov is urgently needed to develop an efficient SARS vaccine to prevent this epidemic disease. In this report, the homology of the SARS-Cov M protein to that of other coronaviruses is illustrated, and all amino acid changes in both S and M proteins among all available SARS-Cov isolates in GenBank are described. Furthermore, a topological trans-membrane secondary structure model of the M protein is proposed, which corresponds well with the accepted topology model of M proteins of other coronaviruses. Hydrophilic profile analysis indicated that one region (aa150~210) on the cytoplasmic domain is fairly hydrophilic, suggesting that it may be antigenic. Based on the fact that the cytoplasmic domain of the M protein of some other coronaviruses can induce protective activities against virus infection, this region might be a potential target for SARS vaccine development.
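
    A hydrophilicity profile of the kind mentioned in this abstract is typically a sliding-window average of per-residue scale values. The sketch below uses the Kyte-Doolittle hydropathy scale, where strongly negative window averages indicate hydrophilic stretches. The choice of scale, the window length of 9, and the function name are assumptions; the abstract does not say which scale or window the authors used.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a leucine-rich stretch followed by a lysine-rich stretch.
profile = hydropathy_profile("LLLLLKKKKK", window=5)
```

    For the toy peptide the first window averages 3.8 (hydrophobic) and the last averages -3.9 (hydrophilic); applied to a real protein, a region such as residues 150 to 210 would show up as a run of negative values.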

  8. Bioinformatic analysis of expressed sequence tags from sporophyte of Porphyra yezoensis (Bangiaceae, Rhodophyta)

    Institute of Scientific and Technical Information of China (English)

    XU Minjun; MAO Yunxiang; ZHANG Xuecheng; ZHOU Xiaojun; SUI Zhenghong; ZHOU Hailin; LI Jinhong

    2006-01-01

    A total of 719 expressed sequence tags (ESTs), clustered into 329 non-redundant EST groups, were obtained from the sporophyte cDNA library of the red alga Porphyra yezoensis. Gene Ontology (GO) analysis was employed to characterize the 60 most strictly annotated unique genes out of the 329 EST groups, and some domains, such as COX1, Sod_Fe-C, GST-N, SHMT, and RNase_PH, related to enzymes and proteins functioning in cells were identified by HMMPFAM search. As in its leafy gametophyte, a similar codon usage with strong bias is found in the P. yezoensis filamentous sporophyte, although some differences are found for particular amino acids. The average GC content of the 329 unique genes is 53.0%. In contrast, the third nucleotide of codons exhibits a higher GC content (72%) than the first (58%) and the second (42%) nucleotides. A similarity search against the Porphyra EST database shows a novel EST ratio of 60.2%, suggesting that further investigation is needed to elucidate the characteristics of the Porphyra functional genome.
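
    The per-position GC percentages quoted in this abstract (first, second, and third codon nucleotides) can be computed from an in-frame coding sequence as sketched below. The function name and the example sequence are hypothetical, and the sketch assumes the sequence starts in frame, which the original EST data would not guarantee without prior ORF prediction.

```python
def gc_by_codon_position(cds):
    """Return GC percentages at codon positions 1, 2 and 3 as a tuple.

    Assumes `cds` begins in the correct reading frame; trailing bases that
    do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    percentages = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base from this offset
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases))
    return tuple(percentages)


# Hypothetical three-codon sequence: ATG GCC GCG.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGCG")
```

    In the toy sequence the third position is G, C, G, so GC3 is 100%, while the first and second positions each contain two G or C bases out of three.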

  9. The secondary metabolite bioinformatics portal

    DEFF Research Database (Denmark)

    Weber, Tilmann; Kim, Hyun Uk

    2016-01-01

    . In this context, this review gives a summary of tools and databases that currently are available to mine, identify and characterize natural product biosynthesis pathways and their producers based on ‘omics data. A web portal called Secondary Metabolite Bioinformatics Portal (SMBP at http...

  10. The cinnamyl alcohol dehydrogenase gene family in melon (Cucumis melo L.): bioinformatic analysis and expression patterns.

    Science.gov (United States)

    Jin, Yazhong; Zhang, Chong; Liu, Wei; Qi, Hongyan; Chen, Hao; Cao, Songxiao

    2014-01-01

    Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis. However, little was known about CADs in melon. Five CAD-like genes were identified in the melon genome, namely CmCAD1 to CmCAD5. Signal peptide analysis and CAD protein prediction showed that none of the CmCADs had a typical signal peptide and that CmCAD proteins may be located in the cytoplasm. Multiple alignments implied that some motifs may be responsible for the high specificity of these CAD proteins, and may be among the key residues in the catalytic mechanism. The phylogenetic tree revealed seven groups of CAD, and melon CAD genes fell into four main groups. CmCAD1 and CmCAD2 belonged to the bona fide CAD group, in which the CAD genes, as representatives from angiosperms, were involved in lignin synthesis. Other CmCADs were distributed in groups II, V and VII, respectively. Semi-quantitative PCR and real-time qPCR revealed differential expression of CmCADs: CmCAD5 was expressed in different vegetative tissues except mature leaves, with the highest expression in flower, while CmCAD2 and CmCAD5 were strongly expressed in flesh during development. Promoter analysis revealed several motifs of CAD genes involved in gene expression modulated by various hormones. Treatment with abscisic acid (ABA) elevated the expression of CmCADs in flesh, whereas the transcript levels of CmCAD1 and CmCAD5 were induced by auxin (IAA). Ethylene induced the expression of CmCADs, while 1-MCP repressed the effect, apart from CmCAD4. Taken together, these data suggested that CmCAD4 may be a pseudogene and that all other CmCADs may be involved in the lignin biosynthesis induced by both abiotic and biotic stresses and in tissue-specific developmental lignification through a CAD gene family network; CmCAD2 may be the main CAD enzyme for lignification of melon flesh, and CmCAD5 may also function in flower development.

  11. Molecular Cloning, Bioinformatic Analysis, and Expression of Bombyx mori Lebocin 5 Gene Related to Beauveria bassiana Infection.

    Science.gov (United States)

    Lü, Dingding; Hou, Chengxiang; Qin, Guangxing; Gao, Kun; Chen, Tian; Guo, Xijie

    2017-01-01

    A full-length cDNA of lebocin 5 (BmLeb5) was first cloned from the silkworm, Bombyx mori, by rapid amplification of cDNA ends. The BmLeb5 gene is 808 bp in length and the open reading frame encodes a 179-amino acid hydroxyproline-rich peptide. Bioinformatic analysis showed that BmLeb5 has an O-glycosylation site and four RXXR motifs, like other lebocins. Sequence similarity and phylogenetic analysis indicated that lebocins form a multiple gene family in silkworm, as cecropins do. Quantitative real-time PCR analysis revealed that BmLeb5 was most highly expressed in the fat body. In silkworm larvae infected by Beauveria bassiana, the expression level of BmLeb5 was upregulated in the fat body and hemolymph, which are the most important immune tissues in silkworm. The recombinant BmLeb5 protein was, for the first time, successfully expressed with a prokaryotic expression system and purified. There have been no reports so far that the expression of lebocins can be induced by entomopathogenic fungi. Our study suggested that BmLeb5 might play an important role in the immune response of silkworm against B. bassiana infection. The results also provide helpful information for further study of the function of the lebocin family in the antifungal immune response of the silkworm.
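
    Counting RXXR motifs, as reported for BmLeb5 in this abstract, amounts to scanning a protein sequence for an arginine, any two residues, and another arginine. The sketch below uses a lookahead pattern so that overlapping motifs are all reported. The function name and the example peptide are hypothetical; the actual BmLeb5 sequence is not reproduced here.

```python
import re

# Lookahead so that overlapping occurrences (e.g. RAARLLR) are all found.
RXXR_PATTERN = re.compile(r"(?=(R..R))")


def find_rxxr_motifs(protein):
    """Return a list of (1-based start position, motif) for every RXXR match."""
    return [
        (match.start() + 1, match.group(1))
        for match in RXXR_PATTERN.finditer(protein.upper())
    ]


# Hypothetical peptide used only to exercise the function.
motifs = find_rxxr_motifs("MKRAARLLRSTRRQ")
```

    For the toy peptide the scan reports motifs starting at positions 3, 6 and 9; the trailing "RRQ" is too short to match. Positions are 1-based to follow the usual convention for residue numbering.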

  12. Molecular Cloning, Bioinformatic Analysis, and Expression of Bombyx mori Lebocin 5 Gene Related to Beauveria bassiana Infection

    Science.gov (United States)

    Lü, Dingding; Hou, Chengxiang; Qin, Guangxing; Gao, Kun; Chen, Tian

    2017-01-01

    A full-length cDNA of lebocin 5 (BmLeb5) was cloned for the first time from the silkworm, Bombyx mori, by rapid amplification of cDNA ends. The BmLeb5 gene is 808 bp in length and its open reading frame encodes a 179-amino acid hydroxyproline-rich peptide. Bioinformatic analysis showed that BmLeb5 has an O-glycosylation site and four RXXR motifs, like other lebocins. Sequence similarity and phylogenetic analysis indicated that lebocins form a multigene family in silkworm, as cecropins do. Quantitative real-time PCR analysis revealed that BmLeb5 expression was highest in the fat body. In silkworm larvae infected by Beauveria bassiana, the expression level of BmLeb5 was upregulated in the fat body and hemolymph, which are the most important immune tissues in silkworm. The recombinant BmLeb5 protein was, for the first time, successfully expressed in a prokaryotic expression system and purified. There have been no previous reports that the expression of lebocins can be induced by an entomopathogenic fungus. Our study suggested that BmLeb5 might play an important role in the immune response of silkworm against B. bassiana infection. The results also provide helpful information for further study of the function of the lebocin family in the antifungal immune response of the silkworm. PMID:28194425

  13. The Alcohol Dehydrogenase Gene Family in Melon (Cucumis melo L.): Bioinformatic Analysis and Expression Patterns

    Directory of Open Access Journals (Sweden)

    Yazhong Jin

    2016-05-01

    Alcohol dehydrogenases (ADHs), encoded by a multigene family in plants, play a critical role in plant growth, development, adaptation, fruit ripening and aroma production. Thirteen ADH genes were identified in the melon genome, including 12 ADHs and one formaldehyde dehydrogenase (FDH), designated CmADH1-12 and CmFDH1, of which CmADH1 and CmADH2 have been isolated in Cantaloupe. The ADH genes shared low identity with each other at the protein level and had different intron-exon structures at the nucleotide level. No typical signal peptides were found in any CmADH, and CmADH proteins might be located in the cytoplasm. The phylogenetic tree revealed that the 13 ADH genes were divided into 3 groups, namely the long-, medium- and short-chain ADH subfamilies, and CmADH1 and CmADH3-11, which belong to the medium-chain ADH subfamily, fell into 6 medium-chain ADH subgroups. CmADH12 may belong to the long-chain ADH subfamily, while CmFDH1 may be a Class III ADH and serve as an ancestral ADH in melon. Expression profiling revealed that CmADH1, CmADH2, CmADH10 and CmFDH1 were moderately or strongly expressed in different vegetative tissues and in fruit at medium and late developmental stages, while CmADH8 and CmADH12 were highly expressed in fruit after 20 days. CmADH3 showed preferential expression in young tissues. CmADH4 was only slightly expressed in root. Promoter analysis revealed several motifs of CmADH genes involved in gene expression modulated by various hormones, and the response patterns of CmADH genes to ABA, IAA and ethylene were different. These CmADHs were divided into ethylene-sensitive and -insensitive groups, and the functions of CmADHs were discussed.

  14. Cancer bioinformatics: detection of chromatin states, SNP-containing motifs, and functional enrichment modules

    Institute of Scientific and Technical Information of China (English)

    Xiaobo Zhou

    2013-01-01

    In this editorial preface, I briefly review cancer bioinformatics and introduce the four articles in this special issue highlighting important applications of the field: detection of chromatin states; detection of SNP-containing motifs and association with transcription factor-binding sites; improvements in functional enrichment modules; and gene association studies on aging and cancer. We expect this issue to provide bioinformatics scientists, cancer biologists, and clinical doctors with a better understanding of how cancer bioinformatics can be used to identify candidate biomarkers and targets and to conduct functional analysis.

  15. Analysis of ultra-deep pyrosequencing and cloning based sequencing of the basic core promoter/precore/core region of hepatitis B virus using newly developed bioinformatics tools.

    Directory of Open Access Journals (Sweden)

    Mukhlid Yousif

    AIMS: The aims of this study were to develop bioinformatics tools to explore ultra-deep pyrosequencing (UDPS) data, to test these tools, to use them to determine the optimum error threshold, and to compare results from UDPS and cloning based sequencing (CBS). METHODS: Four serum samples, infected with either genotype D or E, from HBeAg-positive and HBeAg-negative patients were randomly selected. UDPS and CBS were used to sequence the basic core promoter/precore region of HBV. Two online bioinformatics tools, the "Deep Threshold Tool" and the "Rosetta Tool" (http://hvdr.bioinf.wits.ac.za/tools/), were built to test and analyze the generated data. RESULTS: A total of 10952 reads were generated by UDPS on the 454 GS Junior platform. In the four samples, substitutions, detected at 0.5% threshold or above, were identified at 39 unique positions, 25 of which were non-synonymous mutations. Sample #2 (HBeAg-negative, genotype D) had substitutions in 26 positions, followed by sample #1 (HBeAg-negative, genotype E) in 12 positions, sample #3 (HBeAg-positive, genotype D) in 7 positions and sample #4 (HBeAg-positive, genotype E) in only four positions. The ratio of nucleotide substitutions between isolates from HBeAg-negative and HBeAg-positive patients was 3.5:1. Compared to genotype E isolates, genotype D isolates showed greater variation in the X, basic core promoter/precore and core regions. Only 18 of the 39 positions identified by UDPS were detected by CBS, which detected 14 of the 25 non-synonymous mutations detected by UDPS. CONCLUSION: UDPS data should be approached with caution. Appropriate curation of read data is required prior to analysis, in order to clean the data and eliminate artefacts. CBS detected fewer than 50% of the substitutions detected by UDPS. Furthermore it is important that the appropriate consensus (reference) sequence is used in order to identify variants correctly.
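
    The idea of reporting substitutions only at or above a frequency threshold (0.5% in the abstract above) can be sketched as follows. This is a minimal illustration, not the logic of the "Deep Threshold Tool": the function name, the input format (reads already aligned to the reference, equal length, no gaps) and the default threshold argument are all assumptions made for the example.

```python
from collections import Counter


def substitutions_above_threshold(reads, reference, threshold=0.005):
    """Return {position: {base: frequency}} for non-reference bases.

    Hypothetical helper: `reads` are assumed to be pre-aligned strings
    covering the reference from position 0, with no insertions or gaps.
    A base is reported when its share of the reads covering that
    position is at least `threshold` (0.005 corresponds to 0.5%).
    """
    result = {}
    for pos, ref_base in enumerate(reference):
        # Count the bases observed at this position across all reads
        counts = Counter(read[pos] for read in reads if pos < len(read))
        depth = sum(counts.values())
        if depth == 0:
            continue
        variants = {
            base: n / depth
            for base, n in counts.items()
            if base != ref_base and n / depth >= threshold
        }
        if variants:
            result[pos] = variants
    return result
```

    Raising the threshold removes low-frequency calls, which is the trade-off an error threshold is meant to control: fewer sequencing artefacts at the cost of missing rare variants.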

  16. A library-based bioinformatics services program.

    Science.gov (United States)

    Yarfitz, S; Ketchell, D S

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise.

  17. A library-based bioinformatics services program*

    Science.gov (United States)

    Yarfitz, Stuart; Ketchell, Debra S.

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise. PMID:10658962

  18. Genomic-bioinformatic analysis of transcripts enriched in the third-stage larva of the parasitic nematode Ascaris suum.

    Directory of Open Access Journals (Sweden)

    Cui-Qin Huang

    Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones was subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to 'biological process' (n = 68), 'cellular component' (n = 50), or 'molecular function' (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein-protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality (Emb) (13 types; 58.8%), larval arrest

  19. Identification of candidate genes and mutations in QTL regions for chicken growth using bioinformatic analysis of NGS and SNP-chip data

    Directory of Open Access Journals (Sweden)

    Muhammad Ahsan

    2013-11-01

    Mapping of chromosomal regions harboring genetic polymorphisms that regulate complex traits is usually followed by a search for the causative mutations underlying the observed effects. This is often a challenging task even after fine mapping, as millions of base pairs including many genes will typically need to be investigated. Thus, to trace the causative mutation(s), there is a great need for efficient bioinformatic strategies. Here, we searched for genes and mutations regulating growth in the Virginia chicken lines, an experimental population comprising two lines that have been divergently selected for body weight at 56 days for more than 50 generations. Several QTL regions have been mapped in an F2 intercross between the lines, and the regions have subsequently been replicated and fine mapped using an Advanced Intercross Line. We have further analyzed the QTL regions where the largest genetic divergence between the High-Weight selected (HWS) and Low-Weight selected (LWS) lines was observed. Such regions, covering about 37% of the actual QTL regions, were identified by comparing the allele frequencies of the HWS and LWS lines using both individual 60K SNP chip genotyping of birds and analysis of read proportions from genome resequencing of DNA pools. Based on a combination of criteria including significance of the QTL, allele frequency difference of identified mutations between the selected lines, gene information on relevance for growth, and the predicted functional effects of identified mutations, we propose here a subset of candidate mutations of highest priority for further evaluation in functional studies. The candidate mutations were identified within the GCG, IGFBP2, GRB14, CRIM1, FGF16, VEGFR-2, ALG11, EDN1, SNX6 and BIRC7 genes. We believe that the proposed method of combining different types of genomic information increases the probability that the genes underlying the observed QTL effects are represented among the candidate mutations
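
    Comparing allele frequencies between two selected lines, as described above, might be sketched like this. The genotype encoding (0, 1 or 2 copies of the alternative allele per bird), the function names and the `min_diff` cutoff are assumptions for illustration only; the abstract does not state the cutoff or the procedure the authors used.

```python
def allele_frequency(genotypes):
    """Frequency of the alternative allele in a diploid sample.

    `genotypes` is a list of alternative-allele counts per bird (0, 1 or 2).
    """
    return sum(genotypes) / (2 * len(genotypes))


def divergent_snps(line_a, line_b, min_diff=0.8):
    """Return {snp_id: absolute frequency difference} for SNPs genotyped
    in both lines whose allele frequencies differ by at least `min_diff`.

    `line_a` and `line_b` map SNP ids to genotype lists; `min_diff` is a
    hypothetical cutoff chosen for this example.
    """
    hits = {}
    for snp in line_a.keys() & line_b.keys():
        diff = abs(allele_frequency(line_a[snp]) - allele_frequency(line_b[snp]))
        if diff >= min_diff:
            hits[snp] = diff
    return hits
```

    A SNP fixed for opposite alleles in the two lines gives a difference of 1.0, while a SNP segregating at the same frequency in both gives 0.0 and is filtered out.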

  20. Bioinformatic analysis for allergenicity assessment of Bacillus thuringiensis Cry proteins expressed in insect-resistant food crops.

    Science.gov (United States)

    Randhawa, Gurinder Jit; Singh, Monika; Grover, Monendra

    2011-02-01

    The novel proteins introduced into the genetically modified (GM) crops need to be evaluated for the potential allergenicity before their introduction into the food chain to address the safety concerns of consumers. At present, there is no single definitive test that can be relied upon to predict allergic response in humans to a new protein; hence a composite approach to allergic response prediction is described in this study. The present study reports on the evaluation of the Cry proteins, encoded by cry1Ac, cry1Ab, cry2Ab, cry1Ca, cry1Fa/cry1Ca hybrid, being expressed in Bt food crops that are under field trials in India, for potential allergenic cross-reactivity using bioinformatics search tools. The sequence identity of amino acids was analyzed using FASTA3 of AllergenOnline version 10.0 and BLASTX of NCBI Entrez to identify any potential sequence matches to allergen proteins. As a step further in the detection of allergens, an independent database of domains in the allergens available in the AllergenOnline database was also developed. The results indicated no significant alignment and similarity of Cry proteins at domain level with any of the known allergens revealing that there is no potential risk of allergenic cross-reactivity.

  1. Bioinformatics and the Undergraduate Curriculum

    Science.gov (United States)

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  2. Visualising "Junk" DNA through Bioinformatics

    Science.gov (United States)

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  3. Ready to use bioinformatics analysis as a tool to predict immobilisation strategies for protein direct electron transfer (DET).

    Science.gov (United States)

    Cazelles, R; Lalaoui, N; Hartmann, T; Leimkühler, S; Wollenberger, U; Antonietti, M; Cosnier, S

    2016-11-15

    Direct electron transfer (DET) to proteins is of considerable interest for the development of biosensors and bioelectrocatalysts. While protein structure is mainly used as a method of attaching the protein to the electrode surface, we employed bioinformatics analysis to predict the suitable orientation of the enzymes to promote DET. Structure similarity and secondary structure prediction were combined to highlight localized amino acids able to direct one of the enzyme's electron relays toward the electrode surface by creating a suitable bioelectrocatalytic nanostructure. The electro-polymerization of pyrene pyrrole onto a fluorine-doped tin oxide (FTO) electrode allowed the targeted orientation of the formate dehydrogenase enzyme from Rhodobacter capsulatus (RcFDH) by means of hydrophobic interactions. Its electron relays were directed to the FTO surface, thus promoting DET. The reduction of nicotinamide adenine dinucleotide (NAD⁺), generating a maximum current density of 1 μA cm⁻² with 10 mM NAD⁺, leads to a turnover number of 0.09 electron/s/mol RcFDH. This work represents a practical approach to evaluate electrode surface modification strategies in order to create valuable bioelectrocatalysts.

  4. Bioinformatics Analysis for the Antirheumatic Effects of Huang-Lian-Jie-Du-Tang from a Network Perspective

    Directory of Open Access Journals (Sweden)

    Haiyang Fang

    2013-01-01

    Huang-Lian-Jie-Du-Tang (HLJDT) is a classic TCM formula to clear “heat” and “poison” that exhibits antirheumatic activity. Here we investigated the therapeutic mechanisms of HLJDT at the protein network level using a bioinformatics approach. It was found that HLJDT shares 5 target proteins with 3 types of anti-RA drugs, and several pathways in the immune system and bone formation are significantly regulated by HLJDT’s components, suggesting a therapeutic effect of HLJDT on RA. By defining an antirheumatic effect score to quantitatively measure the therapeutic effect, we found that the score of each HLJDT component is very low, while the whole HLJDT achieves a much higher effect score, suggesting a synergistic effect of HLJDT achieved by its multiple components acting on multiple targets. Finally, topological analysis of the RA-associated PPI network was conducted to illustrate the key roles of HLJDT’s target proteins in this network. Integrating our findings with TCM theory suggests that HLJDT targets hub nodes and the main pathway in the Hot-ZENG network, and thus it could be applied as adjuvant treatment for Hot-ZENG-related RA. This study may facilitate our understanding of the antirheumatic effect of HLJDT and may suggest a new approach for the study of TCM pharmacology.

  5. Deep Learning in Bioinformatics

    OpenAIRE

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2016-01-01

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current res...

  6. Factor analysis identifies subgroups of constipation

    Institute of Scientific and Technical Information of China (English)

    Philip G Dinning; Mike Jones; Linda Hunt; Sergio E Fuentealba; Jamshid Kalanter; Denis W King; David Z Lubowski; Nicholas J Talley; Ian J Cook

    2011-01-01

    AIM: To determine whether distinct symptom groupings exist in a constipated population and whether such grouping might correlate with quantifiable pathophysiological measures of colonic dysfunction. METHODS: One hundred and ninety-one patients presenting to a Gastroenterology clinic with constipation and 32 constipated patients responding to a newspaper advertisement completed a 53-item, wide-ranging self-report questionnaire. One hundred of these patients had colonic transit measured scintigraphically. Factor analysis determined whether constipation-related symptoms grouped into distinct aspects of symptomatology. Cluster analysis was used to determine whether individual patients naturally group into distinct subtypes. RESULTS: Cluster analysis yielded a 4 cluster solution with the presence or absence of pain and laxative unresponsiveness providing the main descriptors. Amongst all clusters there was a considerable proportion of patients with demonstrable delayed colon transit, irritable bowel syndrome positive criteria and regular stool frequency. The majority of patients with these characteristics also reported regular laxative use. CONCLUSION: Factor analysis identified four constipation subgroups, based on severity and laxative unresponsiveness, in a constipated population. However, clear stratification into clinically identifiable groups remains imprecise.

  7. Bioinformatics Prediction and Evolution Analysis of Arabinogalactan Proteins in the Plant Kingdom

    Science.gov (United States)

    Ma, Yuling; Yan, Chenchao; Li, Huimin; Wu, Wentao; Liu, Yaxue; Wang, Yuqian; Chen, Qin; Ma, Haoli

    2017-01-01

    Arabinogalactan proteins (AGPs) are a family of extracellular glycoproteins implicated in plant growth and development. With a rapid growth in the number of genomes sequenced in many plant species, the family members of AGPs can now be predicted to facilitate functional investigation. Building upon previous advances in identifying Arabidopsis AGPs, an integrated strategy of systematical AGP screening for “classical” and “chimeric” family members is proposed in this study. A Python script named Finding-AGP is compiled to find AGP-like sequences and filter AGP candidates under the given thresholds. The primary screening of classical AGPs, Lys-rich classical AGPs, AGP-extensin hybrids, and non-classical AGPs was performed using the existence of signal peptides as a necessary requirement, and BLAST searches were conducted mainly for fasciclin-like, phytocyanin-like and xylogen-like AGPs. Then glycomodule index and partial PAST (Pro, Ala, Ser, and Thr) percentage are adopted to identify AGP candidates. The integrated strategy successfully discovered AGP gene families in 47 plant species and the main results are summarized as follows: (i) AGPs are abundant in angiosperms and many “ancient” AGPs with Ser-Pro repeats are found in Chlamydomonas reinhardtii; (ii) Classical AGPs, AG-peptides, and Lys-rich classical AGPs first emerged in Physcomitrella patens, Selaginella moellendorffii, and Picea abies, respectively; (iii) Nine subfamilies of chimeric AGPs are introduced as newly identified chimeric subfamilies similar to fasciclin-like, phytocyanin-like, and xylogen-like AGPs; (iv) The length and amino acid composition of Lys-rich domains are largely variable, indicating an insertion/deletion model during evolution. Our findings provide not only a powerful means to identify AGP gene families but also probable explanations of AGPs in maintaining the plant cell wall and transducing extracellular signals into the cytoplasm. PMID:28184232
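
    The PAST (Pro, Ala, Ser, Thr) percentage mentioned above as one screening criterion can be illustrated with a short sketch. This is not the Finding-AGP script: the function names, the whole-sequence (rather than partial) calculation and the 35% default cutoff are assumptions chosen for the example, and the glycomodule index is not modelled.

```python
def past_percentage(sequence):
    """Percentage of residues in `sequence` that are Pro, Ala, Ser or Thr."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    past = sum(seq.count(residue) for residue in "PAST")
    return 100.0 * past / len(seq)


def filter_candidates(sequences, min_past=35.0):
    """Keep sequences whose PAST percentage is at least `min_past`.

    `sequences` maps a (hypothetical) identifier to a protein sequence in
    one-letter code; `min_past` is an illustrative threshold, not a value
    taken from the study.
    """
    return {
        name: seq
        for name, seq in sequences.items()
        if past_percentage(seq) >= min_past
    }
```

    In a fuller pipeline, a filter like this would typically run after signal peptide prediction or BLAST-based screening, so that only plausible candidates are scored for composition.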

  8. Suppression subtractive hybridization (SSH) combined with bioinformatics method: an integrated functional annotation approach for analysis of differentially expressed immune-genes in insects.

    Science.gov (United States)

    Badapanda, Chandan

    2013-01-01

    The suppression subtractive hybridization (SSH) approach, a PCR-based approach which amplifies differentially expressed cDNAs (complementary DNAs) while simultaneously suppressing amplification of common cDNAs, was employed to identify immune-inducible genes in insects. This technique has been used as a suitable tool for experimental identification of novel genes in eukaryotes as well as prokaryotes, both in species whose genomes have been sequenced and in species whose genomes have yet to be sequenced. In this article, I propose a method for in silico functional characterization of immune-inducible genes from insects. Apart from immune-inducible genes from insects, this method can be applied to the analysis of genes from other species, from bacteria to plants and animals. The article provides a background of the SSH-based method, taking specific examples from innate immune-inducible genes in insects, and subsequently proposes a bioinformatics pipeline for functional characterization of newly sequenced genes. The workflow presented here can also be applied to any newly sequenced species generated from Next Generation Sequencing (NGS) platforms.

  9. Antimicrobial Protein Candidates from the Thermophilic Geobacillus sp. Strain ZGt-1: Production, Proteomics, and Bioinformatics Analysis

    Science.gov (United States)

    Alkhalili, Rawana N.; Bernfur, Katja; Dishisha, Tarek; Mamo, Gashaw; Schelin, Jenny; Canbäck, Björn; Emanuelsson, Cecilia; Hatti-Kaul, Rajni

    2016-01-01

    A thermophilic bacterial strain, Geobacillus sp. ZGt-1, isolated from Zara hot spring in Jordan, was capable of inhibiting the growth of the thermophilic G. stearothermophilus and the mesophilic Bacillus subtilis and Salmonella typhimurium on a solid cultivation medium. Antibacterial activity was not observed when ZGt-1 was cultivated in a liquid medium; however, immobilization of the cells in agar beads that were subjected to sequential batch cultivation in the liquid medium at 60 °C showed increasing antibacterial activity up to 14 cycles. The antibacterial activity was lost on protease treatment of the culture supernatant. Concentration of the protein fraction by ammonium sulphate precipitation followed by denaturing polyacrylamide gel electrophoresis separation and analysis of the gel for antibacterial activity against G. stearothermophilus showed a distinct inhibition zone in 15–20 kDa range, suggesting that the active molecule(s) are resistant to denaturation by SDS. Mass spectrometric analysis of the protein bands around the active region resulted in identification of 22 proteins with molecular weight in the range of interest, three of which were new and are here proposed as potential antimicrobial protein candidates by in silico analysis of their amino acid sequences. Mass spectrometric analysis also indicated the presence of partial sequences of antimicrobial enzymes, amidase and dd-carboxypeptidase. PMID:27548162

  10. Antimicrobial Protein Candidates from the Thermophilic Geobacillus sp. Strain ZGt-1: Production, Proteomics, and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Rawana N. Alkhalili

    2016-08-01

    A thermophilic bacterial strain, Geobacillus sp. ZGt-1, isolated from Zara hot spring in Jordan, was capable of inhibiting the growth of the thermophilic G. stearothermophilus and the mesophilic Bacillus subtilis and Salmonella typhimurium on a solid cultivation medium. Antibacterial activity was not observed when ZGt-1 was cultivated in a liquid medium; however, immobilization of the cells in agar beads that were subjected to sequential batch cultivation in the liquid medium at 60 °C showed increasing antibacterial activity up to 14 cycles. The antibacterial activity was lost on protease treatment of the culture supernatant. Concentration of the protein fraction by ammonium sulphate precipitation followed by denaturing polyacrylamide gel electrophoresis separation and analysis of the gel for antibacterial activity against G. stearothermophilus showed a distinct inhibition zone in 15–20 kDa range, suggesting that the active molecule(s) are resistant to denaturation by SDS. Mass spectrometric analysis of the protein bands around the active region resulted in identification of 22 proteins with molecular weight in the range of interest, three of which were new and are here proposed as potential antimicrobial protein candidates by in silico analysis of their amino acid sequences. Mass spectrometric analysis also indicated the presence of partial sequences of antimicrobial enzymes, amidase and dd-carboxypeptidase.

  11. Identification and bioinformatics analysis of microRNAs from the sporophyte and gametophyte of Pyropia haitanensis

    Science.gov (United States)

    Huang, Aiyou; Wang, Guangce

    2016-05-01

    Pyropia haitanensis (T. J. Chang et B. F. Zheng) N. Kikuchi et M. Miyata (Porphyra haitanensis) is an economically important genus that is cultured widely in China. P. haitanensis is cultured on a larger scale than Pyropia yezoensis, making up an important part of the total production of cultivated Pyropia in China. However, the majority of molecular mechanisms underlying the physiological processes of P. haitanensis remain unknown. P. haitanensis could utilize inorganic carbon, and the sporophytes of P. haitanensis might possess a PCK-type C4-like carbon-fixation pathway. To identify microRNAs and their probable roles in sporophyte and gametophyte development, we constructed and sequenced small RNA libraries from sporophytes and gametophytes of P. haitanensis. Five microRNAs were identified that shared no sequence homology with known microRNAs. Our results indicated that P. haitanensis might possess a complex sRNA processing system in which the novel microRNAs act as important regulators of the development of different generations of P. haitanensis.

  12. A bioinformatic approach to understanding antibiotic resistance in intracellular bacteria through whole genome analysis

    OpenAIRE

    Biswas, S.(National Institute of Science Education and Research, Bhubaneswar, India); Raoult, Didier; Rolain, J. M.

    2008-01-01

    Intracellular bacteria survive within eukaryotic host cells and are difficult to kill with certain antibiotics. As a result, antibiotic resistance in intracellular bacteria is becoming commonplace in healthcare institutions. Owing to the lack of methods available for transforming these bacteria, we evaluated the mechanisms of resistance using molecular methods and in silico genome analysis. The objective of this review was to understand the molecular mechanisms of antibiotic resistance throug...

  13. Bioinformatics analysis of the molecular mechanism of diffuse intrinsic pontine glioma

    Science.gov (United States)

    Deng, Lei; Xiong, Pengju; Luo, Yunhui; Bu, Xiao; Qian, Suokai; Zhong, Wuzhao

    2016-01-01

    The present study aimed to elucidate key molecular mechanisms in the progression of diffuse intrinsic pontine glioma (DIPG). The gene expression profile GSE50021, which consisted of 35 pediatric DIPG samples and 10 normal brain samples, was downloaded from the Gene Expression Omnibus database. The differentially-expressed genes (DEGs) in the pediatric DIPG samples were identified. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathways of DEGs were enriched and analyzed. The protein-protein interaction (PPI) network of the DEGs was constructed and functional modules of the PPI network were disclosed using ClusterONE. A total of 679 DEGs (454 up- and 225 downregulated) were identified in the pediatric DIPG samples. DEGs were significantly enriched in various GO terms, and KEGG and Reactome pathways. The PPI network of upregulated (153 nodes and 298 connections) and downregulated (71 nodes and 124 connections) DEGs, and two crucial modules, were obtained. Downregulated genes in module 2, such as cholecystokinin (CCK), gastrin (GAST), adenylate cyclase 2 (brain) (ADCY2) and 5-hydroxytryptamine (serotonin) receptor 7 (HTR7), were significantly enriched in the calcium signaling pathway, the neuroactive ligand-receptor interaction pathway and in GO terms, such as the G-protein coupled receptor (GPCR) signaling pathway, while upregulated genes in module 1 were not enriched in any pathways or GO terms. CCK and GAST associated with the GPCR signaling pathway, HTR7 enriched in the neuroactive ligand-receptor interaction, and ADCY2 and HTR7 involved in the calcium signaling pathway may be key mechanisms playing crucial roles in the development and progression of DIPG.

  14. Bioinformatic Analysis Reveals Archaeal tRNATyr and tRNATrp Identities in Bacteria

    Directory of Open Access Journals (Sweden)

    Takahito Mukai

    2017-02-01

    Full Text Available The tRNA identity elements for some amino acids are distinct between the bacterial and archaeal domains. Searching in recent genomic and metagenomic sequence data, we found some candidate phyla radiation (CPR) bacteria with archaeal tRNA identity for Tyr-tRNA and Trp-tRNA synthesis. These bacteria possess genes for tyrosyl-tRNA synthetase (TyrRS) and tryptophanyl-tRNA synthetase (TrpRS) predicted to be derived from DPANN superphylum archaea, while the cognate tRNATyr and tRNATrp genes reveal bacterial or archaeal origins. We identified a trace of domain fusion and swapping in the archaeal-type TyrRS gene of a bacterial lineage, suggesting that CPR bacteria may have used this mechanism to create diverse proteins. Archaeal-type TrpRS of bacteria and a few TrpRS species of DPANN archaea represent a new phylogenetic clade (named TrpRS-A). The TrpRS-A open reading frames (ORFs) are always associated with another ORF (named ORF1) encoding an unknown protein without global sequence identity to any known protein. However, our protein structure prediction identified a putative HIGH-motif and KMSKS-motif as well as many α-helices that are characteristic of class I aminoacyl-tRNA synthetase (aaRS) homologs. These results provide another example of the diversity of molecular components that implement the genetic code and provide a clue to the early evolution of life and the genetic code.

  15. Structure, circadian regulation and bioinformatic analysis of the unique sigma factor gene in Chlamydomonas reinhardtii.

    Science.gov (United States)

    Carter, Matthew L; Smith, Annette C; Kobayashi, Hirokazu; Purton, Saul; Herrin, David L

    2004-01-01

    In higher plants, the transcription of plastid genes is mediated by at least two types of RNA polymerase (RNAP): a plastid-encoded bacterial RNAP in which promoter specificity is conferred by nuclear-encoded sigma factors, and a nuclear-encoded phage-like RNAP. Green algae, however, appear to possess only the bacterial enzyme. Since transcription of much, if not most, of the chloroplast genome in Chlamydomonas reinhardtii is regulated by the circadian clock and the nucleus, we sought to identify sigma factor genes that might be responsible for this regulation. We describe a nuclear gene (RPOD) that is predicted to encode an 80 kDa protein that, in addition to a predicted chloroplast transit peptide at the N-terminus, has the conserved motifs (2.1-4.2) diagnostic of bacterial sigma-70 factors. We also identified two motifs not previously recognized for sigma factors, adjacent PEST sequences and a leucine zipper, both suggested to be involved in protein-protein interactions. PEST sequences were also found in approximately 40% of sigma factors examined, indicating they may be of general significance. Southern blot hybridization and BLAST searches of the genome and EST databases suggest that RPOD may be the only sigma factor gene in C. reinhardtii. The levels of RPOD mRNA increased 2- to 3-fold in the mid-to-late dark period of light-dark cycling cells, just prior to, or coincident with, the peak in chloroplast transcription. Also, the dark-period peak in RPOD mRNA persisted in cells shifted to continuous light or continuous dark for at least one cycle, indicating that RPOD is under circadian clock control. These results suggest that regulation of RPOD expression contributes to the circadian clock's control of chloroplast transcription.

  16. The haloarchaeal MCM proteins: bioinformatic analysis and targeted mutagenesis of the β7-β8 and β9-β10 hairpin loops and conserved zinc binding domain cysteines

    Directory of Open Access Journals (Sweden)

    Tatjana P Kristensen

    2014-03-01

    Full Text Available The hexameric MCM complex is the catalytic core of the replicative helicase in eukaryotic and archaeal cells. Here we describe the first in vivo analysis of archaeal MCM protein structure and function relationships using the genetically tractable haloarchaeon Haloferax volcanii as a model system. Hfx. volcanii encodes a single MCM protein that is part of the previously identified core group of haloarchaeal MCM proteins. Three structural features of the N-terminal domain of the Hfx. volcanii MCM protein were targeted for mutagenesis: the β7-β8 and β9-β10 β-hairpin loops and putative zinc binding domain. Five strains carrying single point mutations in the β7-β8 β-hairpin loop were constructed, none of which displayed impaired cell growth under normal conditions or when treated with the DNA damaging agent mitomycin C. However, short sequence deletions within the β7-β8 β-hairpin were not tolerated and neither was replacement of the highly conserved residue glutamate 187 with alanine. Six strains carrying paired alanine substitutions within the β9-β10 β-hairpin loop were constructed, leading to the conclusion that no individual amino acid within that hairpin loop is absolutely required for MCM function, although one of the mutant strains displays greatly enhanced sensitivity to mitomycin C. Deletions of two or four amino acids from the β9-β10 β-hairpin were tolerated but mutants carrying larger deletions were inviable. Similarly, it was not possible to construct mutants in which any of the conserved zinc binding cysteines was replaced with alanine, underlining the likely importance of zinc binding for MCM function. The results of these studies demonstrate the feasibility of using Hfx. volcanii as a model system for reverse genetic analysis of archaeal MCM protein function and provide important confirmation of the in vivo importance of conserved structural features identified by previous bioinformatic, biochemical and structural

  17. Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures.

    Science.gov (United States)

    Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka

    2015-11-01

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety.

  18. Bioinformatics clouds for big data manipulation

    KAUST Repository

    Dai, Lin

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. 2012 Dai et al.; licensee BioMed Central Ltd.

  19. Bioinformatics clouds for big data manipulation

    Directory of Open Access Journals (Sweden)

    Dai Lin

    2012-11-01

    Full Text Available Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  20. First report on interferon related developmental regulator-1 from Macrobrachium rosenbergii: bioinformatic analysis and gene expression.

    Science.gov (United States)

    Arockiaraj, Jesu; Easwvaran, Sarasvathi; Vanaraja, Puganeshwaran; Singh, Arun; Othman, Rofina Yasmin; Bhassu, Subha

    2012-05-01

    This study reports the first full length gene of interferon related developmental regulator-1 (designated as MrIRDR-1), identified from the transcriptome of Macrobrachium rosenbergii. The complete gene sequence of the MrIRDR-1 is 2459 base pair long with an open reading frame of 1308 base pairs and encoding a predicted protein of 436 amino acids with a calculated molecular mass of 48 kDa. The MrIRDR-1 protein contains a long interferon related developmental regulator super family domain between 30 and 330. The mRNA expressions of MrIRDR-1 in healthy and the infectious hypodermal and hematopoietic necrosis virus (IHHNV) infected M. rosenbergii were examined using qRT-PCR. The MrIRDR-1 is highly expressed in hepatopancreas along with all other tissues (walking leg, gills, muscle, haemocyte, pleopods, brain, stomach, intestine and eye stalk). After IHHNV infection, the expression is highly upregulated in hepatopancreas. This result indicates an important role of MrIRDR-1 in prawn defense system.
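    The sizes quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration: the average residue mass of 110 Da is a common rule of thumb and is not taken from the record, and the abstract does not say whether the 1308 bp figure includes a stop codon.

```python
# Rough consistency check of the ORF length, protein length and mass
# quoted in the abstract. AVG_RESIDUE_MASS_DA is an assumed rule-of-thumb
# value (about 110 Da per amino acid), not a figure from the source.
AVG_RESIDUE_MASS_DA = 110.0


def codons_in_orf(orf_length_bp: int) -> int:
    """Return the number of complete codons in an ORF of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_bp // 3


def approx_mass_kda(n_residues: int) -> float:
    """Estimate protein mass in kDa from residue count (rule of thumb)."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    print(codons_in_orf(1308))   # codons in the reported 1308 bp ORF
    print(approx_mass_kda(436))  # estimated mass for 436 residues, in kDa
```

    Under these assumptions, 1308 bp corresponds to 436 codons and 436 residues to roughly 48 kDa, in line with the reported values.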

  1. Bioinformatic analysis of the neprilysin (M13) family of peptidases reveals complex evolutionary and functional relationships

    Directory of Open Access Journals (Sweden)

    Pinney John W

    2008-01-01

    Full Text Available Abstract Background The neprilysin (M13) family of endopeptidases are zinc-metalloenzymes, the majority of which are type II integral membrane proteins. The best characterised of this family is neprilysin, which has important roles in inactivating signalling peptides involved in modulating neuronal activity, blood pressure and the immune system. Other family members include the endothelin converting enzymes (ECE-1 and ECE-2), which are responsible for the final step in the synthesis of potent vasoconstrictor endothelins. The ECEs, as well as neprilysin, are considered valuable therapeutic targets for treating cardiovascular disease. Other members of the M13 family have not been functionally characterised, but are also likely to have biological roles regulating peptide signalling. The recent sequencing of animal genomes has greatly increased the number of M13 family members in protein databases, information which can be used to reveal evolutionary relationships and to gain insight into conserved biological roles. Results The phylogenetic analysis successfully resolved vertebrate M13 peptidases into seven classes, one of which appears to be specific to mammals, and insect genes into five functional classes and a series of expansions, which may include inactive peptidases. Nematode genes primarily resolved into groups containing no other taxa, bar the two nematode genes associated with Drosophila DmeNEP1 and DmeNEP4. This analysis reconstructed only one relationship between chordate and invertebrate clusters, that of the ECE sub-group and the DmeNEP3 related genes. Analysis of amino acid utilisation in the active site of M13 peptidases reveals a basis for their biochemical properties. A relatively invariant S1' subsite gives the majority of M13 peptidases their strong preference for hydrophobic residues in P1' position. The greater variation in the S2' subsite may be instrumental in determining the specificity of M13 peptidases for their substrates

  2. Bioinformatic analysis of xenobiotic reactive metabolite target proteins and their interacting partners

    Directory of Open Access Journals (Sweden)

    Hanzlik Robert P

    2009-06-01

    Full Text Available Abstract Background Protein covalent binding by reactive metabolites of drugs, chemicals and natural products can lead to acute cytotoxicity. Recent rapid progress in reactive metabolite target protein identification has shown that adduction is surprisingly selective and inspired the hope that analysis of target proteins might reveal protein factors that differentiate target- vs. non-target proteins and illuminate mechanisms connecting covalent binding to cytotoxicity. Results Sorting 171 known reactive metabolite target proteins revealed a number of GO categories and KEGG pathways to be significantly enriched in targets, but in most cases the classes were too large, and the "percent coverage" too small, to allow meaningful conclusions about mechanisms of toxicity. However, a similar analysis of the directly interacting partners of 28 common targets of multiple reactive metabolites revealed highly significant enrichments in terms likely to be highly relevant to cytotoxicity (e.g., MAP kinase pathways, apoptosis, response to unfolded protein). Machine learning was used to rank the contribution of 211 computed protein features to determining protein susceptibility to adduction. Protein lysine (but not cysteine) content and protein instability index (i.e., rate of turnover in vivo) were among the features most important to determining susceptibility. Conclusion As yet there is no good explanation for why some low-abundance proteins become heavily adducted while some abundant proteins become only lightly adducted in vivo. Analyzing the directly interacting partners of target proteins appears to yield greater insight into mechanisms of toxicity than analyzing target proteins per se. The insights provided can readily be formulated as hypotheses to test in future experimental studies.
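    Category enrichment of the kind described in this abstract is often assessed with a one-sided hypergeometric test. The record does not state which test the authors used, so the sketch below is only an illustration of that general technique. The function name and all counts other than the 171 target proteins are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of proteins considered
    annotated:  proteins in the population carrying the category
    sample:     size of the target list
    hits:       target-list proteins carrying the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical example: 171 targets drawn from a 20000-protein
    # background in which 200 proteins carry the category; 10 targets hit.
    p = hypergeom_enrichment_p(population=20000, annotated=200,
                               sample=171, hits=10)
    print(f"p = {p:.3g}")
```

    In practice such p-values would also be corrected for the number of categories tested, a step omitted here for brevity.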

  3. Quantitative proteomics and bioinformatic analysis provide new insight into the dynamic response of porcine intestine to Salmonella Typhimurium.

    Directory of Open Access Journals (Sweden)

    Melania Collado-Romero

    2015-09-01

    Full Text Available The enteropathogen Salmonella Typhimurium (S. Typhimurium) is the nontyphoidal serotype most commonly isolated from pigs worldwide. Currently, one of the main sources of human infection is the consumption of pork meat. Therefore, prevention and control of salmonellosis in pigs is crucial for minimizing risks to public health. The aim of the present study was to use isobaric tags for relative and absolute quantification (iTRAQ) to explore differences in the response to Salmonella in two segments of the porcine gut (ileum and colon) along a time course of 1, 2 and 6 days post infection (dpi) with S. Typhimurium. A total of 298 proteins were identified in the infected ileum samples, of which 112 displayed significant expression differences due to Salmonella infection. In colon, 184 proteins were detected in the infected samples, of which 46 were differentially expressed with respect to the controls. The highest number of changes in protein expression was quantified in ileum at 2 dpi. Further biological interpretation of proteomics data using bioinformatics tools demonstrated that the expression changes in colon were found in proteins involved in cell death and survival, tissue morphology or molecular transport at the early stages and tissue regeneration at 6 dpi. In ileum, however, changes in protein expression were mainly related to immunological and infectious diseases, inflammatory response or connective tissue disorders at 1 and 2 dpi. iTRAQ has proved to be a robust proteomic approach allowing us to identify ileum as the earliest response focus upon S. Typhimurium in the porcine gut. In addition, new functions involved in the response to bacteria such as eIF2 signalling, free radical scavengers or antimicrobial peptides expression have been identified. Finally, the impairment of the enterohepatic circulation of bile acids and lipid metabolism by means of the downregulation of FABP6 protein and FXR/RXR and LXR/RXR signalling pathway in ileum has

  4. Molecular cloning, bioinformatics analysis, and transcriptional profiling of JAZ1 and JAZ2 from Salvia miltiorrhiza.

    Science.gov (United States)

    Zhou, Yangyun; Zhou, Xun; Li, Qing; Chen, Junfeng; Xiao, Ying; Zhang, Lei; Chen, Wansheng

    2017-01-01

    Production of major effective metabolites, tanshinones and lithospermic acid B (LAB), was dramatically enhanced by exogenous jasmonate (JA) treatment in Salvia miltiorrhiza. However, the molecular mechanism of such metabolic activation in S. miltiorrhiza has not been elucidated yet. Here, we focused on jasmonate ZIM-domain (JAZ) proteins that act as repressors of JA signaling. Open reading frames of two novel genes, SmJAZ1 and SmJAZ2, from S. miltiorrhiza were amplified according to the annotation of the S. miltiorrhiza transcriptome. Compared to plant JAZs, SmJAZ1 and SmJAZ2 were clustered into different groups by phylogenetic analysis. The organ expression pattern was studied by real-time quantitative PCR (RT-qPCR), showing a higher transcription level of both genes in stems than in roots and leaves. The two SmJAZs responded to methyl jasmonate at an early stage, and the transcriptional level significantly increased at 4 h. Our experimental results indicate that SmJAZ1 and SmJAZ2 are JA responsive and present a similar expression trend in JA response. This research will facilitate further characterization of the effect of JAs on effective metabolites and help to ultimately achieve a high yield of target compounds (tanshinones and LAB).

  5. Flux Analysis of the Trypanosoma brucei Glycolysis Based on a Multiobjective-Criteria Bioinformatic Approach

    Directory of Open Access Journals (Sweden)

    Amine Ghozlane

    2012-01-01

    Full Text Available Trypanosoma brucei is a protozoan parasite of major interest in discovering new genes for drug targets. This parasite alternates its life cycle between the mammal host(s) (bloodstream form) and the insect vector (procyclic form), with two divergent glucose metabolisms amenable to in vitro culture. While the metabolic network of the bloodstream forms has been well characterized, the flux distribution between the different branches of the glucose metabolic network in the procyclic form has not been addressed so far. We present a computational analysis (called Metaboflux) that exploits the metabolic topology of the procyclic form, and allows the incorporation of multipurpose experimental data to increase the biological relevance of the model. The alternatives resulting from the structural complexity of networks are formulated as an optimization problem solved by a metaheuristic where experimental data are modeled in a multiobjective function. Our results show that the current metabolic model is in agreement with experimental data and confirms the observed high metabolic flexibility of glucose metabolism. In addition, Metaboflux offers a rational explanation for the high flexibility in the ratio between final products from glucose metabolism, that is, flux redistribution through the malic enzyme steps.

  6. Isolation and bioinformatics analysis of differentially methylated genomic fragments in human gastric cancer

    Institute of Scientific and Technical Information of China (English)

    Ai-Jun Liao; Qi Su; Xun Wang; Bin Zeng; Wei Shi

    2008-01-01

    AIM: To isolate and analyze the DNA sequences which are methylated differentially between gastric cancer and normal gastric mucosa. METHODS: The differentially methylated DNA sequences between gastric cancer and normal gastric mucosa were isolated by methylation-sensitive representational difference analysis (MS-RDA). Similarities between the separated fragments and the human genomic DNA were analyzed with Basic Local Alignment Search Tool (BLAST). RESULTS: Three differentially methylated DNA sequences were obtained, two of which have been accepted by GenBank. The accession numbers are AY887106 and AY887107. AY887107 was highly similar to the 11th exon of LOC440683 (98%), 3' end of LOC440887 (99%), and promoter and exon regions of DRD5 (94%). AY887106 was consistent (98%) with a CpG island in ribosomal RNA isolated from colorectal cancer by Minoru Toyota in 1999. CONCLUSION: The methylation degree is different between gastric cancer and normal gastric mucosa. The differentially methylated DNA sequences can be isolated effectively by MS-RDA.

  7. A comparative structural bioinformatics analysis of the insulin receptor family ectodomain based on phylogenetic information.

    Directory of Open Access Journals (Sweden)

    Miguel E Rentería

    Full Text Available The insulin receptor (IR), the insulin-like growth factor 1 receptor (IGF1R) and the insulin receptor-related receptor (IRR) are covalently-linked homodimers made up of several structural domains. The molecular mechanism of ligand binding to the ectodomain of these receptors and the resulting activation of their tyrosine kinase domain is still not well understood. We have carried out an amino acid residue conservation analysis in order to reconstruct the phylogeny of the IR Family. We have confirmed the location of ligand binding site 1 of the IGF1R and IR. Importantly, we have also predicted the likely location of the insulin binding site 2 on the surface of the fibronectin type III domains of the IR. An evolutionarily conserved surface on the second leucine-rich domain that may interact with the ligand could not be detected. We suggest a possible mechanical trigger of the activation of the IR that involves a slight 'twist' rotation of the last two fibronectin type III domains in order to face the likely location of insulin. Finally, a strong selective pressure was found amongst the IRR orthologous sequences, suggesting that this orphan receptor has a yet unknown physiological role which may be conserved from amphibians to mammals.

  8. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools.

    Directory of Open Access Journals (Sweden)

    Michal Markovich Gordon

    Full Text Available MOTIVATION: It has been proposed that clustering clinical markers, such as blood test results, can be used to stratify patients. However, the robustness of clusters formed with this approach to data pre-processing and clustering algorithm choices has not been evaluated, nor has clustering reproducibility. Here, we made use of the NHANES survey to compare clusters generated with various combinations of pre-processing and clustering algorithms, and tested their reproducibility in two separate samples. METHOD: Values of 44 biomarkers and 19 health/life style traits were extracted from the National Health and Nutrition Examination Survey (NHANES). The 1999-2002 survey was used for training, while data from the 2003-2006 survey was tested as a validation set. Twelve combinations of pre-processing and clustering algorithms were applied to the training set. The quality of the resulting clusters was evaluated both by considering their properties and by comparative enrichment analysis. Cluster assignments were projected to the validation set (using an artificial neural network) and enrichment in health/life style traits in the resulting clusters was compared to the clusters generated from the original training set. RESULTS: The clusters obtained with different pre-processing and clustering combinations differed both in terms of cluster quality measures and in terms of reproducibility of enrichment with health/life style properties. Z-score normalization, for example, dramatically improved cluster quality and enrichments, as compared to unprocessed data, regardless of the clustering algorithm used. Clustering diabetes patients revealed a group of patients enriched with retinopathies. This could indicate that routine laboratory tests can be used to detect patients suffering from complications of diabetes, although other explanations for this observation should also be considered. CONCLUSIONS: Clustering according to classical clinical biomarkers is a robust
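    The Z-score normalization mentioned in the RESULTS rescales each biomarker to zero mean and unit standard deviation, so that markers measured in different units contribute comparably to a distance-based clustering. A minimal sketch follows; the function name and the sample values are hypothetical, and the choice of the population standard deviation is an assumption, since the abstract does not specify the exact procedure.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale a list of numbers to zero mean and unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant column")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical biomarker column (e.g. one blood test across patients).
    marker = [85.0, 92.0, 110.0, 140.0, 98.0]
    print([round(z, 2) for z in z_score(marker)])
```

    Each biomarker column would be normalized independently before the clustering step.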

  9. Bioinformatic analysis of microRNA networks following the activation of the constitutive androstane receptor (CAR) in mouse liver.

    Science.gov (United States)

    Hao, Ruixin; Su, Shengzhong; Wan, Yinan; Shen, Frank; Niu, Ben; Coslo, Denise M; Albert, Istvan; Han, Xing; Omiecinski, Curtis J

    2016-09-01

    The constitutive androstane receptor (CAR; NR1I3) is a member of the nuclear receptor superfamily that functions as a xenosensor, serving to regulate xenobiotic detoxification, lipid homeostasis and energy metabolism. CAR activation is also a key contributor to the development of chemical hepatocarcinogenesis in mice. The underlying pathways affected by CAR in these processes are complex and not fully elucidated. MicroRNAs (miRNAs) have emerged as critical modulators of gene expression and appear to impact many cellular pathways, including those involved in chemical detoxification and liver tumor development. In this study, we used deep sequencing approaches with an Illumina HiSeq platform to differentially profile microRNA expression patterns in livers from wild type C57BL/6J mice following CAR activation with the mouse CAR-specific ligand activator, 1,4-bis-[2-(3,5,-dichloropyridyloxy)] benzene (TCPOBOP). Bioinformatic analyses and pathway evaluations were performed leading to the identification of 51 miRNAs whose expression levels were significantly altered by TCPOBOP treatment, including mmu-miR-802-5p and miR-485-3p. Ingenuity Pathway Analysis of the differentially expressed microRNAs revealed altered effector pathways, including those involved in liver cell growth and proliferation. A functional network among CAR targeted genes and the affected microRNAs was constructed to illustrate how CAR modulation of microRNA expression may potentially mediate its biological role in mouse hepatocyte proliferation. This article is part of a Special Issue entitled: Xenobiotic nuclear receptors: New Tricks for An Old Dog, edited by Dr. Wen Xie.

  10. Bioinformatic evaluation of L-arginine catabolic pathways in 24 cyanobacteria and transcriptional analysis of genes encoding enzymes of L-arginine catabolism in the cyanobacterium Synechocystis sp. PCC 6803

    Directory of Open Access Journals (Sweden)

    Pistorius Elfriede K

    2007-11-01

    Full Text Available Abstract Background So far very limited knowledge exists on L-arginine catabolism in cyanobacteria, although six major L-arginine-degrading pathways have been described for prokaryotes. Thus, we have performed a bioinformatic analysis of possible L-arginine-degrading pathways in cyanobacteria. Further, we chose Synechocystis sp. PCC 6803 for a more detailed bioinformatic analysis and for validation of the bioinformatic predictions on L-arginine catabolism with a transcript analysis. Results We have evaluated 24 cyanobacterial genomes of freshwater or marine strains for the presence of putative L-arginine-degrading enzymes. We identified an L-arginine decarboxylase pathway in all 24 strains. In addition, cyanobacteria have one or two further pathways representing either an arginase pathway or L-arginine deiminase pathway or an L-arginine oxidase/dehydrogenase pathway. An L-arginine amidinotransferase pathway as a major L-arginine-degrading pathway is not likely but cannot be entirely excluded. A rather unusual finding was that the cyanobacterial L-arginine deiminases are substantially larger than the enzymes in non-photosynthetic bacteria and that they are membrane-bound. A more detailed bioinformatic analysis of Synechocystis sp. PCC 6803 revealed that three different L-arginine-degrading pathways may in principle be functional in this cyanobacterium. These are (i) an L-arginine decarboxylase pathway, (ii) an L-arginine deiminase pathway, and (iii) an L-arginine oxidase/dehydrogenase pathway. A transcript analysis of cells grown either with nitrate or L-arginine as sole N-source and with an illumination of 50 μmol photons m⁻² s⁻¹ showed that the transcripts for the first enzyme(s) of all three pathways were present, but that the transcript levels for the L-arginine deiminase and the L-arginine oxidase/dehydrogenase were substantially higher than that of the three isoenzymes of L-arginine decarboxylase. Conclusion The evaluation of 24

  11. Bioinformatics Training: A Review of Challenges, Actions and Support Requirements

    DEFF Research Database (Denmark)

    Schneider, M.V.; Watson, J.; Attwood, T.;

    2010-01-01

    As bioinformatics becomes increasingly central to research in the molecular life sciences, the need to train non-bioinformaticians to make the most of bioinformatics resources is growing. Here, we review the key challenges and pitfalls to providing effective training for users of bioinformatics...... services, and discuss successful training strategies shared by a diverse set of bioinformatics trainers. We also identify steps that trainers in bioinformatics could take together to advance the state of the art in current training practices. The ideas presented in this article derive from the first...

  12. Bioinformatics for Genome Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gary J. Olsen

    2005-06-30

    Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They reported that, of the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported an essentially equal number of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Because the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately the inferred tree, we were interested in the extent to which the variation among inferred trees might be due to variation in alignment order. This bears directly on our efforts to evaluate and improve methods of multiple sequence alignment. We set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same four taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the dendrogram), there are 15 possible alignment orders for four taxa. For each gene we tested all 15 alignment orders and, as a 16th option, allowed CLUSTALW to generate its own guide tree. Having supplied all 15 possible rooted guide trees, we expected at least one of them to perform as well as CLUSTALW's own guide tree, but most of the time the results differed (sometimes better than with CLUSTALW's default tree and sometimes worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which affect the assumed probability of amino acid changes. We examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees, but this work became increasingly bogged down in finding and repairing minor bugs in the CLUSTALW code. The effort was put on hold because we feel that our other proposed approaches will ultimately be better.
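    The count of 15 rooted guide trees for four taxa can be checked by direct enumeration. The following is a minimal sketch, not the authors' code: the taxon labels A to D and the Newick-like output strings are illustrative assumptions, and the strings carry no branch lengths (the very limitation the abstract mentions for user-supplied trees).

```python
from itertools import combinations


def rooted_trees(taxa):
    """Return every rooted binary tree topology on `taxa` as a Newick-like string."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # Split the taxa at the root. Keeping `first` on the left side means each
    # unordered split is generated exactly once (no mirror-image duplicates).
    for k in range(len(rest)):  # k < len(rest) keeps the right side non-empty
        for left_extra in combinations(rest, k):
            left = [first, *left_extra]
            right = [t for t in rest if t not in left_extra]
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    trees.append(f"({left_tree},{right_tree})")
    return trees


def count_rooted_trees(n):
    """Closed form (2n - 3)!! for the number of rooted binary trees on n labeled taxa."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    topologies = rooted_trees(["A", "B", "C", "D"])  # hypothetical taxon labels
    for newick in topologies:
        print(newick + ";")
    print(len(topologies), count_rooted_trees(4))
```

    The enumeration and the closed form (2n - 3)!! agree for four taxa, which is why 15 user-supplied guide trees plus the program's own default give 16 alignment runs per gene.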

  13. Non-Coding RNAs in Lung Cancer: Contribution of Bioinformatics Analysis to the Development of Non-Invasive Diagnostic Tools

    Science.gov (United States)

    Kunz, Meik; Wolf, Beat; Schulze, Harald; Atlan, David; Walles, Thorsten; Walles, Heike; Dandekar, Thomas

    2016-01-01

    Lung cancer is currently the leading cause of cancer-related mortality, owing to late diagnosis and limited treatment options. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they appear to be promising new non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarkers in lung cancer and present how bioinformatics can contribute to the development of non-invasive diagnostic tools. To this end, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs. PMID:28035947

  14. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics

    Directory of Open Access Journals (Sweden)

    Wang Chen

    2012-03-01

    Background: MicroRNAs (miRNAs) are a class of functional small non-coding RNAs 19-25 nucleotides in length. Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among Vitis species; it is used as an excellent breeding parent for grapevine and has elicited growing interest in wine production. To date, a relatively large number of grapevine miRNAs (vv-miRNAs) have been reported from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr., a wild grapevine species. Results: A small RNA library from Amur grape was constructed and deep sequenced with Solexa technology, followed by bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven grapevine tissues was confirmed by real-time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were largely consistent with those from the deep-sequenced sRNA libraries of the combined corresponding tissues. We also describe the conservation and variation of va-miRNAs during plant evolution using miR-SNPs and miR-LDs, based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNPs in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted; they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Conclusions: Deep sequencing of short RNAs from Amur grape flowers and berries identified 72

  15. Deep learning in bioinformatics.

    Science.gov (United States)

    Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh

    2016-07-29

    In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.

  16. A Bioinformatics Facility for NASA

    Science.gov (United States)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney of mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  17. Bioinformatics Analysis on MSTN Gene of Swine

    Institute of Scientific and Technical Information of China (English)

    王伟; 连林生; 李继中

    2012-01-01

    [Objective] The aim was to perform a bioinformatics analysis of the swine MSTN gene. [Method] Coding sequences (CDS) of the MSTN gene from swine, rat, mouse, dog, sheep, goat, cattle, chimpanzee, human, horse, chicken and zebrafish were retrieved from GenBank, and the MSTN protein sequences of these 12 species were loaded into the Megalign program of DNAStar for phylogenetic analysis. In addition, the basic information, restriction map, secondary structure of the encoded protein, signal peptide, transmembrane structure and protein subcellular localization of the swine MSTN gene were analyzed. [Result] The swine MSTN gene is closely related to those of rat, mouse, dog, sheep, goat, cattle, chimpanzee, human and horse. The gene contains multiple restriction enzyme cutting sites, and the encoded protein is a hydrophobic, unstable protein with a molecular weight of 42 791.3 u, an isoelectric point of 6.98 and 375 amino acid residues. The secondary structure of the protein contains 20.53% α-helix, 4% β-turn, 53.07% random coil, 22.4% extended strand and one transmembrane region. The protein is localized outside the cell and is very likely to act as a signal peptide. [Conclusion] The research provides a reference for further study of the swine MSTN gene.

  18. Bioinformatics Analysis on MSTN Gene of Swine

    Institute of Scientific and Technical Information of China (English)

    王伟; 连林生; 李继中

    2012-01-01

    [Objective] The aim was to perform a bioinformatics analysis of the MSTN gene in swine. [Method] Coding sequences (CDS) of the MSTN gene in swine, rat, mouse, dog, sheep, goat, cattle, chimpanzee, human, horse, chicken and zebrafish were loaded into Megalign of DNAStar for phylogenetic analysis. In addition, we analyzed the basic information, restriction map, secondary structure of the encoded protein, signal peptide, transmembrane domain and protein subcellular localization. [Result] The swine MSTN gene is closely related to those of rat, mouse, dog, sheep, goat, cattle, chimpanzee, human and horse. It contains many enzyme cutting sites, and the encoded protein is hydrophobic and unstable. The molecular weight was 42 791.3 u, the isoelectric point was 6.98, and the protein comprised 375 amino acid residues. In addition, the secondary structure of the protein contained 20.53% α-helix, 4% β-turn, 53.07% coil, 22.4% extended strand and one transmembrane domain. The protein is localized extracellularly and is very likely to act as a signal peptide. [Conclusion] The research provides a reference for further study of the MSTN gene in swine.

  19. Bioinformatics resource manager v2.3: an integrated software environment for systems biology with microRNA and cross-species analysis tools

    Directory of Open Access Journals (Sweden)

    Tilton Susan C

    2012-11-01

    Background: MicroRNAs (miRNAs) are noncoding RNAs that direct post-transcriptional regulation of protein-coding genes. Recent studies have shown that miRNAs are important for controlling many biological processes, including nervous system development, and are highly conserved across species. Given their importance, computational tools are necessary for analysis, interpretation and integration of high-throughput (HTP) miRNA data in an increasing number of model species. The Bioinformatics Resource Manager (BRM) v2.3 is a software environment for data management, mining, integration and functional annotation of HTP biological data. In this study, we report recent updates to BRM for miRNA data analysis and cross-species comparisons across datasets. Results: BRM v2.3 can query predicted miRNA targets from multiple databases, retrieve potential regulatory miRNAs for known genes, integrate experimentally derived miRNA and mRNA datasets, perform ortholog mapping across species, and retrieve annotation and cross-reference identifiers for an expanded number of species. Here we use BRM to show that developmental exposure of zebrafish to 30 µM nicotine from 6–48 hours post fertilization (hpf) results in behavioral hyperactivity in larval zebrafish and alteration of putative miRNA gene targets in whole embryos at developmental stages that encompass early neurogenesis. We show typical workflows for using BRM to integrate experimental zebrafish miRNA and mRNA microarray datasets, with example retrievals for zebrafish, including pathway annotation and mapping to human orthologs. Functional analysis of differentially regulated (p ... Conclusions: BRM provides the ability to mine complex data for identification of candidate miRNAs or pathways that drive phenotypic outcome and is therefore a useful hypothesis generation tool for systems biology. The miRNA workflow in BRM allows for efficient processing of multiple miRNA and mRNA datasets in a single

  20. [Bioinformatics-based Design of Peptide Vaccine Candidates Targeting Spike Protein of MERS-CoV and Immunity analysis in Mice].

    Science.gov (United States)

    Lan, Jiaming; Lu, Shuai; Deng, Yao; Wen, Bo; Chen, Hong; Wang, Wen; Tan, Wenjie

    2016-01-01

    Middle East respiratory syndrome coronavirus (MERS-CoV) was identified as a novel human coronavirus and poses a great threat to public health worldwide, which urgently calls for the development of an effective and safe vaccine. In this study, peptide epitopes targeting the spike antigen were predicted using bioinformatics methods. Nine polypeptides with high scores were synthesized and linked to keyhole limpet hemocyanin (KLH). Female BALB/c mice were immunized with individual polypeptide-KLH conjugates; total IgG was detected by ELISA, and cell-mediated immunity (CMI) was analyzed by ELISpot assay. The results showed that the individual peptide YVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKADGI induced the highest level of total IgG as well as CMI (high frequency of IFN-γ secretion) against MERS-CoV antigen in mice. Our study identified a promising peptide vaccine candidate against MERS-CoV and provides experimental support for bioinformatics-based design of peptide vaccines.

  1. A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins

    Directory of Open Access Journals (Sweden)

    Teresa Milano

    2016-01-01

    The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. A bioinformatics analysis is here reported that studied the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups.

  2. A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins

    Science.gov (United States)

    Milano, Teresa

    2016-01-01

    The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. A bioinformatics analysis is here reported that studied the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups. PMID:27446613

  3. Short-term arginine deprivation results in large-scale modulation of hepatic gene expression in both normal and tumor cells: microarray bioinformatic analysis

    Directory of Open Access Journals (Sweden)

    Sabo Edmond

    2006-09-01

    Background: We have reported arginine-sensitive regulation of the LAT1 amino acid transporter (SLC7A5) in normal rodent hepatic cells, with loss of arginine sensitivity and high-level constitutive expression in tumor cells. We hypothesized that liver cell gene expression is highly sensitive to alterations in the amino acid microenvironment and that tumor cells may differ substantially in the gene sets sensitive to amino acid availability. To assess the potential number and classes of hepatic genes sensitive to arginine availability at the RNA level and compare these between normal and tumor cells, we used an Affymetrix microarray approach and a paired in vitro model of normal rat hepatic cells and a tumorigenic derivative, with triplicate independent replicates. Cells were exposed to arginine-deficient or control conditions for 18 hours in medium formulated to maintain differentiated function. Results: Initial two-way analysis with a p-value of 0.05 identified 1419 genes in normal cells versus 2175 in tumor cells whose expression was altered in arginine-deficient conditions relative to controls, representing 9–14% of the rat genome. More stringent bioinformatic analysis with 9-way comparisons and a minimum of 2-fold variation narrowed this set to 56 arginine-responsive genes in normal liver cells and 162 in tumor cells. Approximately half the arginine-responsive genes in normal cells overlap with those in tumor cells. Of these, the majority were increased in expression and included multiple growth, survival, and stress-related genes. GADD45, TA1/LAT1, and caspases 11 and 12 were among this group. Previously known amino acid regulated genes were among the pool in both cell types. Available cDNA probes allowed independent validation of microarray data for multiple genes. Among genes downregulated under arginine-deficient conditions were multiple genes involved in cholesterol and fatty acid metabolism. Expression of low-density lipoprotein receptor was

  4. Identifiable Data Files - Medicare Provider Analysis and ...

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Provider Analysis and Review (MEDPAR) File contains data from claims for services provided to beneficiaries admitted to Medicare certified inpatient...

  5. Identifying Proper Names Based on Association Analysis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The problem of proper name recognition in Chinese text is discussed. An automatic approach based on association analysis is presented for extracting rules from a corpus. The method tries to discover rules relevant to external evidence by association analysis, without additional manual effort. These rules can be used to recognize proper nouns in Chinese texts. The experimental results show that the method is practical in some applications. Moreover, the method is language independent.

  6. Predictive mutational bioinformatic analysis of variation in the skin and wool associated corneodesmosin (CDSN) gene in sheep.

    Science.gov (United States)

    Siva Subramaniam, Nitthiya; Morgan, Eleanor; Bottomley, Steven; Tay, Sharon; Gregg, Keith; Lee, Chee Yang; Wetherall, John; Groth, David

    2012-05-01

    Corneodesmosin (CDSN) is an important component of the desmosome in the epidermal cornified stratum and inner root sheath of hair follicles. DNA from a sheep BAC clone previously identified by us to contain CDSN was PCR amplified using cattle-derived primers and the product sequenced. A region of 4579 bp containing CDSN was shown to contain two exons separated by one intron and spanning 3683 bp. The DNA encodes a predicted protein of 546 amino acids. Phylogenetic analysis shows that sheep CDSN falls within a clade containing cattle and other ruminant-like species. Comparison of sequences generated from 12 unrelated merino sheep and the International Sheep Genome Consortium (ISGC) data identified 58 single nucleotide polymorphisms (SNPs) within the 4579 bp region of which 16 are contained within coding sequences (1 in 80 bp). The SNPs identified in this study will add to the Major Histocompatibility Complex (MHC) SNP panel, which will allow extensive haplotyping of the sheep MHC in future studies.

  7. Identifying MMORPG Bots: A Traffic Analysis Approach

    Directory of Open Access Journals (Sweden)

    Wen-Chin Chen

    2008-11-01

    Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.

  8. Identifying MMORPG Bots: A Traffic Analysis Approach

    Science.gov (United States)

    Chen, Kuan-Ta; Jiang, Jhih-Wei; Huang, Polly; Chu, Hao-Hua; Lei, Chin-Laung; Chen, Wen-Chin

    2008-12-01

    Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
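    The first distinguishing feature, regularity in the release time of client commands, can be illustrated with a simple statistic. The sketch below is an assumption for illustration only, not one of the four strategies the authors propose: it computes the coefficient of variation of the gaps between command timestamps, and both timestamp lists are made-up values rather than measured traffic.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive command times.

    Values near 0 indicate highly regular (timer-like) command release.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average_gap = mean(gaps)
    if average_gap == 0:
        raise ValueError("all timestamps are identical")
    return pstdev(gaps) / average_gap


if __name__ == "__main__":
    # Hypothetical command release times in seconds (not real game traffic).
    timer_driven = [0.0, 1.0, 2.0, 3.0, 4.0]
    irregular = [0.0, 0.4, 2.1, 2.3, 5.0]
    print(interarrival_cv(timer_driven))
    print(interarrival_cv(irregular))
```

    A threshold on such a statistic would have to be chosen from labeled traces; the abstract does not say how the authors quantify regularity.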

  9. Identifying nonlinear biomechanical models by multicriteria analysis

    Science.gov (United States)

    Srdjevic, Zorica; Cveticanin, Livija

    2012-02-01

    In this study, the methodology developed by Srdjevic and Cveticanin (International Journal of Industrial Ergonomics 34 (2004) 307-318) for the nonbiased (objective) parameter identification of a linear biomechanical model exposed to vertical vibrations is extended to the identification of n-degree-of-freedom (DOF) nonlinear biomechanical models. The dynamic performance of the n-DOF nonlinear model is described in terms of response functions in the frequency domain, such as the driving-point mechanical impedance and the seat-to-head transmissibility function. For randomly generated parameters of the model, the nonlinear equations of motion are solved using the Runge-Kutta method. The data are transformed from the time domain to the frequency domain by a discrete Fourier transformation. Squared deviations of the response functions from the target values are used as the model performance evaluation criteria, thus shifting the problem into the multicriteria framework. The objective weights of the criteria are obtained by applying the Shannon entropy concept. The suggested methodology is programmed in Pascal and tested on a 4-DOF nonlinear lumped parameter biomechanical model. The identification process over the 2000 generated sets of parameters takes less than 20 s. The model response obtained with the identified parameters correlates well with the target values, justifying the use of the underlying concept and of the mathematical instruments and numerical tools applied. It should be noted that the identified nonlinear model reproduces the biomechanical response more accurately than a linear model.
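    The step of deriving objective criteria weights from Shannon entropy can be sketched as follows. This is the common textbook form of the entropy weight method, written in Python rather than the authors' Pascal; the normalization details are an assumption, since the abstract does not give them, and the score matrix is made up for illustration.

```python
import math


def entropy_weights(matrix):
    """Objective criteria weights from Shannon entropy.

    matrix[i][j] is the non-negative score of parameter set i on criterion j
    (for example, a squared deviation from a target response).
    """
    rows = len(matrix)
    cols = len(matrix[0])
    if rows < 2:
        raise ValueError("need at least two parameter sets")
    diversities = []
    for j in range(cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each criterion needs a positive column sum")
        shares = [value / total for value in column]
        # Entropy normalized to [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(rows)
        # Criteria whose scores barely differ across sets carry little information.
        diversities.append(max(0.0, 1.0 - entropy))
    spread = sum(diversities)
    if spread == 0:
        return [1.0 / cols] * cols  # no criterion discriminates: equal weights
    return [d / spread for d in diversities]


if __name__ == "__main__":
    # Hypothetical scores: three parameter sets, two criteria.
    scores = [[1.0, 0.2], [1.0, 0.9], [1.0, 0.4]]
    print(entropy_weights(scores))
```

    In this form a criterion on which all parameter sets score the same receives essentially zero weight, so the ranking is driven by the criteria that actually separate the candidates.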

  10. Bioinformatics methods for identifying candidate disease genes

    NARCIS (Netherlands)

    Driel, M.A. van; Brunner, H.G.

    2006-01-01

    With the explosion in genomic and functional genomics information, methods for disease gene identification are rapidly evolving. Databases are now essential to the process of selecting candidate disease genes. Combining positional information with disease characteristics and functional information i

  11. Combination of meta-analysis and graph clustering to identify prognostic markers of ESCC

    Directory of Open Access Journals (Sweden)

    Hongyun Gao

    2012-01-01

    Esophageal squamous cell carcinoma (ESCC) is one of the most malignant gastrointestinal cancers and occurs at a high frequency in China and other Asian countries. Recently, several molecular markers were identified for predicting ESCC. Nevertheless, additional prognostic markers, with a clear understanding of their underlying roles, are still required. Through bioinformatics, a graph-clustering method implemented in DPClus was used to detect co-expressed modules. The aim was to identify a set of discriminating genes that could be used for predicting ESCC through graph-clustering and GO-term analysis. The results showed that CXCL12, CYP2C9, TGM3, MAL, S100A9, EMP-1 and SPRR3 were highly associated with ESCC development. In our study, all their predicted roles were in line with previous reports, supporting the assumption that a combination of meta-analysis, graph-clustering and GO-term analysis is effective both for identifying differentially expressed genes and for reflecting on their functions in ESCC.

  12. Combination of meta-analysis and graph clustering to identify prognostic markers of ESCC.

    Science.gov (United States)

    Gao, Hongyun; Wang, Lishan; Cui, Shitao; Wang, Mingsong

    2012-04-01

    Esophageal squamous cell carcinoma (ESCC) is one of the most malignant gastrointestinal cancers and occurs at a high frequency in China and other Asian countries. Recently, several molecular markers were identified for predicting ESCC. Nevertheless, additional prognostic markers, with a clear understanding of their underlying roles, are still required. Through bioinformatics, a graph-clustering method implemented in DPClus was used to detect co-expressed modules. The aim was to identify a set of discriminating genes that could be used for predicting ESCC through graph-clustering and GO-term analysis. The results showed that CXCL12, CYP2C9, TGM3, MAL, S100A9, EMP-1 and SPRR3 were highly associated with ESCC development. In our study, all their predicted roles were in line with previous reports, supporting the assumption that a combination of meta-analysis, graph-clustering and GO-term analysis is effective both for identifying differentially expressed genes and for reflecting on their functions in ESCC.

  13. Adapting bioinformatics curricula for big data.

    Science.gov (United States)

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs.

  14. A Statistical Theory for Shape Analysis of Curves and Surfaces with Applications in Image Analysis, Biometrics, Bioinformatics and Medical Diagnostics

    Science.gov (United States)

    2010-05-10

    targets in noisy/corrupted images (Bayesian active contours), finding shape models in point clouds derived from images, shape analysis of facial surfaces... Srivastava and I. H. Jermyn, Bayesian Classification of Shapes Hidden in Point Clouds, Proceedings of 13th Digital Signal Processing Workshop, Marco... CA, June 2010. 18. J. Su, Z. Zhu, F. Huffer, and A. Srivastava, Detecting Shapes in 2D Point Clouds Generated from Images, International Conference on

  15. Regulatory bioinformatics for food and drug safety.

    Science.gov (United States)

    Healy, Marion J; Tong, Weida; Ostroff, Stephen; Eichler, Hans-Georg; Patak, Alex; Neuspiel, Margaret; Deluyker, Hubert; Slikker, William

    2016-10-01

    "Regulatory Bioinformatics" strives to develop and implement a standardized and transparent bioinformatic framework to support the implementation of existing and emerging technologies in regulatory decision-making. It has great potential to improve public health through the development and use of clinically important medical products and tools to manage the safety of the food supply. However, the application of regulatory bioinformatics also poses new challenges and requires new knowledge and skill sets. At the latest conference governed by the Global Coalition on Regulatory Science Research (GCRSR), the Global Summit on Regulatory Science (GSRS2015), regulatory bioinformatics principles were presented with respect to global trends, initiatives and case studies. The discussion revealed that datasets, analytical tools, skills and expertise are rapidly developing, in many cases via large international collaborative consortia. It also revealed that significant research is still required to realize the potential applications of regulatory bioinformatics. While there is significant excitement about the possibilities offered by precision medicine to enhance treatments of serious and/or complex diseases, there is a clear need for further development of mechanisms to securely store, curate and share data, integrate databases, and standardize quality control and data analysis procedures. A greater understanding of the biological significance of the data is also required to fully exploit the vast datasets that are becoming available. The application of bioinformatics in the microbiological risk analysis paradigm is delivering clear benefits both for the investigation of foodborne pathogens and for decision making on clinically important treatments. It is recognized that regulatory bioinformatics will have many beneficial applications by ensuring high quality data, validated tools and standardized processes, which will help inform the regulatory science community of the requirements

  16. The Aspergillus Mine - publishing bioinformatics

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla; Rasmussen, Jane Lind Nybo; Theobald, Sebastian

    so with no computational specialist. Here we present a setup for analysis and publication of genome data of 70 species of Aspergillus fungi. The platform is based on R and Python and uses the RShiny framework to create interactive web‐applications. It allows all participants to create interactive...... analysis which can be shared with the team and in connection with publications. We present analysis for investigation of genetic diversity, secondary and primary metabolism and general data overview. The platform, the Aspergillus Mine, is a collection of analysis tools based on data from collaboration...... with the Joint Genome Institute. The Aspergillus Mine is not intended as a genomic data sharing service but instead focuses on creating an environment where the results of bioinformatic analysis are made available for inspection. The data and code are public upon request and figures can be obtained directly from...

  17. Clinical significance of overexpression of metastasis-associated gene MTA1 in cervical cancer and bioinformatic analysis of genes coordinately expressed with MTA1

    Directory of Open Access Journals (Sweden)

    Shu-ying FAN

    2016-06-01

    Full Text Available Objective: To analyze the clinical significance of MTA1 overexpression in cervical cancer and bioinformatically screen potential treatment targets from the gene network correlated with MTA1 overexpression. Methods: The SPSS software package was used to analyze the correlation of MTA1 with clinical metastasis and pathological grade of cervical cancer based on the TCGA-CESC data set. The edgeR software was used to screen the gene set whose expression was correlated with MTA1 in cervical cancer at a global transcriptional level. The DAVID platform was used to identify the enriched biological functions of the gene set significantly correlated with MTA1 expression. The transcriptional regulation network of the gene set was constructed with the STRING online platform and Cytoscape software to identify the key regulators. Results: Analysis of the TCGA-CESC database showed a significant positive correlation of MTA1 expression with clinical metastasis of cervical cancer (P<0.01). There was a gene set in which gene expression was closely correlated with MTA1 level. Functional enrichment of the gene set indicated that cancer pathways, stem cell pathways, cell migration, cell differentiation, etc. were closely linked to MTA1-correlated malignant behaviors of cancers. Bioinformatic screening showed that Agt, Acta1, Fpr2, Pmch and RGS18, which are correlated with MTA1 expression in cervical cancer, were the key regulators in the differentially expressed gene sets, and these genes mapped to the GPCR pathway. Conclusions: MTA1 overexpression is significantly correlated with clinical metastasis of cervical cancer and is paralleled by the activation of gene regulation involved in the stem cell pathway, cytokine receptor signaling, and cell migration and differentiation pathways. These genes are correlated with MTA1 expression, are potential treatment targets in cervical cancer, and should be further evaluated experimentally in the future. DOI: 10.11855/j.issn.0577-7402.2016.05.03

  18. Distributed computing in bioinformatics.

    Science.gov (United States)

    Jain, Eric

    2002-01-01

    This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.
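The core idea of this record, dividing a large workload among several workers and combining their partial results, can be sketched in a few lines of Python. This is only an illustration and not the method of the paper: the function names (`split_into_chunks`, `count_gc`, `distributed_gc_count`) and the GC-counting task are hypothetical, and a local thread pool stands in for the networked computers the abstract describes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_gc(sequences):
    """Work unit: count G and C bases in one chunk of sequences."""
    return sum(seq.count("G") + seq.count("C") for seq in sequences)


def distributed_gc_count(sequences, n_workers=4):
    """Scatter chunks to workers, then gather and sum the partial counts."""
    chunks = split_into_chunks(sequences, n_workers)
    # Assumption: threads on one machine stand in for remote nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    return sum(partial_counts)
```

The same scatter/gather shape applies whether the workers are threads, processes, or machines reached over a network; only the transport changes.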

  19. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    Science.gov (United States)

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  20. Bioinformatics Analysis of the FREM1 Gene—Evolutionary Development of the IL-1R1 Co-Receptor, TILRR

    Directory of Open Access Journals (Sweden)

    Eva E. Qwarnstrom

    2012-09-01

    Full Text Available The TLRs and IL-1 receptors have evolved to coordinate the innate immune response following pathogen invasion. Receptors and signalling intermediates of these systems are generally characterised by a high level of evolutionary conservation. The recently described IL-1R1 co-receptor TILRR is a transcriptional variant of the FREM1 gene. Here we investigate whether innate co-receptor differences between teleosts and mammals extend to the expression of the TILRR isoform of FREM1. Bioinformatic and phylogenetic approaches were used to analyse the genome sequences of FREM1 from eukaryotic organisms including 37 tetrapods and five teleost fish. The TILRR consensus peptide sequence was present in the FREM1 gene of the tetrapods, but not in fish orthologs of FREM1, and neither FREM1 nor TILRR were present in invertebrates. The TILRR gene appears to have arisen via incorporation of adjacent non-coding DNA with a contiguous exonic sequence after the teleost divergence. Comparing co-receptors in other systems, points to their origin during the same stages of evolution. Our results show that modern teleost fish do not possess the IL-1RI co-receptor TILRR, but that this is maintained in tetrapods as early as amphibians. Further, they are consistent with data showing that co-receptors are recent additions to these regulatory systems and suggest this may underlie differences in innate immune responses between mammals and fish.

  1. Bioinformatic analysis of the non-structural protein 1 of type 2 dengue virus

    Institute of Scientific and Technical Information of China (English)

    齐一鸣; 黄俊琪

    2011-01-01

    Objective: To analyze the structural and functional characteristics of the non-structural protein 1 (NS1) of dengue virus type 2 and to predict its predominant antigenic epitopes by bioinformatics analysis, in order to guide experimental research on its biological function and application. Methods: The analysis tools provided by the NCBI and CBS bioinformatics web sites, together with software packages such as DNAStar and Vector NTI, were used to analyze the physicochemical properties, structural and functional characteristics, likely spatial structure and antigenic epitopes of NS1. Results: The NS1 gene encodes 352 amino acids, including 12 conserved cysteines. Its lipid content is relatively high and its physicochemical properties are unstable. It has no secretory signal peptide and no transmembrane regions, but contains multiple glycosylation, phosphorylation and amidation sites. The protein forms a single compact globular domain, with the N-terminal and C-terminal fragments exposed on the surface, where linear B cell epitopes are densely clustered. The middle segment is buried inside the molecule, but BLAST analysis shows that it contains some B cell epitope sequences that are highly homologous to platelet, vascular endothelial or fibrinogen sequences, which may play an important role in the pathology of dengue hemorrhagic fever. Conclusion: NS1 is a highly promising diagnostic antigen, and the prediction of its epitopes provides a basis for the development of epitope-based peptide vaccines against dengue virus.

  2. Bioinformatic Challenges in Clinical Diagnostic Application of Targeted Next Generation Sequencing: Experience from Pheochromocytoma.

    Directory of Open Access Journals (Sweden)

    Joakim Crona

    Full Text Available Recent studies have demonstrated equal quality of targeted next generation sequencing (NGS) compared to Sanger sequencing. Whereas these novel sequencing processes have a validated robust performance, the choice of enrichment method and of the available bioinformatic software as a reliable analysis tool needs to be further investigated in a diagnostic setting. DNA from 21 patients with genetic variants in SDHB, VHL, EPAS1, RET (n=17) or clinical criteria of NF1 syndrome (n=4) was included. Targeted NGS was performed using Truseq custom amplicon enrichment sequenced on an Illumina MiSEQ instrument. Results were analysed in parallel using three different bioinformatics pipelines: (1) the commercially available MiSEQ Reporter, a fully automated and integrated software; (2) CLC Genomics Workbench, a graphical-interface-based software, also commercially available; and (3) ICP, an in-house scripted custom bioinformatic tool. A tenfold read coverage was achieved in 95-98% of targeted bases. All workflows aligned reads to SDHA and NF1 pseudogenes. Compared to Sanger sequencing, variant calling revealed a sensitivity ranging from 83 to 100% and a specificity of 99.9-100%. Only MiSEQ Reporter identified all pathogenic variants in both sequencing runs. We conclude that targeted next generation sequencing has equal quality compared to Sanger sequencing. Enrichment specificity and the bioinformatic performance need to be carefully assessed in a diagnostic setting. As acceptable accuracy was noted for a fully automated bioinformatic workflow, we suggest that processing of NGS data could be performed without expert bioinformatics skills by utilizing existing commercially available bioinformatics tools.
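The sensitivity and specificity figures quoted in this record follow from a simple comparison of variant calls against a reference method. A minimal sketch, assuming Sanger sequencing is the reference and that each position has been classified as a true/false positive or negative; the function name and the counts in the usage example are hypothetical and are not data from the study:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: variants called by the pipeline and confirmed by the reference
    fn: reference variants the pipeline missed
    tn: positions correctly reported as non-variant
    fp: pipeline calls not confirmed by the reference
    """
    sensitivity = tp / (tp + fn)  # fraction of true variants detected
    specificity = tn / (tn + fp)  # fraction of non-variants correctly rejected
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=5, fn=1, tn=999, fp=1)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

Because true negatives vastly outnumber variants in targeted sequencing, specificity stays close to 100% even when a few false positives occur, which is why sensitivity is usually the more informative of the two figures.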

  3. Bioinformatics analysis of potential essential genes that response to the high intraocular pressure on astrocyte due to glaucoma

    Institute of Scientific and Technical Information of China (English)

    Yang Yang; Jing-Zhu Duan; Yu Di; Dong-Mei Gui; Dian-Wen Gao

    2015-01-01

    AIM: To study the gene expression response and predict the cellular network underlying the effects of pressure on optic nerve injury in glaucoma. METHODS: We used glaucoma-related microarray data in a public database [Gene Expression Omnibus (GEO)] to explore the potential gene expression changes, as well as the corresponding alterations in biological processes, caused by increased pressure in astrocytes during glaucoma development. RESULTS: A total of six genes were identified as related to increased pressure. Through annotation and network analysis, we found that these genes might be involved in cell morphological remodeling, angiogenesis and mismatch repair. CONCLUSION: Increased pressure on astrocytes in glaucoma might cause gene expression alterations, which might in turn induce changes in some cellular responses.

  4. Best practices in bioinformatics training for life scientists.

    KAUST Repository

    Via, Allegra

    2013-06-25

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  5. Best practices in bioinformatics training for life scientists.

    Science.gov (United States)

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  6. Identification of microRNAs in the Toxigenic Dinoflagellate Alexandrium catenella by High-Throughput Illumina Sequencing and Bioinformatic Analysis.

    Directory of Open Access Journals (Sweden)

    Huili Geng

    Full Text Available Micro-ribonucleic acids (miRNAs) are a large group of endogenous, tiny, non-coding RNAs consisting of 19-25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs). To date, the identification and functional analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR) was performed to confirm the expression of two differentially expressed miRNAs: one up-regulated novel miRNA (aca-miR-3p-456915) and one down-regulated conserved miRNA (tae-miR159a). The expression trend in the qPCR assay was generally consistent with the deep sequencing result. Target predictions for the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database annotations revealed that some miRNAs were associated with growth and developmental processes of the alga. These results provide insights into the roles that miRNAs play in

  7. Bioinformatics analysis and prediction for structure and function of nitric oxide synthase and similar proteins from Plasmodium berghei

    Institute of Scientific and Technical Information of China (English)

    Zhigang Fan; Gang Lv; Lingmin Zhang; Xiufeng Gan; Qiang Wu; Saifeng Zhong; Guogang Yan; Guifen Lin

    2011-01-01

    Objective: To search for and analyze nitric oxide synthase (NOS) and similar proteins from Plasmodium berghei (Pb). Methods: The structure and function of nitric oxide synthase and similar proteins from Plasmodium berghei were analyzed and predicted by bioinformatics. Results: PbNOS was not found, but nicotinamide adenine dinucleotide 2'-phosphate reduced tetrasodium (NADPH)-cytochrome p450 reductase (CPR) was identified. PbCPR was located in the nucleus of Plasmodium berghei, while the 134aa-229aa domain was localized in the nucleolar organizer. The amino acid sequence of PbCPR had the closest genetic relationship with that of Plasmodium vivax, showing 73% homology. The tertiary structure of PbCPR displayed a forceps shape with wings, but no wings existed in the tertiary structure of the CPR of its host, Mus musculus (Mm). The 137aa-200aa, 201aa-218aa, 220aa-230aa, 232aa-248aa, 269aa-323aa, 478aa-501aa and 592aa-606aa domains of PbCPR showed no homology with those of MmCPR, and all of these domains were exposed on the surface of the protein. Conclusions: NOS could not be found in Plasmodium berghei or other Plasmodium species. PbCPR may be a site of antimalarial drug resistance and a target for antimalarial drugs and vaccines. It may also be one of the mechanisms of immune evasion. The findings of this study on Plasmodium berghei may apply more closely to Plasmodium vivax. The 137aa-200aa, 201aa-218aa, 220aa-230aa, 232aa-248aa, 269aa-323aa, 478aa-501aa and 592aa-606aa domains of PbCPR are the more suitable targets for antimalarial drugs and vaccines.

  8. Genome-wide bioinformatics analysis of steroid metabolism-associated genes in Nocardioides simplex VKM Ac-2033D.

    Science.gov (United States)

    Shtratnikova, Victoria Y; Schelkunov, Mikhail I; Fokina, Victoria V; Pekov, Yury A; Ivashina, Tanya; Donova, Marina V

    2016-08-01

    Actinobacteria comprise diverse groups of bacteria capable of full degradation or modification of different steroid compounds. Steroid catabolism has been characterized best for the representatives of suborder Corynebacterineae, such as Mycobacteria, Rhodococcus and Gordonia, with a high content of mycolic acids in the cell envelope, while it is poorly understood for other steroid-transforming actinobacteria, such as representatives of the Nocardioides genus belonging to suborder Propionibacterineae. Nocardioides simplex VKM Ac-2033D is an important biotechnological strain which is known for its ability to introduce a ∆(1)-double bond in various 1(2)-saturated 3-ketosteroids, and to perform conversion of 3β-hydroxy-5-ene steroids to 3-oxo-4-ene steroids, hydrolysis of acetylated steroids, and reduction of carbonyl groups at C-17 and C-20 of androstanes and pregnanes, respectively. The strain is also capable of utilizing cholesterol and phytosterol as carbon and energy sources. In this study, a comprehensive bioinformatics genome-wide screening was carried out to predict genes related to steroid metabolism in this organism, their clustering and possible regulation. The predicted operon structure and the number of paralogous copies of candidate genes have been estimated. Binding sites of the steroid catabolism regulators KstR and KstR2 specific to N. simplex VKM Ac-2033D have been calculated de novo. Most of the candidate genes grouped within three main clusters, one of the predicted clusters having no analogs in other actinobacteria studied so far. The results offer a base for further functional studies, expand the understanding of steroid catabolism by actinobacteria, and will contribute to the modification of metabolic pathways in order to generate effective biocatalysts capable of producing valuable bioactive steroids.

  9. How might ZNF804A variants influence risk for schizophrenia and bipolar disorder? A literature review, synthesis, and bioinformatic analysis.

    Science.gov (United States)

    Hess, Jonathan L; Glatt, Stephen J

    2014-01-01

    The gene that encodes zinc finger protein 804A (ZNF804A) became a candidate risk gene for schizophrenia (SZ) after surpassing genome-wide significance thresholds in replicated genome-wide association scans and meta-analyses. Much remains unknown about this reported gene expression regulator; however, preliminary work has yielded insights into functional and biological effects of ZNF804A by targeting its regulatory activities in vitro and by characterizing allele-specific interactions with its risk-conferring single nucleotide polymorphisms (SNPs). There is now strong epidemiologic evidence for a role of ZNF804A polymorphisms in both SZ and bipolar disorder (BD); however, functional links between implicated variants and susceptible biological states have not been solidified. Here we briefly review the genetic evidence implicating ZNF804A polymorphisms as genetic risk factors for both SZ and BD, and discuss the potential functional consequences of these variants on the regulation of ZNF804A and its downstream targets. Empirical work and predictive bioinformatic analyses of the alternate alleles of the two most strongly implicated ZNF804A polymorphisms suggest they might alter the affinity of the gene sequence for DNA- and/or RNA-binding proteins, which might in turn alter expression levels of the gene or particular ZNF804A isoforms. Future work should focus on clarifying the critical periods and cofactors regulating these genetic influences on ZNF804A expression, as well as the downstream biological consequences of an imbalance in the expression of ZNF804A and its various mRNA isoforms.

  10. Genomics Politics through Space and Time: The Case of Bioinformatics in Brazil.

    Science.gov (United States)

    Bicudo, Edison

    2016-01-01

    The emergence of scientific disciplines, as well as the policies aimed at steering them, have geographical implications. This becomes visible in areas such as genomics and related fields. In this paper, the relation between scientific evolution, political decisions and geographical configuration is studied, with a focus on the recent formation of bioinformatics in Brazil. The study involves an analysis of data collected on the website of CNPq, a funding agency attached to the Ministry of Science and Technology. Furthermore, I conducted fieldwork in four cities, interviewing 15 bioinformaticians. In the history of Brazilian bioinformatics, three periods can be identified. In the first period (1900-1996), bioinformatics was actually absent, but biology research groups were formed which would subsequently explore bioinformatics. The second period (1997-2006) was marked by the emergence of the discipline and the geographical concentration of major research groups in the southern part of Brazil. A third period can be pointed to (2007-2014), in which political choices have turned geographical diffusion and institutional equality into a national target. As a consequence of the recent shifts, genomics and bioinformatics researchers have been involved in a debate, some defending the existence of a few specialized research and sequencing platforms, while others welcome the constitution of a scientific scenario based on decentralized platforms. I defend an intermediate solution, whereby some places would be selected to be genomics hubs. This would fit the regional diversity of this vast country, in addition to tackling the scientific weaknesses of the northern area.

  11. The relationship between RASSF1A gene promoter methylation and the susceptibility and prognosis of melanoma: A meta-analysis and bioinformatics

    Science.gov (United States)

    Li, Haili; Tang, Wenru; Jia, Shuting; Wu, Xiaoming; Luo, Ying

    2017-01-01

    Background The function of the tumor suppressor gene RASSF1A in cancer cells has been detailed in many studies. However, due to the methylation of its promoter, the expression of RASSF1A is missing in most cancers. In the literature, we found that the conclusion regarding the relationship between RASSF1A gene promoter methylation and the susceptibility and prognosis of melanoma was not unified. This study adopts the use of a meta-analysis and bioinformatics to explore the relationship between RASSF1A gene promoter methylation and the susceptibility and prognosis of melanoma. Methods Data on melanoma susceptibility were downloaded from the PubMed, Cochrane Library, Web of Science and Google Scholar databases, which were analyzed via a meta-analysis. The effect sizes were estimated by measuring an odds ratio (OR) with a 95% confidence interval (CI). We also used a chi-squared-based Q test to examine the between-study heterogeneity, and used funnel plots to evaluate publication bias. The data on melanoma prognosis, which were analyzed by bioinformatics methods, were downloaded from The Cancer Genome Atlas (TCGA) project. The effect sizes were estimated by measuring the hazard ratios (HRs) with a 95% confidence interval (CI). Results Our meta-analysis included 10 articles. We found that RASSF1A gene promoter methylation was closely related to melanoma susceptibility (OR = 12.67, 95% CI: 6.16 ∼ 26.05, z = 6.90, P<0.0001 according to a fixed effects model and OR = 9.25, 95% CI: 4.37 ∼ 19.54, z = 5.82, P<0.0001 according to a random effects model). The results of the meta-analysis did not reveal any heterogeneity (tau2 = 0.00; H = 1 [1; 1.55]; I2 = 0% [0%; 58.6%], P = 0.5158) or publication bias (t = 0.87, P = 0.4073 by Egger’s test; Z = 0.45, P = 0.6547 by Begg’s test); therefore, we believe that the results of our meta-analysis were more reliable. To explore the relationship between RASSF1A gene methylation, the prognosis of melanoma and the clinical features of
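The pooled odds ratio with a 95% confidence interval reported in this record is the kind of quantity a fixed-effect meta-analysis produces. The sketch below uses inverse-variance weighting of log odds ratios, which is one common fixed-effect approach; the abstract does not say which estimator the authors used, so this is an assumption, and the function names and the 2x2 table layout are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one study's 2x2 table.

    a: methylated cases,    b: unmethylated cases
    c: methylated controls, d: unmethylated controls
    Assumes no cell is zero (no continuity correction is applied).
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pooled_odds_ratio_fixed(tables, z=1.96):
    """Inverse-variance fixed-effect pooled OR with a 95% CI (z = 1.96)."""
    total_weight = 0.0
    weighted_sum = 0.0
    for table in tables:
        log_or, variance = log_odds_ratio(*table)
        weight = 1.0 / variance  # more precise studies get more weight
        total_weight += weight
        weighted_sum += weight * log_or
    pooled_log_or = weighted_sum / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * standard_error),
        math.exp(pooled_log_or + z * standard_error),
    )
```

A random-effects model, which the record also reports, would add a between-study variance term to each study's variance before weighting; that step is omitted here.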

  12. Integrative bioinformatics analysis of genomic and proteomic approaches to understand the transcriptional regulatory program in coronary artery disease pathways.

    Directory of Open Access Journals (Sweden)

    Rajani Kanth Vangala

    Full Text Available Patients with cardiovascular disease show a panel of differentially regulated serum biomarkers indicative of modulation of several pathways from disease onset to progression. A few of these biomarkers have been proposed for multimarker risk prediction methods. However, the underlying mechanism of the expression changes and modulation of the pathways has not yet been addressed in its entirety. Our present work focuses on understanding the regulatory mechanisms at the transcriptional level by identifying the core and specific transcription factors that regulate the coronary artery disease associated pathways. Using the principles of systems biology, we integrated the genomics and proteomics data with computational tools. We selected biomarkers from 7 different pathways based on their association with the disease, assayed 24 biomarkers along with gene expression studies, and built network modules which are highly regulated by 5 core regulators: PPARG, EGR1, ETV1, KLF7 and ESRRA. These network modules in turn comprise biomarkers from different pathways, showing that the core regulatory transcription factors may work together in differential regulation of several pathways, potentially leading to the disease. This kind of analysis can enhance the elucidation of mechanisms in the disease and give better strategies for developing multimarker module based risk predictions.

  13. Integrative bioinformatics analysis of genomic and proteomic approaches to understand the transcriptional regulatory program in coronary artery disease pathways.

    Science.gov (United States)

    Vangala, Rajani Kanth; Ravindran, Vandana; Ghatge, Madan; Shanker, Jayashree; Arvind, Prathima; Bindu, Hima; Shekar, Meghala; Rao, Veena S

    2013-01-01

    Patients with cardiovascular disease show a panel of differentially regulated serum biomarkers indicative of modulation of several pathways from disease onset to progression. A few of these biomarkers have been proposed for multimarker risk prediction methods. However, the underlying mechanism of the expression changes and modulation of the pathways has not yet been addressed in its entirety. Our present work focuses on understanding the regulatory mechanisms at the transcriptional level by identifying the core and specific transcription factors that regulate the coronary artery disease associated pathways. Using the principles of systems biology, we integrated the genomics and proteomics data with computational tools. We selected biomarkers from 7 different pathways based on their association with the disease, assayed 24 biomarkers along with gene expression studies, and built network modules which are highly regulated by 5 core regulators: PPARG, EGR1, ETV1, KLF7 and ESRRA. These network modules in turn comprise biomarkers from different pathways, showing that the core regulatory transcription factors may work together in differential regulation of several pathways, potentially leading to the disease. This kind of analysis can enhance the elucidation of mechanisms in the disease and give better strategies for developing multimarker module based risk predictions.

  14. Bioinformatic analysis of the distribution of inorganic carbon transporters and prospective targets for bioengineering to increase Ci uptake by cyanobacteria.

    Science.gov (United States)

    Gaudana, Sandeep B; Zarzycki, Jan; Moparthi, Vamsi K; Kerfeld, Cheryl A

    2015-10-01

    Cyanobacteria have evolved a carbon-concentrating mechanism (CCM) which has enabled them to inhabit diverse environments encompassing a range of inorganic carbon (Ci: HCO3- and CO2) concentrations. Several uptake systems facilitate inorganic carbon accumulation in the cell, which can in turn be fixed by ribulose 1,5-bisphosphate carboxylase/oxygenase. Here we survey the distribution of genes encoding known Ci uptake systems in cyanobacterial genomes and, using a pfam- and gene context-based approach, identify in the marine (alpha) cyanobacteria a heretofore unrecognized number of putative counterparts to the well-known Ci transporters of beta cyanobacteria. In addition, our analysis shows that there is a huge repertoire of transport systems in cyanobacteria of unknown function, many with homology to characterized Ci transporters. These can be viewed as prospective targets for conversion into ancillary Ci transporters through bioengineering. Increasing intracellular Ci concentration coupled with efforts to increase carbon fixation will be beneficial for the downstream conversion of fixed carbon into value-added products including biofuels. In addition to CCM transporter homologs, we also survey the occurrence of rhodopsin homologs in cyanobacteria, including bacteriorhodopsin, a class of retinal-binding, light-activated proton pumps. Because they are light driven and because of the apparent ease of altering their ion selectivity, we use this as an example of re-purposing an endogenous transporter for the augmentation of Ci uptake by cyanobacteria and potentially chloroplasts.

  15. Bioinformatics and functional analysis of an Entamoeba histolytica mannosyltransferase necessary for parasite complement resistance and hepatical infection.

    Directory of Open Access Journals (Sweden)

    Christian Weber

    Full Text Available The glycosylphosphatidylinositol (GPI) moiety is one of the ways by which many cell surface proteins, such as Gal/GalNAc lectin and proteophosphoglycans (PPGs), attach to the surface of Entamoeba histolytica, the agent of human amoebiasis. It is believed that these GPI-anchored molecules are involved in parasite adhesion to cells, mucus and the extracellular matrix. We identified an E. histolytica homolog of PIG-M, which is a mannosyltransferase required for synthesis of GPI. The sequence and structural analysis led to the conclusion that EhPIG-M1 is composed of one signal peptide and 11 transmembrane domains with two large intraluminal loops, one of which contains the DXD motif, which is involved in enzymatic catalysis and conserved in most glycosyltransferases. Expressing a fragment of the EhPIG-M1 encoding gene in antisense orientation generated parasite lines with diminished EhPIG-M1 levels; these lines displayed reduced GPI production, were highly sensitive to complement and were dramatically inhibited in amoebic abscess formation. The data suggest a role for GPI surface-anchored molecules in the survival of E. histolytica during pathogenesis.

  16. Bioinformatic Analysis on Jellyfish Hematoxin

    Institute of Scientific and Technical Information of China (English)

    欧阳春磊; 高佳栋; 任玉坤; 肖良; 王倩倩; 郭玉峰; 蔡滨欣; 张黎明

    2009-01-01

    AIM: To illustrate the structure and function of five hematoxin sequences reported in four jellyfish species and to explore the possible mechanism of pathogenesis. METHODS: Bioinformatic methods were used to analyze the amino acid composition and sequence, physicochemical properties, signal peptide, membrane-spanning domain, hydrophobicity or hydrophilicity, secondary structure, conserved regions and molecular phylogeny. RESULTS and CONCLUSIONS: The five jellyfish hematoxins were similar in amino acid composition and sequence and in physicochemical properties. The amino acid sequences contained a membrane-spanning domain and hydrophobic regions, with a possible signal peptide cleavage site between amino acid residues 20 and 21. α-Helix and random coil were the major elements of the predicted secondary structure, while β-turns and extended strands were spread throughout the protein. There were four conserved amino acid sequences in three of the five hematoxins, and similar phylogenetic trees were constructed by both the NJ and MP methods.

  17. Bioinformatic analysis of the dirigent gene family in rice

    Institute of Scientific and Technical Information of China (English)

    穰中文; 周清明

    2013-01-01

    Fifty-three OsDIR genes distributed on 8 rice chromosomes were identified using existing rice bioinformatics resources. Gene structure analysis showed that 32 OsDIR genes, accounting for 60.4% of the total, are intronless. Conserved domain prediction demonstrated that each OsDIR gene contains at least one conserved dirigent (DIR) domain. Motif analysis predicted at least 10 conserved motifs of different sizes in the rice DIR proteins, and the motifs occurred at markedly different frequencies among members of the gene family. Multiple alignment of protein sequences showed that all the conserved protein sequences are located within the DIR domain. Protein function prediction indicated that most OsDIR proteins are stable hydrophobic proteins that are expressed in most organelles and most abundantly in the cell wall. Phylogenetic analysis based on conserved protein domain sequences classified the OsDIR genes into five subgroups. Additionally, the duplication characteristics of domain fragments and genes indicated that the OsDIR genes may have originated from a common ancestor (gene).

  18. Bioinformatic Analysis of α-Amylase Genes in Volvariella volvacea

    Institute of Scientific and Technical Information of China (English)

    杜慕云; 杨仁德; 李剑; 谢宝贵

    2014-01-01

    Five genes (GME 2151, GME 6695, GME 9075, GME 10698 and GME 10705) were identified as encoding α-amylases in Volvariella volvacea; the molecular weights of the encoded proteins varied from 38.8 kD to 64.4 kD. Bioinformatic methods based on genome and transcriptome sequences were used to analyze the intron-exon distribution patterns of the genes and the physicochemical properties of the encoded α-amylases. Signal peptides, subcellular localization patterns and functional sites of the α-amylases were predicted, and a phylogenetic tree was constructed based on α-amylases from different fungi. Serine sites were the primary sites of amylase protein phosphorylation. The amylases contained signal peptides, transmembrane helices, conserved amino acid residues and similar three-dimensional structures, and were located both intra- and extracellularly. Analysis of the phylogenetic tree revealed that the α-amylases were of two types: GME9075 and GME10698 belonged to α-amylase type I, and GME2151, GME6695 and GME10705 to α-amylase type II. This is consistent with the classification of amylases from other basidiomycetes. Our data provide useful information relating to matrix degradation by the mycelium of V. volvacea and other macro-basidiomycetes.

  19. Bioinformatic Analysis of GIF Protein Family in Chinese Cabbage

    Institute of Scientific and Technical Information of China (English)

    王凤德; 李利斌; 李化银; 刘立峰; 高建伟

    2012-01-01

    The GIF (GRF-interacting factor) protein family is a class of transcription coactivators characterized by SNH and QG domains. Members of this family can form a functional complex with GRF (growth-regulating factor) transcription factors and act synergistically in regulating leaf development by promoting and/or maintaining cell proliferation activity in leaf primordia. In this study, five GIF genes were identified from Chinese cabbage by bioinformatics analysis. The conserved sequences of the encoded proteins were analyzed, and a phylogenetic tree was constructed based on the corresponding GIF proteins from Chinese cabbage and Arabidopsis thaliana. The expression of the BrGIF1 gene was also analyzed. The results showed that all GIF proteins had highly conserved SNH and QG domains and could be divided into two subfamilies, which might have existed before the split of Chinese cabbage and Arabidopsis thaliana. The transcript level of BrGIF1 was higher in inbred lines with larger leafy heads and in tissues with stronger cell division capacity. In addition, BrGIF1 expression was induced by NAA and repressed by ABA. These results suggest that Chinese cabbage GIF proteins may have biological functions similar to those of Arabidopsis GIF proteins and play an important role in regulating plant organ development.

  20. Literature Mining and Bioinformatic Analysis of Dysregulated Genes in Keloid

    Institute of Scientific and Technical Information of China (English)

    边曦; 黄琛; 李博仑; 秦泽莲

    2012-01-01

    Objective To explore the pathogenesis of keloid by comparing gene expression in keloid and normal skin tissues, in order to seek new therapeutic approaches for keloid. Methods The differentially expressed genes between keloid and normal skin were obtained by mining PubMed. The dysregulated genes in keloid were analyzed by bioinformatics methods, including protein-protein interaction networks, biological pathways, gene ontology and functional annotation clustering analysis. Results Eight differential gene expression datasets and 922 articles were obtained. A total of 94 dysregulated genes in keloid were identified (71 up-regulated genes and 23 down-regulated genes). Eighty-six genes were found to encode proteins within an interaction network, with TGFB1, FN1, COL1A1, MMP9, VEGFA, TP53, IL6 and MMP2 as the central nodes of this network. The dysregulated genes in keloid were involved in a variety of biological pathways, including signal transduction and tumor formation. Furthermore, the dysregulated genes in keloid played important roles in the biological processes of apoptosis and cell motility. Additionally, some of the dysregulated genes participated in the expression of cellular components such as cell membrane structure, extracellular matrix and collagen components. Conclusions Key genes including TGFB1, FN1, COL1A1, MMP9, VEGFA, TP53, IL6 and MMP2, along with TGF-β signal transduction, cell proliferation and apoptosis, and tumor formation, may play important roles in the development of keloid.

  1. Pattern recognition in bioinformatics.

    Science.gov (United States)

    de Ridder, Dick; de Ridder, Jeroen; Reinders, Marcel J T

    2013-09-01

    Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained.
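    As a small illustration of the classification task described in this abstract (assigning a discrete label to an instance represented by features), the following is a minimal sketch of a nearest-centroid classifier. It is not taken from the reviewed course material; the function names and the toy feature values are hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import fmean


def fit_centroids(samples, labels):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: two expression-like features per sample.
train_x = [(1.0, 1.2), (0.8, 1.0), (3.9, 4.1), (4.2, 3.8)]
train_y = ["control", "control", "tumor", "tumor"]
centroids = fit_centroids(train_x, train_y)
```

    Calling predict(centroids, (1.1, 0.9)) assigns the new sample to the class with the nearer mean profile; real applications would add feature scaling and cross-validation, which this sketch omits.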

  2. Bioconductor: open software development for computational biology and bioinformatics

    DEFF Research Database (Denmark)

    Gentleman, R.C.; Carey, V.J.; Bates, D.M.;

    2004-01-01

    into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples....

  3. Bioinformatics in microbial biotechnology – a mini review

    Directory of Open Access Journals (Sweden)

    Bansal Arvind K

    2005-06-01

    Full Text Available Abstract The revolutionary growth in the computation speed and memory storage capability has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes including a cleaner draft of human genome have been sequenced raising the expectation of better control of microorganisms. The goals are as lofty as the development of rational drugs and antimicrobial agents, development of new enhanced bacterial strains for bioremediation and pollution control, development of better and easy to administer vaccines, the development of protein biomarkers for various bacterial diseases, and better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics – sequencing and comparative study of genomes to identify gene and genome functionality, (ii) proteomics – identification and characterization of protein related properties and reconstruction of metabolic and regulatory pathways, (iii) cell visualization and simulation to study and model cell behavior, and (iv) application to the development of drugs and anti-microbial agents. In this article, we will focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based upon the available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that integrates search techniques with mathematical modeling. The major impact of bioinformatics research has been to automate the genome sequencing, automated development of integrated genomics and proteomics databases, automated genome comparisons to identify the genome function, automated derivation of metabolic pathways, gene

  4. Systematic enrichment analysis of gene expression profiling studies identifies consensus pathways implicated in colorectal cancer development

    Directory of Open Access Journals (Sweden)

    Jesús Lascorz

    2011-01-01

    Full Text Available Background: A large number of gene expression profiling (GEP) studies on colorectal carcinogenesis have been performed but no reliable gene signature has been identified so far due to the lack of reproducibility in the reported genes. There is growing evidence that functionally related genes, rather than individual genes, contribute to the etiology of complex traits. We used, as a novel approach, pathway enrichment tools to define functionally related genes that are consistently up- or down-regulated in colorectal carcinogenesis. Materials and Methods: We started the analysis with 242 unique annotated genes that had been reported by any of three recent meta-analyses covering GEP studies on genes differentially expressed in carcinoma vs normal mucosa. Most of these genes (218, 91.9%) had been reported in at least three GEP studies. These 242 genes were submitted to bioinformatic analysis using a total of nine tools to detect enrichment of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. As a final consistency criterion the pathway categories had to be enriched by several tools to be taken into consideration. Results: Our pathway-based enrichment analysis identified the categories of ribosomal protein constituents, extracellular matrix receptor interaction, carbonic anhydrase isozymes, and a general category related to inflammation and cellular response as significantly and consistently overrepresented entities. Conclusions: We triaged the genes covered by the published GEP literature on colorectal carcinogenesis and subjected them to multiple enrichment tools in order to identify the consistently enriched gene categories. These turned out to have known functional relationships to cancer development and thus deserve further investigation.
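    The two ideas in this abstract, testing a gene list for over-representation of a category and keeping only categories that several tools agree on, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the nine tools or their statistics, so the hypergeometric test, the threshold of two tools, the 0.05 cutoff and all names below are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn from `population` genes,
    of which `annotated` belong to the category of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def consensus(results, min_tools=2, alpha=0.05):
    """Keep categories that are significant in at least `min_tools` tools.

    `results` maps a tool name to a {category: p-value} dictionary.
    """
    counts = {}
    for pvalues in results.values():
        for category, p in pvalues.items():
            if p < alpha:
                counts[category] = counts.get(category, 0) + 1
    return sorted(c for c, n in counts.items() if n >= min_tools)


# Hypothetical output of two enrichment tools (assumed values).
example = {
    "toolA": {"ribosome": 0.001, "ecm_receptor": 0.20},
    "toolB": {"ribosome": 0.010, "ecm_receptor": 0.03},
}
agreed = consensus(example)
```

    In this toy input only the category flagged by both hypothetical tools survives the consensus filter, which mirrors the consistency criterion described in the abstract.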

  5. Initial Bioinformatics Analysis and Verification of a Novel Gene Named Nischarin

    Institute of Scientific and Technical Information of China (English)

    赵太云; 王勃; 苏瑞斌; 吴宁; 李锦

    2012-01-01

    Objective: To perform a bioinformatics analysis of the novel gene Nischarin, explore its functional characteristics, and verify the predictions experimentally. Methods: An initial bioinformatics analysis was used to predict the gene structure, chromosomal localization, physicochemical properties of the encoded protein, interacting genes and proteins, subcellular localization, and functional domains. Based on the predictions, the DNA-binding motif (leucine zipper pattern) was preliminarily verified by cell immunofluorescence. Results: The analysis indicated that the Nischarin gene has a complex structure, has many interacting genes or proteins, and may be expressed as multiple protein forms. The proteins have no signal peptide, and their predicted subcellular localization is complex. The Nischarin protein may contain a DNA-binding motif. Conclusion: Nischarin is a complex gene that may be expressed as multiple protein forms with different subcellular localizations. The Nischarin protein may interact with multiple proteins, and these gene and protein characteristics may be the molecular basis of the complex pharmacology of the imidazoline-1 receptor (I1R).

  6. Functional analysis of TPM domain containing Rv2345 of Mycobacterium tuberculosis identifies its phosphatase activity.

    Science.gov (United States)

    Sinha, Avni; Eniyan, Kandasamy; Sinha, Swati; Lynn, Andrew Michael; Bajpai, Urmi

    2015-07-01

    Mycobacterium tuberculosis (Mtb) is the causal agent of tuberculosis, the second largest infectious disease. With the rise of multi-drug resistant strains of M. tuberculosis, serious challenge lies ahead of us in treating the disease. The availability of complete genome sequence of Mtb has improved the scope for identifying new proteins that would not only further our understanding of biology of the organism but could also serve to discover new drug targets. In this study, Rv2345, a hypothetical membrane protein of M. tuberculosis H37Rv, which is reported to be a putative ortholog of ZipA cell division protein has been assigned function through functional annotation using bioinformatics tools followed by experimental validation. Sequence analysis showed Rv2345 to have a TPM domain at its N-terminal region and predicted it to have phosphatase activity. The TPM domain containing region of Rv2345 was cloned and expressed using pET28a vector in Escherichia coli and purified by Nickel affinity chromatography. The purified TPM domain was tested in vitro and our results confirmed it to have phosphatase activity. The enzyme activity was first checked and optimized with pNPP as substrate, followed by using ATP, which was also found to be used as substrate by the purified protein. Hence sequence analysis followed by in vitro studies characterizes TPM domain of Rv2345 to contain phosphatase activity.

  7. Functional and bioinformatics analysis of two Campylobacter jejuni homologs of the thiol-disulfide oxidoreductase, DsbA.

    Directory of Open Access Journals (Sweden)

    Anna D Grabowska

    Full Text Available BACKGROUND: Bacterial Dsb enzymes are involved in the oxidative folding of many proteins, through the formation of disulfide bonds between their cysteine residues. The Dsb protein network has been well characterized in cells of the model microorganism Escherichia coli. To gain insight into the functioning of the Dsb system in epsilon-Proteobacteria, where it plays an important role in the colonization process, we studied two homologs of the main Escherichia coli Dsb oxidase (EcDsbA) that are present in the cells of the enteric pathogen Campylobacter jejuni, the most frequently reported bacterial cause of human enteritis in the world. METHODS AND RESULTS: Phylogenetic analysis suggests the horizontal transfer of the epsilon-Proteobacterial DsbAs from a common ancestor to gamma-Proteobacteria, which then gave rise to the DsbL lineage. Phenotype and enzymatic assays suggest that the two C. jejuni DsbAs play different roles in bacterial cells and have divergent substrate spectra. CjDsbA1 is essential for the motility and autoagglutination phenotypes, while CjDsbA2 has no impact on those processes. CjDsbA1 plays a critical role in the oxidative folding that ensures the activity of alkaline phosphatase CjPhoX, whereas CjDsbA2 is crucial for the activity of arylsulfotransferase CjAstA, encoded within the dsbA2-dsbB-astA operon. CONCLUSIONS: Our results show that CjDsbA1 is the primary thiol-oxidoreductase affecting life processes associated with bacterial spread and host colonization, as well as ensuring the oxidative folding of particular protein substrates. In contrast, CjDsbA2 activity does not affect the same processes and so far its oxidative folding activity has been demonstrated for one substrate, arylsulfotransferase CjAstA. The results suggest the cooperation between CjDsbA2 and CjDsbB. In the case of the CjDsbA1, this cooperation is not exclusive and there is probably another protein to be identified in C. jejuni cells that acts to re

  8. Application of bioinformatic analysis in 14-3-3zeta protein of Echinococcus granulosus

    Institute of Scientific and Technical Information of China (English)

    符瑞佳; 吕刚; 尹飞飞; 梁培

    2015-01-01

    Objective To predict and analyze the structure and function of the 14-3-3zeta protein from Echinococcus granulosus by bioinformatics technology, to provide a basis for further experimental research. Methods The structure and function of the Eg14-3-3zeta protein were predicted using tools for gene and protein sequence and structure analysis offered by two bioinformatics sites, the US National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) and the Expert Protein Analysis System of the Swiss Institute of Bioinformatics (ExPASy, http://expasy.org/), together with other bioinformatics analysis software. Results The full-length cDNA sequence encoding Eg14-3-3zeta included a complete open reading frame (ORF) of 771 bp encoding a putative protein of 256 amino acids. The molecular weight of Eg14-3-3zeta was predicted to be 29.4 kDa and its isoelectric point 5.04. The protein had no signal peptide or transmembrane domain. The secondary structure of Eg14-3-3zeta contained 8 alpha-helices and 12 beta-strands. There were nine potential antigenic epitopes in the amino acid sequence. Conclusion The basic characteristics of the E. granulosus 14-3-3zeta protein were preliminarily identified, laying a foundation for further study of its biological function.
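    The relationship between the reported ORF length (771 bp) and protein length (256 amino acids) can be checked with a short calculation. The sketch below assumes that the stated ORF length includes the stop codon, which the abstract does not say explicitly; the function name is hypothetical.

```python
def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of `orf_bp` nucleotides.

    Assumes one codon per three nucleotides; when `includes_stop` is True
    the final codon is a stop codon and encodes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Values reported in the abstract above.
eg_14_3_3_length = protein_length(771)
```

    Under that assumption the 771 bp ORF corresponds to 257 codons, one of which is the stop codon, which agrees with the 256 residues reported.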

  9. GALT Protein Database, a Bioinformatics Resource for the Management and Analysis of Structural Features of a Galactosemia-related Protein and Its Mutants

    Institute of Scientific and Technical Information of China (English)

    Antonio d'Acierno; Angelo Facchiano; Anna Marabotti

    2009-01-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type Ⅰ. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and each mutant was analyzed with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  10. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.

  11. Bioinformatics Analysis of Cdc42 Gene from Candida Glabrata

    Institute of Scientific and Technical Information of China (English)

    赵静; 黄怀球; 袁立燕; 钟毅; 张静; 张晓辉

    2013-01-01

    Objective: To analyze and predict the structure and properties of the cell division cycle 42 (Cdc42) gene from Candida glabrata and its encoded protein by bioinformatics. Methods: A full-length cDNA sequence encoding Cdc42 from Candida glabrata was identified using bioinformatics tools on the NCBI, ExPASy and CBS websites and the Vector NTI Suite 8.0 software. The characteristics of the protein were predicted using the bioinformatics software supplied by the ExPASy website. Results: The full length of Cdc42 is 576 bp, and its ORF encodes 191 amino acids. Among homologous sequences in GenBank, its amino acid sequence is 99% identical to that of yeast Cdc42, and the phylogenetic relationship between Candida glabrata and other fungi is close. The prediction shows that Cdc42 has a Cdc42 conserved domain; the molecular weight and theoretical pI of Cg.Cdc42 are 21 420.83 and 6.31, respectively; and the encoded protein contains 29.84% α-helix, 28.70% extended strand, 41.88% random coil, and one GTP/ATP-binding motif. The Cdc42-encoded protein is hydrophobic, with no transmembrane region and no signal peptide. Conclusion: The biochemical and structural characteristics of the Cg.Cdc42 gene and its encoded protein were successfully predicted, laying a foundation for its subsequent cloning and expression.

  12. Virtual Bioinformatics Distance Learning Suite

    Science.gov (United States)

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  13. Bioinformatics meets parasitology.

    Science.gov (United States)

    Cantacessi, C; Campbell, B E; Jex, A R; Young, N D; Hall, R S; Ranganathan, S; Gasser, R B

    2012-05-01

    The advent and integration of high-throughput '-omics' technologies (e.g. genomics, transcriptomics, proteomics, metabolomics, glycomics and lipidomics) are revolutionizing the way biology is done, allowing the systems biology of organisms to be explored. These technologies are now providing unique opportunities for global, molecular investigations of parasites. For example, studies of a transcriptome (all transcripts in an organism, tissue or cell) have become instrumental in providing insights into aspects of gene expression, regulation and function in a parasite, which is a major step to understanding its biology. The purpose of this article was to review recent applications of next-generation sequencing technologies and bioinformatic tools to large-scale investigations of the transcriptomes of parasitic nematodes of socio-economic significance (particularly key species of the order Strongylida) and to indicate the prospects and implications of these explorations for developing novel methods of parasite intervention.

  14. Emergent Computation Emphasizing Bioinformatics

    CERN Document Server

    Simon, Matthew

    2005-01-01

    Emergent Computation is concerned with recent applications of Mathematical Linguistics or Automata Theory. This subject has a primary focus upon "Bioinformatics" (the Genome and arising interest in the Proteome), but the closing chapter also examines applications in Biology, Medicine, Anthropology, etc. The book is composed of an organized examination of DNA, RNA, and the assembly of amino acids into proteins. Rather than examine these areas from a purely mathematical viewpoint (that excludes much of the biochemical reality), the author uses scientific papers written mostly by biochemists based upon their laboratory observations. Thus while DNA may exist in its double stranded form, triple stranded forms are not excluded. Similarly, while bases exist in Watson-Crick complements, mismatched bases and abasic pairs are not excluded, nor are Hoogsteen bonds. Just as there are four bases naturally found in DNA, the existence of additional bases is not ignored, nor amino acids in addition to the usual complement of...

  15. Engineering BioInformatics

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    With the completion of human genome sequencing, a new era of bioinformatics starts. On one hand, due to the advance of high-throughput DNA microarray technologies, functional genomics data such as gene expression information have increased exponentially and will continue to do so for the foreseeable future. Conventional means of storing, analysing and comparing related data are already overburdened. Moreover, the rich information in genes, their functions and their associated wide biological implications requires new technologies for analysing data that employ sophisticated statistical and machine learning algorithms, powerful computers and intensive interaction between different data sources such as sequence data, gene expression data, proteomics data and metabolic pathway information to discover complex genomic structures and functional patterns with other biological processes to gain a comprehensive understanding of cell physiology.

  16. Alteration of microRNA expression in cerebrospinal fluid of unconscious patients after traumatic brain injury and a bioinformatic analysis of related single nucleotide polymorphisms

    Institute of Scientific and Technical Information of China (English)

    Wen-Dong You; Qi-Lin Tang; Lei Wang; Jin Lei; Jun-Feng Feng; Qing Mao; Guo-Yi Gao

    2016-01-01

    Purpose: It is becoming increasingly clear that genetic factors play a role in traumatic brain injury (TBI), whether in modifying clinical outcome after TBI or determining susceptibility to it. MicroRNAs are small RNA molecules involved in various pathophysiological processes by repressing target genes at the posttranscriptional level, and TBI alters microRNA expression levels in the hippocampus and cortex. This study was designed to detect differentially expressed microRNAs in the cerebrospinal fluid (CSF) of TBI patients remaining unconscious two weeks after initial injury and to explore related single nucleotide polymorphisms (SNPs). Methods: We used a microarray platform to detect differential microRNA expression levels in CSF samples from patients with post-traumatic coma compared with samples from controls. A bioinformatic scan was performed covering microRNA gene promoter regions to identify potential functional SNPs. Results: A total of 26 coma patients and 21 controls were included in this study, with similar distribution of age and gender between the two groups. Microarray analysis showed that fourteen microRNAs were differentially expressed, ten at higher and four at lower expression levels in the CSF of traumatic coma patients compared with controls (p < 0.05). One SNP (rs11851174 allele: C/T) was identified in the motif area of the microRNA hsa-miR-431-3P gene promoter region. Conclusion: The altered microRNA expression levels in CSF after brain injury, together with the SNP identified within the microRNA gene promoter area, provide a new perspective on the mechanism of impaired consciousness after TBI. Further studies are needed to explore the association between the specific microRNAs and their related SNPs with post-traumatic unconsciousness.

  17. Virtual bioinformatics distance learning suite*.

    Science.gov (United States)

    Tolvanen, Martti; Vihinen, Mauno

    2004-05-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material over the Internet. Currently, we provide two fully computer-based courses, "Introduction to Bioinformatics" and "Bioinformatics in Functional Genomics." Here we will discuss the application of distance learning in bioinformatics training and our experiences gained during the 3 years that we have run the courses, with about 400 students from a number of universities. The courses are available at bioinf.uta.fi.

  18. Next-Generation Sequencing of Elite Berry Germplasm and Data Analysis Using a Bioinformatics Pipeline for Virus Detection and Discovery

    Science.gov (United States)

    Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus and Vaccinium) are known hosts for more than 70 viruses and new ones are identified frequently. In modern berry cultivars, viruses tend to be asymptomatic in single infections and symptoms only develop after plants accumulate multip...

  19. Next Generation Sequencing of Elite Berry Germplasm and Data Analysis Using a Bioinformatics Pipeline for Virus Detection and Discovery

    Science.gov (United States)

    Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus and Vaccinium) are known hosts for more than 70 viruses and new ones are identified continually. In modern berry cultivars, viruses tend to be asymptomatic in single infections and symptoms only develop after plants accumulate m...

  20. MISIS-2: A bioinformatics tool for in-depth analysis of small RNAs and representation of consensus master genome in viral quasispecies.

    Science.gov (United States)

    Seguin, Jonathan; Otten, Patricia; Baerlocher, Loïc; Farinelli, Laurent; Pooggin, Mikhail M

    2016-07-01

    In most eukaryotes, small RNA (sRNA) molecules such as miRNAs, siRNAs and piRNAs regulate gene expression and repress transposons and viruses. AGO/PIWI family proteins sort functional sRNAs based on size, 5'-nucleotide and other sequence features. In plants and some animals, viral sRNAs are extremely diverse and cover the entire viral genome sequence, which allows for de novo reconstruction of a complete viral genome by deep sequencing and bioinformatics analysis of viral sRNAs. Previously, we developed a tool, MISIS, to view and analyze sRNA maps of viruses and cellular genome regions which spawn multiple sRNAs. Here we describe a new release of MISIS, MISIS-2, which makes it possible to determine and visualize a consensus sequence and to count sRNAs of any chosen sizes and 5'-terminal nucleotide identities. Furthermore, we demonstrate the utility of MISIS-2 for identification of single nucleotide polymorphisms (SNPs) at each position of a reference sequence and reconstruction of a consensus master genome in evolving viral quasispecies. MISIS-2 is a Java standalone program. It is freely available along with the source code at the website http://www.fasteris.com/apps.
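
    The consensus and SNP step described in this abstract can be illustrated with a minimal Python sketch. This is not MISIS-2's own code (MISIS-2 is a Java program); the function name `consensus_and_snps`, the `(start, sequence)` read format, the majority-vote rule and the toy reads are all assumptions made for illustration.

```python
from collections import Counter

def consensus_and_snps(reference, reads):
    """Majority-vote consensus from ungapped reads aligned to a reference.

    reads is a list of (0-based start, sequence) pairs (hypothetical format).
    Returns the consensus string and a list of (position, ref_base, consensus_base).
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    consensus, snps = [], []
    for pos, ref_base in enumerate(reference):
        if not counts[pos]:
            consensus.append(ref_base)  # no coverage: keep the reference base
            continue
        top_base, _ = counts[pos].most_common(1)[0]
        consensus.append(top_base)
        if top_base != ref_base:
            snps.append((pos, ref_base, top_base))
    return "".join(consensus), snps

# Toy data, invented for this example.
reference = "ACGTACGT"
reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC"), (4, "ACGT")]
consensus, snps = consensus_and_snps(reference, reads)
print(consensus, snps)
```

    A real tool would also have to handle strand, read size classes and ties between bases; the sketch only shows the per-position counting idea.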

  1. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies

    Science.gov (United States)

    Delmont, Tom O.

    2016-01-01

    High-throughput sequencing provides a fast and cost-effective mean to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants have differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes. PMID:27069789

  2. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies

    Directory of Open Access Journals (Sweden)

    Tom O. Delmont

    2016-03-01

    Full Text Available High-throughput sequencing provides a fast and cost-effective mean to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants have differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.

  3. Identifying contamination with advanced visualization and analysis practices: metagenomic approaches for eukaryotic genome assemblies.

    Science.gov (United States)

    Delmont, Tom O; Eren, A Murat

    2016-01-01

    High-throughput sequencing provides a fast and cost-effective mean to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants have differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today's microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.

  4. PhyloToAST: Bioinformatics tools for species-level analysis and visualization of complex microbial datasets.

    Science.gov (United States)

    Dabdoub, Shareef M; Fellows, Megan L; Paropkari, Akshay D; Mason, Matthew R; Huja, Sarandeep S; Tsigarida, Alexandra A; Kumar, Purnima S

    2016-06-30

    The 16S rRNA gene is widely used for taxonomic profiling of microbial ecosystems; and recent advances in sequencing chemistry have allowed extremely large numbers of sequences to be generated from minimal amounts of biological samples. Analysis speed and resolution of data to species-level taxa are two important factors in large-scale explorations of complex microbiomes using 16S sequencing. We present here new software, Phylogenetic Tools for Analysis of Species-level Taxa (PhyloToAST), that completely integrates with the QIIME pipeline to improve analysis speed, reduce primer bias (requiring two sequencing primers), enhance species-level analysis, and add new visualization tools. The code is free and open source, and can be accessed at http://phylotoast.org.

  5. Bioinformatics analyses for signal transduction networks

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Research in signaling networks contributes to a deeper understanding of the living activities of organisms. With the development of experimental methods in the signal transduction field, more and more mechanisms of signaling pathways have been discovered. This paper introduces popular bioinformatics analysis methods for signaling networks, such as the common mechanisms of signaling pathways and database resources on the Internet; summarizes methods for analyzing the structural properties of networks, including structural motif finding and automated pathway generation; and discusses the modeling and simulation of signaling networks in detail, as well as the research status and trends in this area. The investigation of signal transduction is now moving from small-scale experiments to large-scale network analysis, and dynamic simulation of networks is getting closer to the real system. As the investigation goes deeper, the bioinformatics analysis of signal transduction has considerable room for development and application.

  6. The utility of optical detection system (qPCR) and bioinformatics methods in reference gene expression analysis

    Science.gov (United States)

    Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Pląder, Wojciech; Przybecki, Zbigniew

    2016-09-01

    Real-time quantitative polymerase chain reaction (qPCR) is considered the most reliable method for gene expression studies. However, the expression of a target gene can be misinterpreted because of improper normalization. Therefore, the crucial step in analysing qPCR data is the selection of suitable reference genes, which should be validated experimentally. In order to choose a gene with stable expression in the designed experiment, we performed a reference gene expression analysis. In this study, genes described in the literature and novel genes predicted as control genes on the basis of in silico analysis of transcriptome data were used. Analysis with the geNorm and NormFinder algorithms made it possible to rank the candidate genes and to indicate the best reference for the flower morphogenesis study. According to the results, the genes CACS and CYCL showed the most stable expression, whereas the least suitable genes were TUA and EF.
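
    To make the ranking idea concrete, here is a minimal Python sketch of a geNorm-style stability measure M: for each gene, the average over all other candidate genes of the standard deviation of the log2 expression ratio across samples, with a lower M meaning more stable expression. The function name `genorm_m` and the numbers are invented for illustration; only the gene names are taken from the abstract, and the values are not data from the study.

```python
import math
import statistics

def genorm_m(expression):
    """Return a geNorm-style stability value M per gene (lower = more stable).

    expression maps gene -> list of relative quantities on a linear scale,
    one value per sample, in the same sample order for every gene.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[j], expression[k])]
            # Sample standard deviation of the log ratios (an assumption here).
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.mean(pairwise_sd)
    return m_values

# Invented toy quantities for four samples.
toy = {
    "CACS": [1.0, 1.1, 0.9, 1.0],
    "CYCL": [2.0, 2.2, 1.8, 2.0],
    "TUA":  [1.0, 3.0, 0.5, 2.0],
    "EF":   [4.0, 1.0, 6.0, 2.0],
}
m = genorm_m(toy)
ranking = sorted(m, key=m.get)  # most stable first
print(ranking)
```

    In the toy data the first two genes vary in proportion to each other, so their pairwise variation is zero and they rank as the most stable pair; NormFinder uses a different, model-based approach that is not sketched here.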

  7. Agile parallel bioinformatics workflow management using Pwrake

    Directory of Open Access Journals (Sweden)

    Tanaka Masahiro

    2011-09-01

    Full Text Available Abstract Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows.

  8. Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages

    Science.gov (United States)

    Djordjevic, Marko

    2005-11-01

    Due to rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding of the regulatory circuits that control expression of genes. We propose a novel, biophysics based algorithm, for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information theory based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance---and determination---of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects rice pathogen Xanthomonas oryzae. Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely
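
    The saturation effect mentioned in this abstract can be sketched in a few lines of Python. The occupancy formula p = 1 / (1 + exp((E - mu) / kT)) is the standard two-state binding expression; the additive per-position energy model, the function names and the toy energy matrix below are assumptions made for illustration, not the thesis's fitted parameters.

```python
import math

def binding_probability(energy, mu, kT=1.0):
    """Occupancy of a site with binding energy `energy` at chemical potential `mu`.

    Saturates at 1 for strong sites, unlike a pure Boltzmann weight.
    """
    return 1.0 / (1.0 + math.exp((energy - mu) / kT))

def site_energy(site, energy_matrix):
    """Additive energy model (assumption): sum of per-position base contributions."""
    return sum(energy_matrix[i][base] for i, base in enumerate(site))

# Hypothetical 3-bp energy matrix in units of kT; the best site is "ACG".
energy_matrix = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
]
mu = 3.0  # hypothetical chemical potential, acts as the binding threshold

results = []
for site in ["ACG", "ACT", "TTT"]:
    e = site_energy(site, energy_matrix)
    results.append((site, e, binding_probability(e, mu)))
    print(site, e, round(results[-1][2], 3))
```

    Sites with E well below mu are bound with probability close to 1, sites with E well above mu are essentially unbound, and E = mu gives exactly one half, which is why the chemical potential plays the role of an explicit threshold.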

  9. A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling. [v1; ref status: indexed, http://f1000r.es/5ai]

    Directory of Open Access Journals (Sweden)

    Reinier Gesto-Borroto

    2015-05-01

    Full Text Available Globins (Glbs) are proteins widely distributed in organisms. Three evolutionary families have been identified in Glbs: the M, S and T Glb families. The M Glbs include flavohemoglobins (fHbs) and single-domain Glbs (SDgbs); the S Glbs include globin-coupled sensors (GCSs), protoglobins and sensor single domain globins; and the T Glbs include truncated Glbs (tHbs). Structurally, the M and S Glbs exhibit 3/3-folding whereas the T Glbs exhibit 2/2-folding. Glbs are widespread in bacteria, including several rhizobial genomes. However, only a few rhizobial Glbs have been characterized. Hence, we characterized Glbs from 62 rhizobial genomes using bioinformatics methods such as data mining in databases, sequence alignment, phenogram construction and protein modeling. Also, we analyzed soluble extracts from Bradyrhizobium japonicum USDA38 and USDA58 by (reduced + carbon monoxide (CO)) minus reduced differential spectroscopy. Database searching showed that only fhb, sdgb, gcs and thb genes exist in the rhizobia analyzed in this work. Promoter analysis revealed that apparently several rhizobial glb genes are not regulated by a -10 promoter but might be regulated by -35 and Fnr (fumarate-nitrate reduction regulator)-like promoters. Mapping analysis revealed that rhizobial fhbs and thbs are flanked by a variety of genes whereas several rhizobial sdgbs and gcss are flanked by genes coding for proteins involved in the metabolism of nitrates and nitrites and chemotaxis, respectively. Phenetic analysis showed that rhizobial Glbs segregate into the M, S and T Glb families, while structural analysis showed that predicted rhizobial SDgbs and fHbs and GCSs globin domain and tHbs fold into the 3/3- and 2/2-folding, respectively. Spectra from B. japonicum USDA38 and USDA58 soluble extracts exhibited peaks and troughs characteristic of bacterial and vertebrate Glbs thus indicating that putative Glbs are synthesized in B. japonicum USDA38 and USDA58.

  10. Survey of MapReduce frame operation in bioinformatics.

    Science.gov (United States)

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics.
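
    For readers unfamiliar with the programming model, the following Python sketch imitates the three MapReduce stages (map, shuffle, reduce) in a single process, using k-mer counting over sequencing reads as the example task. It does not use the Hadoop API; the function names and the toy reads are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_phase(read, k):
    """Map: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one k-mer."""
    return key, sum(values)

def count_kmers(reads, k):
    pairs = (pair for read in reads for pair in map_phase(read, k))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = count_kmers(["ACGTAC", "GTACGT"], k=3)
print(counts)
```

    On a cluster the map calls would run on different nodes over different chunks of the read file and the framework would perform the shuffle; the per-key independence of the reduce step is what makes the computation scale.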

  11. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    Science.gov (United States)

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  12. Comparative void-volume analysis of psychrophilic and mesophilic enzymes: Structural bioinformatics of psychrophilic enzymes reveals sources of core flexibility

    Directory of Open Access Journals (Sweden)

    Bystroff Christopher

    2011-10-01

    Full Text Available Abstract Background Psychrophiles, cold-adapted organisms, have adapted to live at low temperatures by using a variety of mechanisms. Their enzymes are active at cold temperatures because they are structurally more flexible than mesophilic enzymes. Although there are some indications of the possible structural mechanisms by which psychrophilic enzymes are catalytically active at cold temperatures, no generalized structural property common to all psychrophilic enzymes has been identified. Results We examine twenty homologous enzyme pairs from psychrophiles and mesophiles to investigate flexibility as a key characteristic for cold adaptation. B-factors in protein X-ray structures are one way to measure flexibility. Comparing psychrophilic to mesophilic protein B-factors reveals that psychrophilic enzymes are more flexible in 5-turn and strand secondary structures. Enzyme cavities, identified using CASTp at various probe sizes, indicate that psychrophilic enzymes have larger average cavity sizes at probe radii of 1.4-1.5 Å, sufficient for water molecules. Furthermore, amino acid side chains lining these cavities show an increased frequency of acidic groups in psychrophilic enzymes. Conclusions These findings suggest that embedded water molecules may play a significant role in cavity flexibility, and therefore, overall protein flexibility. Thus, our results point to the important role enzyme flexibility plays in adaptation to cold environments.

  13. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    Science.gov (United States)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  14. Statistical approach, Sensory analysis, brief application of Bioinformatics Tool, Melanin, Allicin and Glucosinolate presence in Mango pulp for Pharmacological Benefits

    Directory of Open Access Journals (Sweden)

    Saranya Chitturi

    2013-06-01

    Full Text Available Information on important flavor components of fruit and vegetables is lacking and would be useful for breeders and molecular biologists. In this study, five acid treatments were formulated and the effects of citric acid (CA) and malic acid (MA) levels on the flavor perception of canned mango pulp (Mangifera indica L.) were evaluated. Pulp components were depicted in RasMol V 2.7.1, visualizing pectin, melanin and allinase compounds as part of a brief bioinformatic analysis of the pulp. Melanin content and the presence of allicin and glucosinolate were assessed, and the variation in their percentage concentrations across treatments was depicted. When the values of TSS and pH were correlated by different statistical methods (Pearson's correlation coefficient, Spearman's correlation and regression plots) using statistical software, the two variables were found to be positively correlated. The alternative hypothesis H1 was accepted, with a p value < 0.05, for the sensory quality estimation based on Larmond's 9-point hedonic scale sensory evaluation. The lowest level of allicin, about 0.14%, was found in T2, whereas the highest, about 4.28%, was noted in T3. The T5 treatment showed the lowest concentration of melanin, about 3.98%, and the highest, about 9.43%, was in T4. The glucosinolate concentrations also varied according to the treatment administered: a low level of about 3.34% was observed in T3 and a concentration of about 7.9% in T4. These findings may help in extending the shelf life and increasing the marketability of mango-based products.

  15. Bioinformatics analysis of differentially expressed oncogenes in liver cancer stem cells

    Institute of Scientific and Technical Information of China (English)

    李淑娜; 宋娜娜

    2016-01-01

    BACKGROUND: Studies have suggested that oncogene expression in liver cancer stem cells is related to the occurrence and development of liver cancer, but research on the bioinformatics and mechanisms is still lacking. OBJECTIVE: To identify differentially expressed oncogenic genes in liver cancer stem cells and to analyze these genes by bioinformatics. METHODS: The human hepatoma cell line HepG2 was used as the cell model of liver cancer stem cells to prepare total RNA, which was fluorescently labeled. mRNA expression profiles were obtained by gene chip hybridization and screened to obtain differentially expressed mRNAs, and GO and Pathway annotations were analyzed using bioinformatics. RESULTS AND CONCLUSION: The detection rate of the chip hybridization experiment was 73.21%, indicating that the hybridization was successful. A total of 38,342 mRNAs were detected, and further analysis identified 1,236 differentially expressed genes (P < 0.05, fold change ≥ 2), of which 599 were up-regulated and 637 were down-regulated (P < 0.05). The biological functions of the differentially expressed genes mainly involved histone H4 acetylation, cell mitosis and proliferation, synthesis and degradation of cell-related proteins, chromosome segregation, cell differentiation and apoptosis, signal transduction, nutrient transport, and transcription. Pathway annotation mainly involved cytokine-mediated inflammatory signaling pathways...
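
    The screening rule quoted in this record (P < 0.05 and fold change ≥ 2) can be written as a short Python filter. This is only a sketch: the function name, the tuple layout, the gene names and the numbers are hypothetical, and treating a ratio of 0.5 or less as down-regulated is an assumption about how the two-fold cutoff was applied in both directions.

```python
def classify_genes(records, p_cutoff=0.05, fold_cutoff=2.0):
    """Split genes into up- and down-regulated lists.

    records: iterable of (gene, fold_change, p_value), where fold_change is
    the expression ratio of the test sample over the control (linear scale).
    """
    up, down = [], []
    for gene, fold_change, p_value in records:
        if p_value >= p_cutoff:
            continue  # not statistically significant
        if fold_change >= fold_cutoff:
            up.append(gene)
        elif fold_change <= 1.0 / fold_cutoff:
            down.append(gene)
    return up, down

# Hypothetical records, invented for this example.
records = [
    ("geneA", 3.2, 0.001),   # up
    ("geneB", 0.4, 0.010),   # down
    ("geneC", 2.5, 0.200),   # large change but not significant
    ("geneD", 1.3, 0.001),   # significant but change too small
]
up, down = classify_genes(records)
print(up, down)
```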

  16. BIOELECTRICAL IMPEDANCE VECTOR ANALYSIS IDENTIFIES SARCOPENIA IN NURSING HOME RESIDENTS

    Science.gov (United States)

    Loss of muscle mass and water shifts between body compartments are contributing factors to frailty in the elderly. The body composition changes are especially pronounced in institutionalized elderly. We investigated the ability of single-frequency bioelectrical impedance analysis (BIA) to identify b...

  17. Identifying subgroups of patients using latent class analysis

    DEFF Research Database (Denmark)

    Nielsen, Anne Mølgaard; Kent, Peter; Hestbæk, Lise

    2017-01-01

    BACKGROUND: Heterogeneity in patients with low back pain (LBP) is well recognised and different approaches to subgrouping have been proposed. Latent Class Analysis (LCA) is a statistical technique that is increasingly being used to identify subgroups based on patient characteristics. However, as ...

  18. Molecular cloning of soluble trehalase from Chironomus riparius larvae, its heterologous expression in Escherichia coli and bioinformatic analysis.

    Science.gov (United States)

    Forcella, Matilde; Mozzi, Alessandra; Bigi, Alessandra; Parenti, Paolo; Fusi, Paola

    2012-10-01

    Trehalase is involved in the control of the concentration of trehalose, the main blood sugar in insects. Here, we describe the molecular cloning of the cDNA encoding the soluble form of trehalase from larvae of the midge Chironomus riparius, a well-known bioindicator of the quality of freshwater environments. Molecular cloning was achieved through multiple alignment of Diptera trehalase sequences, allowing the synthesis of internal homology-based primers; the complete open reading frame (ORF) was subsequently obtained through RACE-PCR (where RACE is rapid amplification of cDNA ends). The cDNA contained the 5' untranslated region (UTR), the 3' UTR including a poly(A) tail, and an ORF of 1,725 bp encoding 574 amino acid residues with a predicted molecular mass of 65,778 Da. Recombinant trehalase was successfully expressed in Escherichia coli as a His-tagged protein and purified by Ni-NTA affinity chromatography. Primary structure analysis showed a series of characteristic features shared by all insect trehalases, while three-dimensional structure prediction yielded the typical glucosidase fold, the two key residues involved in the catalytic mechanism being conserved. Production of recombinant insect trehalases opens the way to structural characterization of the catalytic site, which might represent, among others, an element for reconsidering the enzyme as a target in the control of pest insects.
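
    The sequence figures in this abstract can be checked with simple arithmetic, sketched here in Python. The assumption, which is consistent with the quoted numbers but not stated in the abstract, is that the 1,725 bp ORF length includes the stop codon; the function name is hypothetical.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

residues = residues_from_orf(1725)          # 575 codons, one of them the stop
mean_residue_mass = 65778 / residues        # predicted mass from the abstract, in Da
print(residues, round(mean_residue_mass, 1))
```

    The resulting average of roughly 115 Da per residue is in the usual range for proteins, so the three reported values are mutually consistent under this assumption.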

  19. Identification of a novel carbohydrate esterase from Bjerkandera adusta: structural and function predictions through bioinformatics analysis and molecular modeling.

    Science.gov (United States)

    Cuervo-Soto, Laura I; Valdés-García, Gilberto; Batista-García, Ramón; del Rayo Sánchez-Carbente, María; Balcázar-López, Edgar; Lira-Ruan, Verónica; Pastor, Nina; Folch-Mallol, Jorge Luis

    2015-03-01

    A new gene from Bjerkandera adusta strain UAMH 8258 encoding a carbohydrate esterase (designated BaCesI) was isolated and expressed in Pichia pastoris. The gene had an open reading frame of 1410 bp encoding a polypeptide of 470 amino acid residues, the first 18 serving as a secretion signal peptide. Homology and phylogenetic analyses showed that BaCesI belongs to carbohydrate esterase family 4. Three-dimensional modeling of the protein and normal mode analysis revealed a breathing mode of the active site that could be relevant for esterase activity. Furthermore, the overall negative electrostatic potential of this enzyme suggests that it degrades neutral substrates and will not act on negative substrates such as peptidoglycan or p-nitrophenol derivatives. The enzyme shows a specific activity of 1.118 U mg(-1) protein on 2-naphthyl acetate. No activity was detected on p-nitrophenol derivatives, as predicted from the electrostatic potential data. The deacetylation activity of the recombinant BaCesI was confirmed by measuring the release of acetic acid from several substrates, including oat xylan, shrimp shell chitin, N-acetylglucosamine, and natural substrates such as sugar cane bagasse and grass. This makes the protein of interest for the production of biofuels from lignocellulosic materials and for the production of chitosan from chitin.

  20. Two isocitrate dehydrogenases from a plant pathogen Xanthomonas campestris pv. campestris 8004. Bioinformatic analysis, enzymatic characterization, and implication in virulence.

    Science.gov (United States)

    Lv, Changqi; Wang, Peng; Wang, Wencai; Su, Ruirui; Ge, Yadong; Zhu, Youming; Zhu, Guoping

    2016-09-01

    Isocitrate dehydrogenase (IDH) is a key enzyme in the tricarboxylate (TCA) cycle, which may play an important role in the virulence of pathogenic bacteria. Here, two structurally different IDHs from the plant pathogen Xanthomonas campestris pv. campestris 8004 (XccIDH1 and XccIDH2) were characterized in detail. The recombinant XccIDH1 forms a homodimer in solution, while the recombinant XccIDH2 is a typical monomer. Phylogenetic analysis showed that XccIDH1 belongs to the type I IDH subfamily and XccIDH2 groups into the monomeric IDH clade. Kinetic characterization demonstrated that XccIDH1's specificity towards NAD(+) was 110-fold greater than towards NADP(+), while XccIDH2's specificity towards NADP(+) was 353-fold greater than towards NAD(+). The putative coenzyme-discriminating amino acids (Asp268, Ile269 and Ala275 for XccIDH1, and Lys589, His590 and Arg601 for XccIDH2) were studied by site-directed mutagenesis. The coenzyme specificities of the two mutants, mXccIDH1 and mXccIDH2, were completely reversed from NAD(+) to NADP(+), and from NADP(+) to NAD(+), respectively. Furthermore, Ser80 of XccIDH1, and Lys256 and Tyr421 of XccIDH2, were the determinants of substrate binding. The detailed biochemical properties of XccIDH1 and XccIDH2, such as optimal pH and temperature, thermostability, and metal ion effects, were further investigated. The possibility of considering the two IDHs as targets for drug development to control the plant diseases caused by Xcc 8004 is described and discussed thoroughly.

  1. Residue analysis of a CTL epitope of SARS-CoV spike protein by IFN-gamma production and bioinformatics prediction

    Directory of Open Access Journals (Sweden)

    Huang Jun

    2012-09-01

    Full Text Available Abstract Background Severe acute respiratory syndrome (SARS) is an emerging infectious disease caused by the novel coronavirus SARS-CoV. The T cell epitopes of the SARS-CoV spike protein are well known, but no systematic evaluation of the functional and structural roles of each residue has been reported for these antigenic epitopes. Analysis of the functional importance of side-chains by mutational study may exaggerate the effect by imposing a structural disturbance or an unusual steric, electrostatic or hydrophobic interaction. Results We demonstrated that N50 could induce a significant IFN-gamma response from the splenocytes of SARS-CoV S DNA-immunized mice, as measured by ELISA, ELISPOT and FACS. Moreover, S366-374 was predicted to be an optimal epitope by the bioinformatics tools ANN, SMM, ARB and BIMAS, and this was confirmed by the IFN-gamma response induced by a series of S358-374-derived peptides. Furthermore, each residue of S366-374 was replaced by alanine (A), lysine (K) or aspartic acid (D). ANN was used to estimate the binding affinity of single S366-374 mutants to H-2 Kd. Y367 and L374 were predicted to play the most important role in peptide binding. Additionally, these single-residue mutant peptides were synthesized, and IFN-gamma production induced by the G368, V369, A371, T372 and K373 mutants of S366-374 was markedly decreased. Conclusions We demonstrated that S366-374 is an optimal H-2 Kd CTL epitope in the SARS-CoV S protein. Moreover, Y367, S370, and L374 are anchors in the epitope, while C366, G368, V369, A371, T372, and K373 may directly interact with TCR on the surface of CD8-T cells.
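
    The substitution scan described in this abstract can be enumerated with a few lines of Python. The nine-residue sequence below is read off from the residue labels given in the abstract (C366, Y367, G368, V369, S370, A371, T372, K373, L374); the function name, the mutation naming scheme and the choice to skip substitutions that leave the residue unchanged are assumptions made for illustration, and no binding prediction is attempted.

```python
def single_substitutions(peptide, first_position, replacements=("A", "K", "D")):
    """List every single-residue mutant of `peptide` for the given replacements.

    Returns (name, sequence) pairs such as ("C366A", "AYGVSATKL").
    Substitutions that would not change the residue are skipped.
    """
    mutants = []
    for offset, wild_type in enumerate(peptide):
        for new in replacements:
            if new == wild_type:
                continue
            name = f"{wild_type}{first_position + offset}{new}"
            mutants.append((name, peptide[:offset] + new + peptide[offset + 1:]))
    return mutants

mutants = single_substitutions("CYGVSATKL", 366)
print(len(mutants), mutants[:3])
```

    Each resulting sequence could then be passed to an epitope predictor of the reader's choice to compare predicted binding with the wild-type peptide.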

  2. Bioinformatic Analysis of FoxA1 Protein

    Institute of Scientific and Technical Information of China (English)

    赵小峰; 葛保健

    2012-01-01

    Transcription factor FoxA1 is a 'pioneer' factor that binds to chromatinized DNA, exposes DNA binding sites, and regulates cell signaling and the cell cycle. High expression of FoxA1 has been reported in various tumors, where it takes part in regulating tumor growth, and it may be a potential therapeutic target in cancer. This study aimed to obtain more information about FoxA1. Starting from the FoxA1 gene and protein sequences, its structure and properties, protein interaction network, and multiple sequence alignment were analyzed with software tools and databases. The biological information obtained by this preliminary bioinformatic analysis lays a foundation for further research on FoxA1.

  3. Bioinformatics analysis of genes related to the progress of cervical intraepithelial neoplasia

    Institute of Scientific and Technical Information of China (English)

    蒋燕明; 李力

    2016-01-01

    Objective: To investigate the signaling pathways and potential genes associated with cervical intraepithelial neoplasia (CIN) progression through microarray expression profiling data analysis and bioinformatics approaches. Methods: mRNA expression microarray data related to CIN progression were screened from the GEO database for the first time and re-analyzed by bioinformatics methods. Results: Two mRNA expression microarray datasets (GSE63514 and GSE51993) were obtained from the GEO database. Pathway enrichment analysis of the common differentially expressed genes identified 3 signaling pathways associated with CIN progression: Wnt, Endocytosis, and Vibrio cholerae infection. Fourteen differentially expressed genes regulating these pathways were also identified. Biological annotation and text mining showed that 3 genes were directly related to CIN progression, and 9 other genes were associated with tumor progression and recurrence. GeneMania tool analysis demonstrated the protein interaction network formed between all the differentially expressed genes and the 24 reported genes. CCND2 and TGFBR2 formed direct interactions with many reported genes. Conclusion: Three signaling pathways and 14 differentially expressed genes were associated with CIN progression, as indicated by the microarray data analysis results.

  4. Bioinformatic Analysis of the Nitrate Reductase Gene in Antarctic Ice Algae Chlamydomonas sp. ICE-L

    Institute of Scientific and Technical Information of China (English)

    林敏卓; 刘晨临; 黄晓航; 杨平平

    2012-01-01

    Nitrate reductase (NR) regulates nitrogen metabolism in plants and also plays an important role in their adaptation to abiotic stress. A full-length NR gene of the Antarctic ice alga Chlamydomonas sp. ICE-L was identified from a cDNA library and sequenced. The encoded protein sequence was investigated by bioinformatic analysis. Through multiple sequence alignment, active sites of the ICE-L NR protein that may be related to stress acclimation were identified. In addition, the tertiary structure of the ICE-L NR protein was predicted. The Chlamydomonas sp. ICE-L NR gene contains an open reading frame of 2,589 bp encoding a nitrate reductase of 863 amino acids. Phylogenetic analysis based on amino acid sequences showed that the ICE-L NR clusters with those of other green algae, with identities of 63%, 61%, 60% and 54% to Volvox carteri, Chlamydomonas reinhardtii, Dunaliella tertiolecta and Chlorella vulgaris, respectively. Functional prediction revealed that the NR sequence has 3 different functional domains, similar to those of higher plants. This bioinformatic analysis of the ICE-L NR gene will help to further understand, from the perspective of the NR gene, how the Antarctic ice alga Chlamydomonas acclimates to its extreme environment.

  5. Bioinformatic analysis of the nucleolus

    DEFF Research Database (Denmark)

    Leung, Anthony K L; Andersen, Jens S; Mann, Matthias;

    2003-01-01

    The nucleolus is a plurifunctional, nuclear organelle, which is responsible for ribosome biogenesis and many other functions in eukaryotes, including RNA processing, viral replication and tumour suppression. Our knowledge of the human nucleolar proteome has been expanded dramatically by the two r...

  6. Identifying influential factors of business process performance using dependency analysis

    Science.gov (United States)

    Wetzstein, Branimir; Leitner, Philipp; Rosenberg, Florian; Dustdar, Schahram; Leymann, Frank

    2011-02-01

    We present a comprehensive framework for identifying influential factors of business process performance. In particular, our approach combines monitoring of process events and Quality of Service (QoS) measurements with dependency analysis to effectively identify influential factors. The framework uses data mining techniques to construct tree structures to represent dependencies of a key performance indicator (KPI) on process and QoS metrics. These dependency trees allow business analysts to determine how process KPIs depend on lower-level process metrics and QoS characteristics of the IT infrastructure. The structure of the dependencies enables a drill-down analysis of single factors of influence to gain a deeper knowledge why certain KPI targets are not met.

  7. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    Full Text Available BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.
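
    The Kaplan-Meier step named in the record above can be illustrated with a minimal sketch. It is not the authors' analysis: the follow-up times and event flags are invented for illustration, and `kaplan_meier` is a hypothetical helper name. In practice one curve would be computed per phenotypic class and the curves compared.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time of each person (any consistent unit)
    events:    True if death was observed at that time, False if censored
    Returns a list of (time, survival_probability) at each observed event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented example: months of follow-up for five people in one class.
months = [6, 12, 12, 20, 30]
died = [True, True, False, True, False]
curve = kaplan_meier(months, died)
for time, prob in curve:
    print(f"t={time:>2} months  S(t)={prob:.2f}")
```

    Censored records leave the risk set without lowering the curve, which is why the person censored at 12 months reduces the denominator for the event at 20 months.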

  8. Identification through bioinformatics of cDNAs encoding human thymic shared Ag-1/stem cell Ag-2. A new member of the human Ly-6 family.

    Science.gov (United States)

    Capone, M C; Gorman, D M; Ching, E P; Zlotnik, A

    1996-08-01

    The Ly-6 family of cell surface molecules includes many members that have been characterized in the mouse. Until recently, very few Ly-6 family members had been described in the human. A significant development with important implications for novel gene discovery has been the growth of the public Expressed Sequence Tag (EST) database. Here we report that, through the application of bioinformatics analysis to the dbEST database, we obtained the sequence of human TSA-1/SCA-2, a new member of the human Ly-6 family. In addition, we identified full-length clones encoding this molecule as well as expression data in various tissues. Sequencing of the clones identified this way confirmed the sequence predicted through bioinformatics. This study constitutes an example of the application of bioinformatics to the analysis of the recently expanded databases for the identification of genes of potential importance in the immune system.

  9. Application of bioinformatics in chronobiology research.

    Science.gov (United States)

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through "omics" projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  10. Application of Bioinformatics in Chronobiology Research

    Directory of Open Access Journals (Sweden)

    Robson da Silva Lopes

    2013-01-01

    Full Text Available Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  11. Bioinformatics tools for analysing viral genomic data.

    Science.gov (United States)

    Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J

    2016-04-01

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.

  12. Bringing Web 2.0 to bioinformatics.

    Science.gov (United States)

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  13. Translational bioinformatics in psychoneuroimmunology: methods and applications.

    Science.gov (United States)

    Yan, Qing

    2012-01-01

    Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes including the behavioral-based profiles will contribute to the transition from the disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors are needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talks among pathways in various brain regions involved in disorders such as Alzheimer's disease.

  14. Microarray combined with multiple bioinformatics for identifying hub hematopoietic genes during recovery phase of irradiation injury

    Institute of Scientific and Technical Information of China (English)

    杨悦; 张晶; 王寅; 张金元; 王泽剑; 沈翰林; 殷明; 沈旭东

    2012-01-01

    Objective: To investigate the changes of global gene expression in bone marrow during the recovery period following sublethal ionizing radiation (IR) in mice. Methods: Mice were exposed to 4 Gy of 60Co γ irradiation, and RNA samples were extracted from bone marrow cells at days 0, 3, 7, 11 and 21 after irradiation and subjected to microarray analysis to identify differentially expressed genes. Multiple bioinformatics analyses, including clustering analysis, gene ontology (GO) analysis, and dynamic gene network analysis, were conducted to identify key hub genes, pathways and biological processes during the bone marrow recovery phase. The proteins of the identified hub genes were also analyzed. Results: Compared with the non-irradiated group, 1,302 differential genes were identified by global gene expression profiling of the irradiation-damaged bone marrow. Clustering and GO analyses revealed that genes associated with the immune response (especially hematopoiesis) played a critical role in the recovery of body function after IR injury. Twenty-five of the differential genes were defined as hub genes participating in two pathways: immune response and transcription/nucleosome assembly. The key node CCL3 improved the proliferation of hematopoietic stem cells (HSCs) through spontaneous down-regulation and increased degradation by CtsG. Conclusion: The 25 genes identified by microarray and bioinformatics analyses may play critical roles in the recovery phase after IR. The key node CCL3 may increase the proliferation of HSCs through spontaneous down-regulation and increased protein hydrolysis.

  15. Identifying Organizational Inefficiencies with Pictorial Process Analysis (PPA)

    Directory of Open Access Journals (Sweden)

    David John Patrishkoff

    2013-11-01

    Full Text Available Pictorial Process Analysis (PPA) was created by the author in 2004. PPA is a unique methodology which offers ten layers of additional analysis when compared to standard process mapping techniques. The goal of PPA is to identify and eliminate waste, inefficiencies and risk in manufacturing or transactional business processes at 5 levels in an organization. The highest level assessed is process management, followed by the process work environment, detailed work habits, process performance metrics and general attitudes towards the process. This detailed process assessment and analysis is carried out during process improvement brainstorming efforts and Kaizen events. PPA creates a detailed visual efficiency rating for each step of the process under review. A selection of 54 pictorial Inefficiency Icons (cards) is available to highlight major inefficiencies and risks that are present in the business process under review. These inefficiency icons were identified during the author's independent research on the topic of why things go wrong in business. This paper highlights how PPA was developed and shows the steps required to conduct Pictorial Process Analysis on a sample manufacturing process. The author has successfully used PPA to dramatically improve business processes in over 55 different industries since 2004.

  16. Microarray analysis identifies keratin loci as sensitive biomarkers for thyroid hormone disruption in the salamander Ambystoma mexicanum.

    Science.gov (United States)

    Page, Robert B; Monaghan, James R; Samuels, Amy K; Smith, Jeramiah J; Beachy, Christopher K; Voss, S Randal

    2007-02-01

    Ambystomatid salamanders offer several advantages for endocrine disruption research, including genomic and bioinformatics resources, an accessible laboratory model (Ambystoma mexicanum), and natural lineages that are broadly distributed among North American habitats. We used microarray analysis to measure the relative abundance of transcripts isolated from A. mexicanum epidermis (skin) after exogenous application of thyroid hormone (TH). Only one gene had a >2-fold change in transcript abundance after 2 days of TH treatment. However, hundreds of genes showed significantly different transcript levels at days 12 and 28 in comparison to day 0. A list of 123 TH-responsive genes was identified using statistical, BLAST, and fold level criteria. Cluster analysis identified two groups of genes with similar transcription patterns: up-regulated versus down-regulated. Most notably, several keratins exhibited dramatic (1000 fold) increases or decreases in transcript abundance. Keratin gene expression changes coincided with morphological remodeling of epithelial tissues. This suggests that keratin loci can be developed as sensitive biomarkers to assay temporal disruptions of larval-to-adult gene expression programs. Our study has identified the first collection of loci that are regulated during TH-induced metamorphosis in a salamander, thus setting the stage for future investigations of TH disruption in the Mexican axolotl and other salamanders of the genus Ambystoma.

  17. Mechanisms of childhood hemophagocytic lymphohistiocytosis: A bioinformatic analysis

    Institute of Scientific and Technical Information of China (English)

    欧丹艳; 袁媛; 罗建明

    2014-01-01

    Objective: Hemophagocytic lymphohistiocytosis (HLH) is a life-threatening condition characterized by excessive inflammation, with a high incidence in children and a death rate of up to 40%. This study analyzed the gene expression profile in childhood HLH and explored the important pathways of childhood HLH using bioinformatic methods. Methods: The childhood HLH peripheral blood mononuclear cell gene expression profile dataset GSE26050 was obtained from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information. Differentially expressed genes were identified with the recently released GEO2R online analysis tool. The key pathways of the differentially expressed genes were investigated using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis in the DAVID database. Results: A total of 184 genes with at least 2-fold differential expression were identified, 126 upregulated and 58 downregulated. They were enriched in 3 pathways: cytokine-cytokine receptor interaction, hematopoietic cell lineage, and NOD-like receptor signaling. Conclusion: Bioinformatic tools allow the identification of the key genes and pathways associated with the development and progression of childhood HLH and point out potential directions for research on the mechanisms of childhood HLH.

  18. Bioinformatics Analysis of Glutathione Peroxidase in Zea Mays

    Institute of Scientific and Technical Information of China (English)

    张媛; 张钟仁; 咸丽霞; 邢国芳

    2013-01-01

    Glutathione peroxidase (GPX) is an important scavenger of reactive oxygen free radicals in organisms. It removes hydrogen peroxide and lipid peroxides, blocks further damage to the body by reactive oxygen free radicals, and ensures that normal life activities can proceed. In this study, the 11 members of the GPX gene family in Zea mays were examined, and the structure and function of their encoded proteins, including isoelectric point, molecular weight, number of amino acids, hydrophilicity, secondary structure and subcellular localization, were analyzed; a molecular phylogenetic tree was built using a series of bioinformatics software tools. The results showed that the 11 family members differ in isoelectric point and relative molecular weight but share similar secondary-structure features, comprising α-helix, β-sheet, β-turn and random coil. These results lay a foundation for a comprehensive analysis of GPX function in Zea mays and provide a theoretical basis for research on plant resistance to oxidative stress.

  19. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    Science.gov (United States)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis methods such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm, based on the error backpropagation (EBP) algorithm and using a new sigmoid-logarithmic transfer function, has been developed. Our results show that the trained new ANN can recognize low-fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating the logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  20. Rice Transcriptome Analysis to Identify Possible Herbicide Quinclorac Detoxification Genes

    Directory of Open Access Journals (Sweden)

    Wenying Xu

    2015-09-01

    Full Text Available Quinclorac is a highly selective auxin-type herbicide that is widely used for the effective control of barnyard grass in paddy rice fields, improving the world's rice yield. The herbicidal mode of action of quinclorac has been proposed, and hormone interactions affect quinclorac signaling. Because of its widespread use, quinclorac may be transported outside rice fields with drainage water, leading to soil and water pollution and environmental health problems. In this study, we used the 57K Affymetrix rice whole-genome array to identify quinclorac signaling response genes in order to study the molecular mechanisms of action and detoxification of quinclorac in rice plants. Overall, 637 probe sets were identified with differential expression levels under either 6 or 24 h of quinclorac treatment. Auxin-related genes such as GH3 and OsIAAs responded to quinclorac treatment. Gene Ontology analysis showed that detoxification-related gene families were significantly enriched, including cytochrome P450, GST, UGT, and ABC and drug transporter genes. Moreover, real-time RT-PCR analysis showed that top candidate P450 families such as CYP81, CYP709C and CYP72A genes were universally induced by different herbicides. Some Arabidopsis genes from the same P450 families were up-regulated under quinclorac treatment. We conducted rice whole-genome GeneChip analysis and report the first global identification of quinclorac response genes. This work may provide potential markers for detoxification of quinclorac and biomonitors of environmental chemical pollution.

  1. Training Experimental Biologists in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Pedro Fernandes

    2012-01-01

    Full Text Available Bioinformatics, by its very nature, is devoted to a set of targets that constantly evolve. Training is probably the best response to the constant need to acquire bioinformatics skills. It is interesting to assess the effects of training on the different sets of researchers that make use of it. While training bench experimentalists in the life sciences, we have observed changes in their attitudes to research that, if well exploited, can benefit the dialogue with professional bioinformaticians and influence the conduct of the research itself.

  2. Identifying Sources of Difference in Reliability in Content Analysis

    Directory of Open Access Journals (Sweden)

    Elizabeth Murphy

    2005-07-01

    Full Text Available This paper reports on a case study which identifies and illustrates sources of difference in agreement in relation to reliability in a context of quantitative content analysis of a transcript of an online asynchronous discussion (OAD. Transcripts of 10 students in a month-long online asynchronous discussion were coded by two coders using an instrument with two categories, five processes, and 19 indicators of Problem Formulation and Resolution (PFR. Sources of difference were identified in relation to: coders; tasks; and students. Reliability values were calculated at the levels of categories, processes, and indicators. At the most detailed level of coding on the basis of the indicator, findings revealed that the overall level of reliability between coders was .591 when measured with Cohen’s kappa. The difference between tasks at the same level ranged from .349 to .664, and the difference between participants ranged from .390 to .907. Implications for training and research are discussed.
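
    The Cohen's kappa statistic reported in the record above corrects observed agreement between two coders for the agreement expected by chance. A minimal sketch follows; the code labels are invented and are not the study's PFR indicators, and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one label per unit."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(codes_a)
    # Proportion of units on which the coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: six transcript units coded by two coders.
coder_1 = ["define", "define", "explore", "resolve", "explore", "define"]
coder_2 = ["define", "explore", "explore", "resolve", "explore", "resolve"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

    Computing the statistic separately per task or per student, as the record describes, only requires slicing the two label lists before the call.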

  3. Proteogenomic Analysis Identifies a Novel Human SHANK3 Isoform

    Directory of Open Access Journals (Sweden)

    Fahad Benthani

    2015-05-01

    Full Text Available Mutations of the SHANK3 gene have been associated with autism spectrum disorder. Individuals harboring different SHANK3 mutations display considerable heterogeneity in their cognitive impairment, likely due to the high SHANK3 transcriptional diversity. In this study, we report a novel interaction between the Mutated in colorectal cancer (MCC) protein and a newly identified SHANK3 protein isoform in human colon cancer cells and mouse brain tissue. Hence, our proteogenomic analysis identifies a new human long isoform of the key synaptic protein SHANK3 that was not predicted by the human reference genome. Taken together, our findings describe a potential new role for MCC in neurons, a new human SHANK3 long isoform and, importantly, highlight the use of proteomic data towards the re-annotation of GC-rich genomic regions.

  4. An innovative approach for testing bioinformatics programs using metamorphic testing

    Directory of Open Access Journals (Sweden)

    Liu Huai

    2009-01-01

    Full Text Available Abstract Background Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial as incorrectly computed results may lead to wrong biological conclusion or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore our ability to perform systematic software testing is greatly hindered. Results We propose to use a novel software testing technique, metamorphic testing (MT, to test a range of bioinformatics programs. Instead of requiring a mechanism to verify whether an individual test output is correct, the MT technique verifies whether a pair of test outputs conform to a set of domain specific properties, called metamorphic relations (MRs, thus greatly increases the number and variety of test cases that can be applied. To demonstrate how MT is used in practice, we applied MT to test two open-source bioinformatics programs, namely GNLab and SeqMap. In particular we show that MT is simple to implement, and is effective in detecting faults in a real-life program and some artificially fault-seeded programs. Further, we discuss how MT can be applied to test programs from various domains of bioinformatics. Conclusion This paper describes the application of a simple, effective and automated technique to systematically test a range of bioinformatics programs. We show how MT can be implemented in practice through two real-life case studies. Since many bioinformatics programs, particularly those for large scale simulation and data analysis, are hard to test systematically, their developers may benefit from using MT as part of the testing strategy. 
Therefore our work
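
    The idea of checking a pair of outputs against a metamorphic relation, rather than checking one output against a known answer, can be sketched as follows. This is a minimal illustration only: `map_reads` is a hypothetical toy exact-match mapper, not SeqMap or GNLab, and the two relations are assumed examples rather than the relations used in the paper.

```python
import random


def map_reads(reference, reads):
    """Toy exact-match mapper (hypothetical): {read: list of start positions}."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


def check_permutation_relation(reference, reads, seed=0):
    """Assumed MR: shuffling the input reads must not change any read's hits."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return map_reads(reference, reads) == map_reads(reference, shuffled)


def check_addition_relation(reference, reads, extra_read):
    """Assumed MR: adding one more read must not change the hits of the others."""
    before = map_reads(reference, reads)
    after = map_reads(reference, list(reads) + [extra_read])
    return all(after[read] == before[read] for read in reads)
```

    Neither check needs to know the correct mapping in advance; a violation of either relation would point to a fault in the program under test.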

  5. Lidar point density analysis: implications for identifying water bodies

    Science.gov (United States)

    Worstell, Bruce B.; Poppenga, Sandra; Evans, Gayla A.; Prince, Sandra

    2014-01-01

    Most airborne topographic light detection and ranging (lidar) systems operate within the near-infrared spectrum. Laser pulses from these systems frequently are absorbed by water and therefore do not generate reflected returns on water bodies in the resulting void regions within the lidar point cloud. Thus, an analysis of lidar voids has implications for identifying water bodies. Data analysis techniques to detect reduced lidar return densities were evaluated for test sites in Blackhawk County, Iowa, and Beltrami County, Minnesota, to delineate contiguous areas that have few or no lidar returns. Results from this study indicated a 5-meter radius moving window with fewer than 23 returns (28 percent of the moving window) was sufficient for delineating void regions. Techniques to provide elevation values for void regions to flatten water features and to force channel flow in the downstream direction also are presented.
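
    A moving-window density test of the kind described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, points are assumed to be planar (x, y) coordinates in meters, and the defaults simply reuse the 5-meter radius and 23-return threshold quoted in the abstract.

```python
import math


def count_returns_within(points, center, radius):
    """Count lidar returns whose planar distance to center is at most radius."""
    cx, cy = center
    return sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)


def flag_void_cells(points, centers, radius=5.0, min_returns=23):
    """Return the window centers that hold fewer than min_returns returns."""
    return [
        center
        for center in centers
        if count_returns_within(points, center, radius) < min_returns
    ]
```

    The brute-force count is quadratic in the worst case; a real point cloud would call for a spatial index, which is omitted here to keep the sketch short.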

  6. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  7. BioRuby: Bioinformatics software for the Ruby programming language

    NARCIS (Netherlands)

    Goto, N.; Prins, J.C.P.; Nakao, M.; Bonnal, R.; Aerts, J.; Katayama, A.

    2010-01-01

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it suppor

  8. BioRuby : bioinformatics software for the Ruby programming language

    NARCIS (Netherlands)

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-01-01

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it suppor

  9. Bioinformatics analysis of FGF10 in Cervus nippon

    Institute of Scientific and Technical Information of China (English)

    鞠妍; 刘华淼; 魏海军

    2014-01-01

    We used a bioinformatics approach to analyze the nucleotide and amino acid sequences of FGF10 from Cervus nippon (sika deer), including physicochemical properties, signal peptide and transmembrane domain prediction, phosphorylation sites and hydrophobicity, secondary structure prediction, conserved domain prediction, and phylogenetic analysis. The results showed that FGF10 of Cervus nippon encodes a deduced protein of 213 amino acids with a predicted molecular weight of 23.84 kD. The protein is alkaline and unstable, and has a signal peptide, a transmembrane domain and 26 phosphorylation sites. Its secondary structure is mainly composed of α-helix, β-turn, extended strand and random coil, and it contains the typical FGF domain. Phylogenetic analysis showed that sika deer FGF10 is highly similar to FGF10 of other mammals and is most closely related to that of cattle and sheep. These results provide a theoretical foundation for further research on the structure and function of FGF10 in Cervus nippon.

  10. Expression of the Lyzl6 gene in mice at different postnatal ages and related bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    张小悦; 李家强; 吴汉伟; 江智茂; 唐爱发

    2013-01-01

    Objective To investigate the expression characteristics of Lyzl6 in mice at different postnatal ages. Methods Testis and other tissue samples were collected from C57BL/6J mice at different postnatal ages (testes at 9, 15, 18, 21, 28 and 35 days and at 6 months). Expression of the Lyzl6 gene in testis and in other tissues of adult mice was determined by RT-PCR, and the gene was analyzed with several bioinformatics methods. Results RT-PCR showed that, among 14 mouse tissues, Lyzl6 was markedly highly expressed in testis. No expression was detected in testes at 21 days of age or earlier, and expression was persistently high from 28 days to 6 months after birth. Bioinformatics analysis showed that Lyzl6 is a member of the c-type lysozyme family and is a secretory protein whose signal peptide cleavage site lies between amino acids 19 and 20. The human and mouse Lyzl6 amino acid sequences show a high degree of similarity and homology, indicating that this gene is highly conserved in mammals. Conclusion Lyzl6 is an age-dependently expressed gene, and its high expression in mouse testis suggests that it may play an important role in spermatogenesis.

  11. Bioinformatics interoperability: all together now !

    NARCIS (Netherlands)

    Meganck, B.; Mergen, P.; Meirte, D.

    2009-01-01

    The following text presents some personal ideas about the way (bio)informatics is heading, along with some examples of how our institution – the Royal Museum for Central Africa (RMCA) – is gearing up for these new times ahead. It tries to find the important trends amongst the buzzwords, and to demo

  12. Reproducible Bioinformatics Research for Biologists

    Science.gov (United States)

    This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...

  13. A tyrosine-rich cell surface protein in the diatom Amphora coffeaeformis identified through transcriptome analysis and genetic transformation.

    Directory of Open Access Journals (Sweden)

    Matthias T Buhmann

    Full Text Available Diatoms are single-celled eukaryotic microalgae that are ubiquitously found in almost all aquatic ecosystems, and are characterized by their intricately structured SiO2 (silica)-based cell walls. Diatoms with a benthic life style are capable of attaching to any natural or man-made submerged surface, thus contributing substantially to both microbial biofilm communities and economic losses through biofouling. Surface attachment of diatoms is mediated by a carbohydrate- and protein-based glue, yet no protein involved in diatom underwater adhesion has been identified so far. In the present work, we have generated a normalized transcriptome database from the model adhesion diatom Amphora coffeaeformis. Using an unconventional bioinformatics analysis we have identified five proteins that exhibit unique amino acid sequences resembling the amino acid composition of the tyrosine-rich adhesion proteins from mussel footpads. Establishing the first method for the molecular genetic transformation of A. coffeaeformis has enabled investigations into the function of one of these proteins, AC3362, through expression as YFP fusion protein. Biochemical analysis and imaging by fluorescence microscopy revealed that AC3362 is not involved in adhesion, but rather plays a role in biosynthesis and/or structural stability of the cell wall. The methods established in the present study have paved the way for further molecular studies on the mechanisms of underwater adhesion and biological silica formation in the diatom A. coffeaeformis.

  14. Virginia Bioinformatics Institute awards Transdisciplinary Team Science

    OpenAIRE

    Bland, Susan

    2009-01-01

    The Virginia Bioinformatics Institute at Virginia Tech, in collaboration with Virginia Tech's Ph.D. program in genetics, bioinformatics, and computational biology, has awarded three fellowships in support of graduate work in transdisciplinary team science.

  15. Bioinformatics Challenge Days

    Science.gov (United States)

    2013-12-30

    concentrations 3. Genome Assembly for the Clinic: Performing de novo assembly from clinical samples with an emphasis on pathogen identification 4...The Illumina Genome Analyzer II was used. The sample contained 1.4 Gbp of data. DE NOVO GENETIC ASSEMBLY (BCD1, 2) The same datasets were used...Challenge (BCD1) 21   From Metagenomic Sample to Useful Visual (BCD1, 2) 21   De Novo Genetic Assembly (BCD1, 2) 22   Identifying Markers of Genetic

  16. Application of bioinformatics in tropical medicine

    Institute of Scientific and Technical Information of China (English)

    Wiwanitkit V

    2008-01-01

    Bioinformatics is the use of information technology to help solve biological problems by designing novel and incisive algorithms and methods of analysis. Bioinformatics has become a vital discipline in the post-genomics era. In this review article, the application of bioinformatics in tropical medicine is presented and discussed.

  17. Analysis of an Image Secret Sharing Scheme to Identify Cheaters

    Directory of Open Access Journals (Sweden)

    Jung-San Lee

    2010-09-01

    Full Text Available Secret image sharing mechanisms have been widely applied in the military, e-commerce, and communications fields. Zhao et al. recently introduced the concept of cheater detection into image sharing schemes. This functionality enables the image owner and authorized members to identify a cheater when reconstructing the secret image. Here, we provide an analysis of Zhao et al.'s method showing that an authorized participant is able to restore the secret image by him/herself, which contradicts the requirement of secret image sharing schemes. Although the participant must use an exhaustive search to do so, simulation results show that it can be done within a reasonable time period.

  18. Sequencing and bioinformatic analysis of genome of Acinetobacter baumannii bacteriophage AB3

    Institute of Scientific and Technical Information of China (English)

    张劼; 刘茜; 甘丹

    2013-01-01

    Objective To sequence the Acinetobacter baumannii bacteriophage AB3 isolated by our team and to perform a bioinformatics analysis of its genome, so as to clarify its phylogenetic relationships. Methods A shotgun library and contig assembly strategy were used to sequence the genome of bacteriophage AB3. Software including EditSeq, tRNAscan-SE, TRF, FindTerm, ORF finder, BPROM and GeneMark was applied to predict the general characteristics of the bacteriophage AB3 genome and the functions of its coding genes. In addition, the evolution of the RNA polymerase gene was analyzed with Clustalx and phylip. Results The genome of bacteriophage AB3 is a double-stranded DNA with a full length of 31,185 bp and a G+C content of 39.18%, and it contains 28 predicted genes, 1 transcription terminator, and 4 possible promoter sequences. Conclusion Genetic analysis and RNA polymerase gene evolution analysis indicate that bacteriophage AB3 is similar to bacteriophage AB1, and that both belong to the phiKMV-like viruses.

  19. Identifying avian sources of faecal contamination using sterol analysis.

    Science.gov (United States)

    Devane, Megan L; Wood, David; Chappell, Andrew; Robson, Beth; Webster-Brown, Jenny; Gilpin, Brent J

    2015-10-01

    Discrimination of the source of faecal pollution in water bodies is an important step in the assessment and mitigation of public health risk. One tool for faecal source tracking is the analysis of faecal sterols which are present in faeces of animals in a range of distinctive ratios. Published ratios are able to discriminate between human and herbivore mammal faecal inputs but are of less value for identifying pollution from wildfowl, which can be a common cause of elevated bacterial indicators in rivers and streams. In this study, the sterol profiles of 50 avian-derived faecal specimens (seagulls, ducks and chickens) were examined alongside those of 57 ruminant faeces and previously published sterol profiles of human wastewater, chicken effluent and animal meatwork effluent. Two novel sterol ratios were identified as specific to avian faecal scats, which, when incorporated into a decision tree with human and herbivore mammal indicative ratios, were able to identify sterols from avian-polluted waterways. For samples where the sterol profile was not consistent with herbivore mammal or human pollution, avian pollution is indicated when the ratio of 24-ethylcholestanol/(24-ethylcholestanol + 24-ethylcoprostanol + 24-ethylepicoprostanol) is ≥0.4 (avian ratio 1) and the ratio of cholestanol/(cholestanol + coprostanol + epicoprostanol) is ≥0.5 (avian ratio 2). When avian pollution is indicated, further confirmation by targeted PCR specific markers can be employed if greater confidence in the pollution source is required. A 66% concordance between sterol ratios and current avian PCR markers was achieved when 56 water samples from polluted waterways were analysed.
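
    The two avian ratios and their thresholds stated above can be written out as a small calculation. This is a sketch under assumptions: the dictionary keys and function names are hypothetical, concentrations are assumed to share one unit, and the check covers only the final avian step, not the preceding human and herbivore branches of the decision tree.

```python
def _ratio(numerator, others):
    """Return numerator / (numerator + sum(others)), or None if the sum is zero."""
    total = numerator + sum(others)
    return numerator / total if total > 0 else None


def avian_ratio_1(sterols):
    """24-ethylcholestanol over the sum of the three 24-ethyl stanols."""
    return _ratio(
        sterols["24-ethylcholestanol"],
        [sterols["24-ethylcoprostanol"], sterols["24-ethylepicoprostanol"]],
    )


def avian_ratio_2(sterols):
    """Cholestanol over cholestanol + coprostanol + epicoprostanol."""
    return _ratio(
        sterols["cholestanol"],
        [sterols["coprostanol"], sterols["epicoprostanol"]],
    )


def avian_indicated(sterols):
    """True when ratio 1 >= 0.4 and ratio 2 >= 0.5, the thresholds in the abstract."""
    r1, r2 = avian_ratio_1(sterols), avian_ratio_2(sterols)
    return r1 is not None and r2 is not None and r1 >= 0.4 and r2 >= 0.5
```

    As the abstract notes, this step applies only to samples whose sterol profile is not already consistent with herbivore mammal or human pollution.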

  20. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  1. Archetypal TRMM Radar Profiles Identified Through Cluster Analysis

    Science.gov (United States)

    Boccippio, Dennis J.

    2003-01-01

    It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land/ocean distinctions, break/monsoon behavior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar Z-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (O(10^7)) of TRMM-PR reflectivity columns. Once achieved, the rain-conditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description

  2. Social network analysis in identifying influential webloggers: A preliminary study

    Science.gov (United States)

    Hasmuni, Noraini; Sulaiman, Nor Intan Saniah; Zaibidi, Nerda Zura

    2014-12-01

    In recent years, second-generation internet-based services such as weblogs have become an effective communication tool for publishing information on the Web. Weblogs have unique characteristics that deserve users' attention. Some webloggers see weblogs as an appropriate medium to initiate and expand a business. These webloggers, also known as direct profit-oriented webloggers (DPOWs), communicate and share knowledge with each other through social interaction. However, survivability is the main issue among DPOWs. Frequent communication with influential webloggers is one way to survive as a DPOW. This paper aims to understand the network structure and identify influential webloggers within the network. Proper understanding of the network structure can help show how information is exchanged among members and enhance survivability among DPOWs. Thirty DPOWs were involved in this study. Degree centrality and betweenness centrality measures from Social Network Analysis (SNA) were used to examine the strength of relations and identify influential webloggers within the network. Webloggers with the highest values of these measures are considered the most influential webloggers in the network.
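
    The two centrality measures named above can be computed for a small network as follows. This is a minimal sketch, not the study's analysis: the graph is assumed to be undirected and unweighted, given as a hypothetical dictionary mapping each weblogger to the set of webloggers it is linked to, and betweenness is computed with Brandes' algorithm and left unnormalized.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm: number of shortest paths passing through each node."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of source on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: total / 2 for v, total in score.items()}
```

    In a star-shaped network, for example, the hub scores highest on both measures, which matches the intuition that it brokers all communication between the other members.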

  3. Bioinformatics decoding the genome

    CERN Document Server

    CERN. Geneva; Deutsch, Sam; Michielin, Olivier; Thomas, Arthur; Descombes, Patrick

    2006-01-01

    Extracting the fundamental genomic sequence from the DNA From Genome to Sequence : Biology in the early 21st century has been radically transformed by the availability of the full genome sequences of an ever increasing number of life forms, from bacteria to major crop plants and to humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences is increasingly conditioned by the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is the mapping of biologically relevant information onto this sequence: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological ...

  4. Bioinformatics for cancer immunology and immunotherapy.

    Science.gov (United States)

    Charoentong, Pornpimol; Angelova, Mihaela; Efremova, Mirjana; Gallasch, Ralf; Hackl, Hubert; Galon, Jerome; Trajanoski, Zlatko

    2012-11-01

    Recent mechanistic insights obtained from preclinical studies and the approval of the first immunotherapies have motivated an increasing number of academic investigators and pharmaceutical/biotech companies to further elucidate the role of immunity in tumor pathogenesis and to reconsider the role of immunotherapy. Additionally, technological advances (e.g., next-generation sequencing) are providing unprecedented opportunities to draw a comprehensive picture of the tumor genomics landscape and ultimately enable individualized treatment. However, the increasing complexity of the generated data and the plethora of bioinformatics methods and tools pose considerable challenges to both tumor immunologists and clinical oncologists. In this review, we describe current concepts and future challenges for the management and analysis of data for cancer immunology and immunotherapy. We first highlight publicly available databases with specific focus on cancer immunology including databases for somatic mutations and epitope databases. We then give an overview of the bioinformatics methods for the analysis of next-generation sequencing data (whole-genome and exome sequencing), epitope prediction tools as well as methods for integrative data analysis and network modeling. Mathematical models are powerful tools that can predict and explain important patterns in the genetic and clinical progression of cancer. Therefore, a survey of mathematical models for tumor evolution and tumor-immune cell interaction is included. Finally, we discuss future challenges for individualized immunotherapy and suggest how combined computational/experimental approaches can lead to new insights into the molecular mechanisms of cancer, improved diagnosis, and prognosis of the disease and pinpoint novel therapeutic targets.

  5. Visualizing analysis of bioinformatics software research

    Institute of Scientific and Technical Information of China (English)

    种乐熹; 胡德华

    2015-01-01

    Taking the Web of Science database as the data source, we used CiteSpace and UCINET to visually analyze the literature on bioinformatics software published in the journal Nucleic Acids Research. The research effort, author teams, highly cited authors, knowledge base, journal distribution, and research focuses and fronts in this field are explored to provide necessary references for bioinformatics software research and development.

  6. Review of bioinformatics data analysis in alternative splicing

    Institute of Scientific and Technical Information of China (English)

    章天骄

    2012-01-01

    Alternative pre-mRNA splicing is an important gene regulation mechanism for expanding proteomic diversity in higher eukaryotes. The misregulation of alternative splicing underlies many human diseases. With the development of high-throughput technology, bioinformatics has become the main method for studying alternative splicing. This article summarizes the bioinformatics methods used in alternative splicing research, and also analyzes and predicts the direction of the field.

  7. Undergraduate Bioinformatics Workshops Provide Perceived Skills

    Directory of Open Access Journals (Sweden)

    Robin Herlands Cresiski

    2014-07-01

    Full Text Available Bioinformatics is becoming an important part of the undergraduate curriculum, but expertise and well-evaluated teaching materials may not be available on every campus. Here, a guest speaker was utilized to introduce bioinformatics and web-available exercises were adapted for student investigation. Students used web-based nucleotide comparison tools to examine the medical and evolutionary relevance of an unidentified genetic sequence. Based on pre- and post-workshop surveys, there were significant gains in the students' understanding of bioinformatics, as well as their perceived skills in using bioinformatics tools. The relevance of bioinformatics to a student's career seemed dependent on career aspirations.

  8. Hydroxysteroid dehydrogenases (HSDs) in bacteria: a bioinformatic perspective.

    Science.gov (United States)

    Kisiela, Michael; Skarka, Adam; Ebert, Bettina; Maser, Edmund

    2012-03-01

    Steroidal compounds including cholesterol, bile acids and steroid hormones play a central role in various physiological processes such as cell signaling, growth, reproduction, and energy homeostasis. Hydroxysteroid dehydrogenases (HSDs), which belong to the superfamily of short-chain dehydrogenases/reductases (SDR) or aldo-keto reductases (AKR), are important enzymes involved in the steroid hormone metabolism. HSDs function as an enzymatic switch that controls the access of receptor-active steroids to nuclear hormone receptors and thereby mediate a fine-tuning of the steroid response. The aim of this study was the identification of classified functional HSDs and the bioinformatic annotation of these proteins in all complete sequenced bacterial genomes followed by a phylogenetic analysis. For the bioinformatic annotation we constructed specific hidden Markov models in an iterative approach to provide a reliable identification for the specific catalytic groups of HSDs. Here, we show a detailed phylogenetic analysis of 3α-, 7α-, 12α-HSDs and two further functionally related enzymes (3-ketosteroid-Δ(1)-dehydrogenase, 3-ketosteroid-Δ(4)(5α)-dehydrogenase) from the superfamily of SDRs. For some bacteria that have been previously reported to possess a specific HSD activity, we could annotate the corresponding HSD protein. The dominating phyla that were identified to express HSDs were Actinobacteria, Proteobacteria, and Firmicutes. Moreover, some evolutionarily more ancient microorganisms (e.g., Cyanobacteria and Euryarchaeota) were found as well. A large number of HSD-expressing bacteria constitute the normal human gastro-intestinal flora. Another group of bacteria were originally isolated from natural habitats like seawater, soil, marine and permafrost sediments. These bacteria include polycyclic aromatic hydrocarbons-degrading species such as Pseudomonas, Burkholderia and Rhodococcus. In conclusion, HSDs are found in a wide variety of microorganisms including

  9. Bioinformatics analysis of the BRX gene family in grape

    Institute of Scientific and Technical Information of China (English)

    李文芳; 陈佰鸿; 毛娟; 马宗桓; 杨世茂

    2015-01-01

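    The BRX gene family is a plant-specific family of transcription factors that participates in regulating root cell proliferation and elongation in Arabidopsis. Using bioinformatics approaches, the BRX gene family present in the grape genome was cloned in silico, and its genome localization, protein structure, physicochemical properties, secondary structure and subcellular localization were predicted and analyzed. Its phylogenetic relationships with other plants were also studied. Genome localization showed that the six BRX genes in the grape genome are distributed on three chromosomes: VvBRX1 and VvBRX2 on chromosome 2, VvBRX3 and VvBRX4 on chromosome 9, and VvBRX5 and VvBRX6 on chromosome 11. The encoded proteins contain 360 to 560 amino acids. VvBRX5 has the largest relative molecular mass (61,884.4) and theoretical isoelectric point (9.38), while VvBRX1 has the smallest (40,239.1 and 6.23). The members differ somewhat in amino acid number and sequence, but all are hydrophobic proteins. α-helix and random coil are the main components of the six BRX amino acid sequences, and none has a transmembrane domain or signal peptide. Gene structure analysis showed that all six BRX genes contain exons and introns. Subcellular localization analysis showed that all six VvBRX genes localize to the nucleus. Phylogenetic analysis showed that VvBRX1 and VvBRX2 are most closely related to Populus euphratica, with 96% similarity. VvBRX3 and VvBRX4 cluster with castor bean, Jatropha, citrus, cacao and soybean, indicating a close evolutionary relationship. VvBRX5 is clearly separated from the other VvBRX genes, and VvBRX6 is most closely related to lotus. These results lay a foundation for the cloning and functional analysis of the grape BRX gene family.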

  10. An introduction to proteome bioinformatics.

    Science.gov (United States)

    Jones, Andrew R; Hubbard, Simon J

    2010-01-01

    This book is part of the Methods in Molecular Biology series, and provides a general overview of computational approaches used in proteome research. In this chapter, we give an overview of the scope of the book in terms of current proteomics experimental techniques and the reasons why computational approaches are needed. We then give a summary of each chapter, which together provide a picture of the state of the art in proteome bioinformatics research.

  11. Application of Bioinformatics and Systems Biology in Medicinal Plant Studies

    Institute of Scientific and Technical Information of China (English)

    DENG You-ping; AI Jun-mei; XIAO Pei-gen

    2010-01-01

    One important purpose of investigating medicinal plants is to understand the genes and enzymes that govern the biological metabolic processes that produce bioactive compounds. Genome-wide high-throughput technologies such as genomics, transcriptomics, proteomics and metabolomics can help reach that goal. Such technologies can produce a vast amount of data, which need bioinformatics and systems biology to be processed, managed, distributed and understood. By dealing with the "omics" data, bioinformatics and systems biology can also help improve the quality of traditional medicinal materials, develop new approaches for the classification and authentication of medicinal plants, identify new active compounds, and cultivate medicinal plant species that tolerate harsh environmental conditions. In this review, the application of bioinformatics and systems biology in medicinal plants is briefly introduced.

  12. CROSSWORK for Glycans: Glycan Identification Through Mass Spectrometry and Bioinformatics

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Thaysen-Andersen, Morten; Højrup, Peter

    We have developed "GLYCANthrope" - CROSSWORKS for glycans: a bioinformatics tool that assists in identifying N-linked glycosylated peptides, as well as their glycan moieties, from MS2 data of enzymatically digested glycoproteins. The program runs either as a stand-alone application or as a plug...

  13. Performance Analysis: Work Control Events Identified January - August 2010

    Energy Technology Data Exchange (ETDEWEB)

    De Grange, C E; Freeman, J W; Kerr, C E; Holman, G; Marsh, K; Beach, R

    2011-01-14

    This performance analysis evaluated 24 events that occurred at LLNL from January through August 2010. The analysis identified areas of potential work control process and/or implementation weaknesses and several common underlying causes. Human performance improvement and safety culture factors were part of the causal analysis of each event and were analyzed. The collective significance of all events in 2010, as measured by the occurrence reporting significance category and by the proportion of events that have been reported to the DOE ORPS under the "management concerns" reporting criteria, does not appear to have increased in 2010. The frequency of reporting in each of the significance categories has not changed in 2010 compared to the previous four years. There is no change indicating a trend in the significance category and there has been no increase in the proportion of occurrences reported in the higher significance category. Also, the frequency of events, 42 events reported through August 2010, is not greater than in previous years and is below the average of 63 occurrences per year at LLNL since 2006. Over the previous four years, an average of 43% of the LLNL's reported occurrences have been reported as either "management concerns" or "near misses." In 2010, 29% of the occurrences have been reported as "management concerns" or "near misses." This rate indicates that LLNL is now reporting fewer "management concern" and "near miss" occurrences compared to the previous four years. From 2008 to the present, LLNL senior management has undertaken a series of initiatives to strengthen the work planning and control system with the primary objective to improve worker safety. In 2008, the LLNL Deputy Director established the Work Control Integrated Project Team to develop the core requirements and graded

  14. Identifying redundancy and exposing provenance in crowdsourced data analysis.

    Science.gov (United States)

    Willett, Wesley; Ginosar, Shiry; Steinitz, Avital; Hartmann, Björn; Agrawala, Maneesh

    2013-12-01

    We present a system that lets analysts use paid crowd workers to explore data sets and helps analysts interactively examine and build upon workers' insights. We take advantage of the fact that, for many types of data, independent crowd workers can readily perform basic analysis tasks like examining views and generating explanations for trends and patterns. However, workers operating in parallel can often generate redundant explanations. Moreover, because workers have different competencies and domain knowledge, some responses are likely to be more plausible than others. To efficiently utilize the crowd's work, analysts must be able to quickly identify and consolidate redundant responses and determine which explanations are the most plausible. In this paper, we demonstrate several crowd-assisted techniques to help analysts make better use of crowdsourced explanations: (1) We explore crowd-assisted strategies that utilize multiple workers to detect redundant explanations. We introduce color clustering with representative selection--a strategy in which multiple workers cluster explanations and we automatically select the most-representative result--and show that it generates clusterings that are as good as those produced by experts. (2) We capture explanation provenance by introducing highlighting tasks and capturing workers' browsing behavior via an embedded web browser, and refine that provenance information via source-review tasks. We expose this information in an explanation-management interface that allows analysts to interactively filter and sort responses, select the most plausible explanations, and decide which to explore further.

  15. A Sensitivity Analysis Approach to Identify Key Environmental Performance Factors

    Directory of Open Access Journals (Sweden)

    Xi Yu

    2014-01-01

    Life cycle assessment (LCA) has been widely used in the design phase over the last two decades to reduce a product's environmental impacts through the whole product life cycle (PLC). Traditional LCA is restricted to assessing the environmental impacts of a product, and its results cannot reflect the effects of changes within the life cycle. To improve the quality of ecodesign, there is a growing need for an approach that can reflect the relationship between changes in design parameters and a product's environmental impacts. A sensitivity analysis approach based on LCA and ecodesign is proposed in this paper. The key environmental performance factors that have significant influence on a product's environmental impacts can be identified by analyzing the relationship between environmental impacts and the design parameters. Users without much environmental knowledge can use this approach to determine which design parameter should be considered first when (re)designing a product. A printed circuit board (PCB) case study is conducted, in which eight design parameters are analyzed with the approach. The result shows that the carbon dioxide emission during PCB manufacture is highly sensitive to the area of the PCB panel.
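    The idea of ranking design parameters by their influence on an environmental impact can be sketched with a one-at-a-time sensitivity analysis. The sketch below is only an illustration under stated assumptions: the impact model `toy_co2_model`, its coefficients, and the parameter names are hypothetical and are not taken from the paper, whose actual method is not described in this abstract.

```python
from typing import Callable, Dict


def normalized_sensitivity(
    model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    delta: float = 0.01,
) -> Dict[str, float]:
    """One-at-a-time sensitivity: relative output change per relative input change."""
    base_out = model(baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1 + delta)  # perturb one parameter by delta (e.g. 1%)
        out = model(perturbed)
        result[name] = ((out - base_out) / base_out) / delta
    return result


def toy_co2_model(p: Dict[str, float]) -> float:
    """Hypothetical impact model (kg CO2); coefficients are made up for illustration."""
    return 5.0 * p["panel_area_m2"] + 0.2 * p["layer_count"] + 1.0


baseline = {"panel_area_m2": 2.0, "layer_count": 4.0}
sens = normalized_sensitivity(toy_co2_model, baseline)
# Rank parameters from most to least influential.
ranked = sorted(sens, key=lambda k: abs(sens[k]), reverse=True)
```

    In this toy setup the panel area ranks first because it dominates the modeled emission; with a real LCA model the ranking would depend on the inventory data.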

  17. Storage, data management, and retrieval in bioinformatics

    Science.gov (United States)

    Wong, Stephen T. C.; Patwardhan, Anil

    2001-12-01

    The evolution of biology into a large-scale quantitative molecular science has been paralleled by concomitant advances in computer storage systems, processing power, and data-analysis algorithms. The application of computer technologies to molecular biology data has given rise to a new system-based approach to biological research. Bioinformatics addresses problems related to the storage, retrieval and analysis of information about biological structure, sequence and function. Its goals include the development of integrated storage systems and analysis tools to interpret molecular biology data in a biologically meaningful manner in normal and disease processes and in efforts for drug discovery. This paper reviews recent developments in data management, storage, and retrieval that are central to the effective use of structural and functional genomics in fulfilling these goals.

  18. Bioinformatics analysis of NRT gene sequences in Rhododendron fortunei

    Institute of Scientific and Technical Information of China (English)

    2014-01-01

    We used next-generation sequencing to compare the transcriptomes of inoculated and uninoculated roots of Rhododendron fortunei and obtained many differentially expressed genes. Among the genes significantly differentially expressed in roots of seedlings inoculated with ERM fungi, the nitrate transporter (NRT) gene is a key gene for nitrate nitrogen uptake and transport. In this paper, the nucleic acid sequences and amino acid sequences of nitrate transporters from R. fortunei were analyzed with bioinformatics tools. Several parameters of these sequences were predicted, including sequence composition, physicochemical properties, leader peptide, topological structure of transmembrane regions, hydrophobicity or hydrophilicity, secondary structures, functional domains and protein structures. A phylogenetic tree was reconstructed for the nitrate transporter protein family. This work provides a bioinformatics foundation for understanding the function of NRT genes in the roots of inoculated seedlings.

  19. Bioinformatic Analysis of Deleterious Non-Synonymous Single Nucleotide Polymorphisms (nsSNPs) in the Coding Regions of Human Prion Protein Gene (PRNP)

    Directory of Open Access Journals (Sweden)

    Kourosh Bamdad

    2016-12-01

    Background & Objective: Single nucleotide polymorphisms are a cause of genetic variation in living organisms. Single nucleotide polymorphisms alter residues in the protein sequence. In this investigation, the relationship between prion protein gene polymorphisms and pathogenicity was studied. Material & Method: The amino acid sequence of the main isoform of the human prion protein gene (PRNP) was extracted from the UniProt database and evaluated with the FoldAmyloid and AmylPred servers. All non-synonymous single nucleotide polymorphisms (nsSNPs) from the SNP database (dbSNP) were further analyzed with bioinformatics servers including SIFT, PolyPhen-2, I-Mutant-3.0, PANTHER, SNPs & GO, PHD-SNP, Meta-SNP, and MutPred to determine the most damaging nsSNPs. Results: The primary structure analyses by the FoldAmyloid and AmylPred servers implied that regions 5-15, 174-178, 180-184, 211-217, and 240-252 were the parts of the protein sequence most sensitive to amyloidosis. Screening all nsSNPs of the main protein isoform using bioinformatic servers revealed that substitution of aspartic acid with valine at position 178 (ID code: rs11538766) was the most deleterious nsSNP in the protein structure. Conclusion: Substitution of aspartic acid with valine at position 178 (D178V) was the most pathogenic mutation in the human prion protein gene. Analyses from the MutPred server also showed that an increase in beta-sheets in the secondary structure was the main reason behind the molecular mechanism of prion protein aggregation.

  20. The Bioinformatic Analysis of the blcap Gene

    Institute of Scientific and Technical Information of China (English)

    刘娟; 熊金虎; 伍欣星

    2004-01-01

    BLCAP is a potential suppressor gene for cervical carcinoma, identified by analysing cervical carcinoma specimens with an oncogene and anti-oncogene cDNA microarray. Based on bioinformatic analyses, we try to predict the function of the blcap gene. The results show that several genes closely resemble blcap. The sequence similarity between blcap and Homo sapiens mRNA (DKFZp564M053) or BC10 is 99% and 87%, respectively. The protein encoded by BLCAP is composed of Leu (19.5%), Pro (9.19%), Ser (8.04%), Cys (8.04%) and other amino acids. The secondary structure of the N-terminus of the BLCAP-encoded protein is an alpha helix; the C-terminus is a beta sheet and the middle is coil. The termini are more hydrophobic than the middle region. There is a transmembrane region between residues 45 and 55. We therefore predict that BLCAP is a type I transmembrane protein. By analyzing the signal peptide and processing of the blcap gene product with SignalP (V1.1), we found a cleavage site at residues 59-66. Using NetPhos, we predicted three possible phosphorylation sites at residues 68, 73 and 78. At residues 78-81, we found a typical [ST]-X(2)-[DE] structure, the phosphorylation site of tyrosine protein kinase, which might be related to its function. These bioinformatic studies of blcap provide a foundation for laboratory research on the function of BLCAP.
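    Two of the steps mentioned in this abstract, computing amino acid composition and scanning for an [ST]-X(2)-[DE] pattern, can be sketched with the Python standard library. This is a minimal illustration, not the authors' workflow: the sequence `toy_seq` is hypothetical and is not the BLCAP sequence, and the function names are invented for this example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# [ST]-X(2)-[DE]: Ser or Thr, any two residues, then Asp or Glu.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([ST].{2}[DE]))")


def find_motif_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq.upper())]


def composition_percent(seq: str) -> Dict[str, float]:
    """Percentage of each residue in the sequence."""
    counts = Counter(seq.upper())
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


toy_seq = "MASLLDETAAE"  # hypothetical peptide, for illustration only
sites = find_motif_sites(toy_seq)
comp = composition_percent(toy_seq)
```

    Dedicated predictors such as those named in the abstract use trained models rather than a plain pattern match, so a regular expression like this only flags candidate positions.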

  1. Electronic cloning and bioinformatics analysis of the pig LBP gene

    Institute of Scientific and Technical Information of China (English)

    喻礼怀; 王靖; 傅聪; 顾雯雯; 王龙

    2012-01-01

    The lipopolysaccharide-binding protein (LBP) gene participates in the inflammatory response to LPS and is an important immune-related gene. Using electronic cloning, the pig LBP gene was cloned with the human LBP gene (NM_004139) as the seed sequence. The results showed that the cloned LBP gene has an open reading frame of 1,446 bp encoding 481 amino acids. The deduced amino acid sequence shares 74.2%, 64.8%, 63.2%, 77.1%, and 74.6% similarity with human, mouse, rat, cattle, and dog, respectively. Evolutionary tree analysis showed that pig LBP is closest to cattle and furthest from rat. The bioinformatics analysis showed that the molecular weight of the LBP protein was 53.0367 kDa and the theoretical isoelectric point was 6.43. Subcellular localization prediction suggested that LBP is a secreted protein, with predicted localization in mitochondria (44.4%) and the Golgi apparatus (22.2%). A signal peptide is present at the N-terminus, with a possible cleavage site between amino acids 25 and 26. Hydrophobicity is higher at the N-terminus, within the signal peptide. The maximum hydrophobic value was 2.478 and the maximum hydrophilic value was 1.956. The membrane protein is made up of an extracellular domain (amino acids 1-6), a transmembrane region (amino acids 7-29), and an intracellular domain (amino acids 30-481). There were nine Ser, eight Thr, and two Tyr residues that might be protein kinase phosphorylation sites. The extracellular region of the LBP protein showed an arched structure, consisting of many α-helices on the inside and β-sheets on the outside of the arc, arranged in parallel and alternately. Conserved BPI1 and BPI2 domains were found at amino acid residues 33-256 and 271-474.

  2. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    Science.gov (United States)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  3. Bioinformatics Analysis Reveals Distinct Molecular Characteristics of Hepatitis B-Related Hepatocellular Carcinomas from Very Early to Advanced Barcelona Clinic Liver Cancer Stages.

    Directory of Open Access Journals (Sweden)

    Fan-Yun Kong

    Hepatocellular carcinoma (HCC) is the fifth most common malignancy associated with high mortality. One of the risk factors for HCC is chronic hepatitis B virus (HBV) infection. The treatment strategy for the disease is dependent on the stage of HCC, and the Barcelona clinic liver cancer (BCLC) staging system is used in most HCC cases. However, the molecular characteristics of HBV-related HCC in different BCLC stages are still unknown. Using GSE14520 microarray data from HBV-related HCC cases with BCLC stages from 0 (very early stage) to C (advanced stage) in the gene expression omnibus (GEO) database, differentially expressed genes (DEGs), including common DEGs and unique DEGs in different BCLC stages, were identified. These DEGs were located on different chromosomes. The molecular functions and biology pathways of DEGs were identified by gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and the interactome networks of DEGs were constructed using the NetVenn online tool. The results revealed that both common DEGs and stage-specific DEGs were associated with various molecular functions and were involved in special biological pathways. In addition, several hub genes were found in the interactome networks of DEGs. The identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of HBV-related HCC through the different BCLC stages, and might be used as staging biomarkers or molecular targets for the treatment of HCC with HBV infection.

  4. Bioinformatics Analysis Reveals Distinct Molecular Characteristics of Hepatitis B-Related Hepatocellular Carcinomas from Very Early to Advanced Barcelona Clinic Liver Cancer Stages

    Science.gov (United States)

    Hu, Wei; Kou, Yan-Bo; You, Hong-Juan; Liu, Xiao-Mei; Zheng, Kui-Yang; Tang, Ren-Xian

    2016-01-01

    Hepatocellular carcinoma (HCC) is the fifth most common malignancy associated with high mortality. One of the risk factors for HCC is chronic hepatitis B virus (HBV) infection. The treatment strategy for the disease is dependent on the stage of HCC, and the Barcelona clinic liver cancer (BCLC) staging system is used in most HCC cases. However, the molecular characteristics of HBV-related HCC in different BCLC stages are still unknown. Using GSE14520 microarray data from HBV-related HCC cases with BCLC stages from 0 (very early stage) to C (advanced stage) in the gene expression omnibus (GEO) database, differentially expressed genes (DEGs), including common DEGs and unique DEGs in different BCLC stages, were identified. These DEGs were located on different chromosomes. The molecular functions and biology pathways of DEGs were identified by gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and the interactome networks of DEGs were constructed using the NetVenn online tool. The results revealed that both common DEGs and stage-specific DEGs were associated with various molecular functions and were involved in special biological pathways. In addition, several hub genes were found in the interactome networks of DEGs. The identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of HBV-related HCC through the different BCLC stages, and might be used as staging biomarkers or molecular targets for the treatment of HCC with HBV infection. PMID:27454179
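    Separating DEGs shared by all BCLC stages from those found in only one stage amounts to set intersection and set difference. The following sketch assumes a per-stage DEG list is already available; the gene labels (`GENE_A` and so on) are placeholders, not results from the study, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def split_common_and_unique(
    degs_by_stage: Dict[str, Iterable[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (DEGs present in every stage, DEGs present in exactly one stage, per stage)."""
    as_sets = {stage: set(genes) for stage, genes in degs_by_stage.items()}
    common = set.intersection(*as_sets.values())
    unique = {}
    for stage, genes in as_sets.items():
        # Union of the DEGs from all other stages.
        others = set().union(*(g for s, g in as_sets.items() if s != stage))
        unique[stage] = genes - others
    return common, unique


# Placeholder DEG lists keyed by BCLC stage (0, A, B, C).
degs = {
    "0": ["GENE_A", "GENE_B", "GENE_C"],
    "A": ["GENE_A", "GENE_B", "GENE_D"],
    "B": ["GENE_A", "GENE_E"],
    "C": ["GENE_A", "GENE_B", "GENE_F"],
}
common, unique = split_common_and_unique(degs)
```

    Genes shared by some but not all stages (here `GENE_B`) fall into neither group, which may or may not match how the study defined "common" DEGs.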

  5. cDNA Cloning and Bioinformatic Analysis of the sPPa1 Gene from Picea wilsonii

    Institute of Scientific and Technical Information of China (English)

    曹一博; 刘亚静; 张凌云

    2012-01-01

    The full-length cDNA sequence of the PPa1 gene was obtained by the RACE method from a normalized cDNA library of Picea wilsonii. Bioinformatic analysis was used to predict the physicochemical properties, hydrophobicity, secondary structure, tertiary structure, and transmembrane regions of PwPPa1. Multiple sequence alignment and phylogenetic trees were also constructed to predict the conserved domain and the genetic relationship with other species. RT-qPCR assays were used to measure the expression level of PPa1 in the tissues of Picea wilsonii. The results showed that PwPPa1 consists of 216 amino acids, with a molecular weight of 24.55 kDa and a theoretical pI of 5.83. PPa1 is a hydrophilic protein, and its secondary structure is mainly composed of alpha helix, random coil, and extended strand. The expression level of PPa1 was highest in pollen. This research provides a foundation for further studies on the functions of PwPPa1.

  6. Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers

    DEFF Research Database (Denmark)

    Schneider, Maria V.; Walter, Peter; Blatter, Marie-Claude

    2012-01-01

    Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response...... and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs and review...

  7. Approaches in integrative bioinformatics towards the virtual cell

    CERN Document Server

    Chen, Ming

    2014-01-01

    Approaches in Integrative Bioinformatics provides a basic introduction to biological information systems, as well as guidance for the computational analysis of systems biology. This book also covers a range of issues and methods that reveal the multitude of omics data integration types and the relevance that integrative bioinformatics has today. Topics include biological data integration and manipulation, modeling and simulation of metabolic networks, transcriptomics and phenomics, and virtual cell approaches, as well as a number of applications of network biology. It helps to illustrat

  8. Bioinformatics Analysis of UFGT Gene from Several Economic Plants

    Institute of Scientific and Technical Information of China (English)

    付海辉; 辛培尧; 许玉兰; 刘岩; 韦援教; 董娇; 曹有龙; 周军

    2011-01-01

    In this paper, the nucleic acid sequences and amino acid sequences of flavonoid-3-O-glucosyltransferase from six species, Vitis vinifera, Zea mays, Oryza sativa, Fragaria×ananassa, Arabidopsis thaliana and Malus domestica, were analyzed with bioinformatics tools. Several parameters of these sequences, which are deposited in GenBank, were predicted, including sequence composition, physicochemical properties, leader peptide, topological structure of transmembrane regions, hydrophobicity or hydrophilicity, secondary structures, functional domains and protein structures. A phylogenetic tree was reconstructed for the flavonoid-3-O-glucosyltransferase protein family. UFGT genes contain only one exon in most of these plants, but two exons in V. vinifera. The UFGT proteins are rich in Leu, Ala, Gly, Val and Ser, and most are structurally stable, except for those from V. vinifera and apple. In all of these UFGT proteins, evident hydrophobic or hydrophilic domains, a signal peptide, a transmembrane region and a coiled-coil domain were predicted, and similar proportions of α-helices, sheets and random coils were detected in their secondary structures. All of these proteins have the UDPGT, COG1819, and MGT conserved domains. Three-dimensional structures were successfully constructed by rough homology modeling for most of these proteins, but not for the one from Arabidopsis. In the evolutionary analysis, the UFGT proteins were divided into five groups: three major groups and two single-member groups, from Gossypium hirsutum L. and Camptotheca acuminata, respectively. This work provides a basis for determining the function of this protein in the color variation of different plant organs, such as flowers, fruits and leaves, and for developing biological macromolecule structural simulation and molecular drug design.

  9. Bioinformatics-driven identification and examination of candidate genes for non-alcoholic fatty liver disease.

    Directory of Open Access Journals (Sweden)

    Karina Banasik

    OBJECTIVE: Candidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations to quantitative traits of NAFLD-related phenotypes. RESEARCH DESIGN AND METHODS: By integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression, a protein-protein interaction network was constructed, and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs), which captured all common variation in these genes, were genotyped in 10,196 Danes and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS). RESULTS: 273 genes were included in the protein-protein interaction analysis, and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominally statistically significant associations (P<0.05) to quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations. CONCLUSIONS: Using a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.
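    The way a Bonferroni adjustment can negate nominally significant associations is easy to see in a small sketch. The p-values below are hypothetical and chosen only for illustration; the abstract does not report the individual p-values or the number of tests used in the study.

```python
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[Tuple[float, float, bool]]:
    """Return (raw p, Bonferroni-adjusted p, significant after adjustment) per test."""
    m = len(p_values)  # number of tests performed
    rows = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # multiply by the number of tests, cap at 1
        rows.append((p, adjusted, adjusted < alpha))
    return rows


hypothetical_p = [0.02, 0.03, 0.04, 0.20]  # made-up values for illustration
rows = bonferroni(hypothetical_p)
nominal_hits = sum(p < 0.05 for p in hypothetical_p)
adjusted_hits = sum(significant for _, _, significant in rows)
```

    With these made-up inputs, three tests are nominally significant at 0.05 and none remain significant after adjustment, which mirrors the pattern the abstract describes.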

  10. A Tool for Creating and Parallelizing Bioinformatics Pipelines

    Science.gov (United States)

    2007-06-01

    well as that are incorporated into InterPro (Mulder, et al., 2005). other users’ work. PUMA2 (Maltsev, et al., 2006) incorporates more than 20 0-7695...pipeline for protocol-based bioinformatics analysis." Genome Res., 13(8), pp. 1904-1915, 2003. Maltsev, N. and E. Glass, et al., "PUMA2--grid-based 4

  11. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    Science.gov (United States)

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  12. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

    Directory of Open Access Journals (Sweden)

    Gaora Peadar Ó

    2010-10-01

    Background: Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Results: Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p ... Conclusion: Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of
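    Fisher's exact test, as commonly used for pathway over-representation, reduces in its one-sided form to a hypergeometric tail probability. The sketch below shows that calculation with the standard library only. The abstract does not say how the authors set up their test, so the framing (genes in a pathway versus genes in a selected list) and all counts are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test p-value, P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the selected list, k: selected genes in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)  # ways to draw the selected list from the background
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 selected genes, 3 of which are in the pathway.
p = enrichment_p_value(k=3, n=4, K=5, N=20)
```

    For real analyses a library routine (and a multiple-testing correction across pathways) would normally be preferred; the explicit sum is shown only to make the calculation visible.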

  13. Identifying Ecosystem Services of Rivers and Streams Through Content Analysis

    Science.gov (United States)

    While much ecosystem services research focuses on analysis such as mapping and/or valuation, fewer research efforts are directed toward in-depth understanding of the specific ecological quantities people value. Ecosystem service monitoring and analysis efforts and communications ...

  14. Bioinformatic analysis reveals a pattern of STAT3-associated gene expression specific to basal-like breast cancers in human tumors.

    Science.gov (United States)

    Tell, Robert W; Horvath, Curt M

    2014-09-02

    Signal transducer and activator of transcription 3 (STAT3), a latent transcription factor associated with inflammatory signaling and innate and adaptive immune responses, is known to be aberrantly activated in a wide variety of cancers. In vitro analysis of STAT3 in human cancer cell lines has elucidated a number of specific targets associated with poor prognosis in breast cancer. However, to date, no comparison of cancer subtype and gene expression associated with STAT3 signaling in human patients has been reported. In silico analysis of human breast cancer microarray and reverse-phase protein array data was performed to identify expression patterns associated with STAT3 in basal-like and luminal breast cancers. Results indicate clearly identifiable STAT3-regulated signatures common to basal-like breast cancers but not to luminal A or luminal B cancers. Furthermore, these differentially expressed genes are associated with immune signaling and inflammation, a known phenotype of basal-like cancers. These findings demonstrate a distinct role for STAT3 signaling in basal breast cancers, and underscore the importance of considering subtype-specific molecular pathways that contribute to tissue-specific cancers.

  15. Establishing bioinformatics research in the Asia Pacific

    Directory of Open Access Journals (Sweden)

    Tammi Martti

    2006-12-01

    Full Text Available Abstract In 1998, the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, was set up to champion the advancement of bioinformatics in the Asia Pacific. By 2002, APBioNet was able to gain sufficient critical mass to initiate the first International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2006 Conference was organized as the 5th annual conference of the Asia-Pacific Bioinformatics Network, on Dec. 18–20, 2006 in New Delhi, India, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand) and Busan (South Korea). This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. It exemplifies a typical snapshot of the growing research excellence in bioinformatics of the region as we embark on a trajectory of establishing a solid bioinformatics research culture in the Asia Pacific that is able to contribute fully to the global bioinformatics community.

  16. Integrative Functional Genomics Analysis of Sustained Polyploidy Phenotypes in Breast Cancer Cells Identifies an Oncogenic Profile for GINS2

    Directory of Open Access Journals (Sweden)

    Juha K. Rantala

    2010-11-01

    Full Text Available Aneuploidy is among the most obvious differences between normal and cancer cells. However, mechanisms contributing to development and maintenance of aneuploid cell growth are diverse and incompletely understood. Functional genomics analyses have shown that aneuploidy in cancer cells is correlated with diffuse gene expression signatures and aneuploidy can arise by a variety of mechanisms, including cytokinesis failures, DNA endoreplication, and possibly through polyploid intermediate states. To identify molecular processes contributing to development of aneuploidy, we used a cell spot microarray technique to identify genes inducing polyploidy and/or allowing maintenance of polyploid cell growth in breast cancer cells. Of 5760 human genes screened, 177 were found to induce severe DNA content alterations on prolonged transient silencing. Association with response to DNA damage stimulus and DNA repair was found to be the most enriched cellular processes among the candidate genes. Functional validation analysis of these genes highlighted GINS2 as the highest ranking candidate inducing polyploidy, accumulation of endogenous DNA damage, and impairing cell proliferation on inhibition. The cell growth inhibition and induction of polyploidy by suppression of GINS2 was verified in a panel of breast cancer cell lines. Bioinformatic analysis of published gene expression and DNA copy number studies of clinical breast tumors suggested GINS2 to be associated with the aggressive characteristics of a subgroup of breast cancers in vivo. In addition, nuclear GINS2 protein levels distinguished actively proliferating cancer cells suggesting potential use of GINS2 staining as a biomarker of cell proliferation as well as a potential therapeutic target.

  17. Cloning and bioinformatics analysis of antifreeze protein from Tenebrio molitor

    Institute of Scientific and Technical Information of China (English)

    任谦; 熊鸿燕; 朱才众; 张世界

    2009-01-01

    AIM: To obtain the antifreeze protein (AFP) gene afpTx from local Tenebrio molitor and to elucidate the related bioinformatics data. METHODS: Total RNA was isolated from larvae of Tenebrio molitor. A cDNA fragment of the afpTx gene was synthesized by RT-PCR, cloned into the vector pMD19-T and sequenced. After restriction digestion, the fragment was subcloned into the expression vector pET32a(+) to construct the expression plasmid pET32a-afpTx, which was transformed into E. coli DL21; the plasmid was then extracted and identified by double restriction enzyme digestion. MEGA 4.0 and BioEdit 5.0.6 were used to analyze amino acid sequence homology, variation and evolution for the cloned afpTx gene. RESULTS: Sequencing showed that the afpTx cDNA is 336 bp long and encodes 112 amino acids. Digestion and electrophoresis results confirmed that the gene was successfully cloned and subcloned. Similarity analysis of AFP amino acid sequences showed that afpTx has a mean identity of 88% with 23 Tenebrio molitor AFP sequences submitted to GenBank and a mean identity of 67% with 11 AFP sequences from the fire-colored beetle (赤翅甲); the mean identity between the two beetle species was 63%. Phylogenetic tree analysis showed that the AFP sequences of Tenebrio molitor and the fire-colored beetle are homologous. Sequence divergence among the fire-colored beetle sequences was significantly greater than among the Tenebrio molitor AFP gene sequences. CONCLUSION: The afpTx gene of local Tenebrio molitor was successfully cloned; the sequence is homologous to the Tenebrio molitor and fire-colored beetle AFP sequences submitted to GenBank.

  18. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (<60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that

  19. Use of Photogrammetry and Biomechanical Gait analysis to Identify Individuals

    DEFF Research Database (Denmark)

    Larsen, Peter Kastmand; Simonsen, Erik Bruun; Lynnerup, Niels

    Photogrammetry and recognition of gait patterns are valuable tools to help identify perpetrators based on surveillance recordings. We have found that stature but only few other measures have a satisfying reproducibility for use in forensics. Several gait variables with high recognition rates were...

  20. Similarity transformation approach to identifiability analysis of nonlinear compartmental models.

    Science.gov (United States)

    Vajda, S; Godfrey, K R; Rabitz, H

    1989-04-01

    Through use of the local state isomorphism theorem instead of the algebraic equivalence theorem of linear systems theory, the similarity transformation approach is extended to nonlinear models, resulting in finitely verifiable sufficient and necessary conditions for global and local identifiability. The approach requires testing of certain controllability and observability conditions, but in many practical examples these conditions prove very easy to verify. In principle the method also involves nonlinear state variable transformations, but in all of the examples presented in the paper the transformations turn out to be linear. The method is applied to an unidentifiable nonlinear model and a locally identifiable nonlinear model, and these are the first nonlinear models other than bilinear models where the reason for lack of global identifiability is nontrivial. The method is also applied to two models with Michaelis-Menten elimination kinetics, both of considerable importance in pharmacokinetics, and for both of which the complicated nature of the algebraic equations arising from the Taylor series approach has hitherto defeated attempts to establish identifiability results for specific input functions.

  1. Identifying failure mechanisms in LDMOS transistors by analytical stability analysis

    NARCIS (Netherlands)

    Ferrara, A.; Steeneken, P.G.; Boksteen, B.K.; Heringa, A.; Scholten, A.J.; Schmitz, J.; Hueting, R.J.E.

    2014-01-01

    In this work, analytical stability equations are derived and combined with a physics-based model of an LDMOS transistor in order to identify the primary cause of failure in different operating and bias conditions. It is found that there is a gradual boundary between an electrical failure region at h

  2. Identifying news clusters using Q-analysis and modularity

    OpenAIRE

    2013-01-01

    With online publication and social media taking the main role in dissemination of news, and with the decline of traditional printed media, it has become necessary to devise ways to automatically extract meaningful information from the plethora of sources available and to make that information readily available to interested parties. In this paper we present a method of automated analysis of the underlying structure of online newspapers based on Q-analysis and modularity. We show how the combi...

  3. A multiway analysis for identifying high integrity bovine BACs

    OpenAIRE

    McEwan John C; Brauning Rudiger; McWilliam Sean; Barris Wesley; Ratnakumar Abhirami; Snelling Warren M; Dalrymple Brian P

    2009-01-01

    Abstract Background In large genomics projects involving many different types of analyses of bacterial artificial chromosomes (BACs), such as fingerprinting, end sequencing (BES) and full BAC sequencing there are many opportunities for the identities of BACs to become confused. However, by comparing the results from the different analyses, inconsistencies can be identified and a set of high integrity BACs preferred for future research can be defined. Results The location of each bovine BAC in...

  4. Identifiability analysis of the CSTR river water quality model.

    Science.gov (United States)

    Chen, J; Deng, Y

    2006-01-01

    Conceptual river water quality models are widely known to lack identifiability. This can be caused by model structure errors, observational errors and infrequent sampling. Although significant efforts have been directed towards better identification of river water quality models, it is not clear whether a given model is structurally identifiable. Information is also limited regarding the contribution of different unidentifiability sources. Taking the widely applied CSTR river water quality model as an example, this paper presents a theoretical proof that the CSTR model is indeed structurally identifiable. Its uncertainty thus arises predominantly from observational errors and infrequent sampling. Given the current monitoring accuracy and sampling frequency, the unidentifiability from sampling frequency is found to be more significant than that from observational errors. It is also noted that there is a crucial sampling frequency between 0.1 and 1 day, over which the simulated river system could be represented by different illusions and the model application could be far less reliable.

  5. 9th International Conference on Practical Applications of Computational Biology and Bioinformatics

    CERN Document Server

    Rocha, Miguel; Fdez-Riverola, Florentino; Paz, Juan

    2015-01-01

    These proceedings present recent practical applications of Computational Biology and Bioinformatics. They contain the proceedings of the 9th International Conference on Practical Applications of Computational Biology & Bioinformatics, held at the University of Salamanca, Spain, on June 3rd-5th, 2015. The International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB) is an annual international meeting dedicated to emerging and challenging applied research in Bioinformatics and Computational Biology. Biological and biomedical research are increasingly driven by experimental techniques that challenge our ability to analyse, process and extract meaningful knowledge from the underlying data. The impressive capabilities of next generation sequencing technologies, together with novel and ever evolving distinct types of omics data technologies, have posed an increasingly complex set of challenges for the growing fields of Bioinformatics and Computational Biology. The analysis o...

  6. Progress and challenges in bioinformatics approaches for enhancer identification

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2017-02-03

    Enhancers are cis-acting DNA elements that play critical roles in distal regulation of gene expression. Identifying enhancers is an important step for understanding distinct gene expression programs that may reflect normal and pathogenic cellular conditions. Experimental identification of enhancers is constrained by the set of conditions used in the experiment. This requires multiple experiments to identify enhancers, as they can be active under specific cellular conditions but not in different cell types/tissues or cellular states. This has opened prospects for computational prediction methods that can be used for high-throughput identification of putative enhancers to complement experimental approaches. Potential functions and properties of predicted enhancers have been catalogued and summarized in several enhancer-oriented databases. Because the current methods for the computational prediction of enhancers produce significantly different enhancer predictions, it will be beneficial for the research community to have an overview of the strategies and solutions developed in this field. In this review, we focus on the identification and analysis of enhancers by bioinformatics approaches. First, we describe a general framework for computational identification of enhancers, present relevant data types and discuss possible computational solutions. Next, we cover over 30 existing computational enhancer identification methods that were developed since 2000. Our review highlights advantages, limitations and potentials, while suggesting pragmatic guidelines for development of more efficient computational enhancer prediction methods. Finally, we discuss challenges and open problems of this topic, which require further consideration.

  7. Coherent pipeline for biomarker discovery using mass spectrometry and bioinformatics

    Directory of Open Access Journals (Sweden)

    Al-Shahib Ali

    2010-08-01

    Full Text Available Abstract Background Robust biomarkers are needed to improve microbial identification and diagnostics. Proteomics methods based on mass spectrometry can be used for the discovery of novel biomarkers through their high sensitivity and specificity. However, there has been a lack of a coherent pipeline connecting biomarker discovery with established approaches for evaluation and validation. We propose such a pipeline that uses in silico methods for refined biomarker discovery and confirmation. Results The pipeline has four main stages: Sample preparation, mass spectrometry analysis, database searching and biomarker validation. Using the pathogen Clostridium botulinum as a model, we show that the robustness of candidate biomarkers increases with each stage of the pipeline. This is enhanced by the concordance shown between various database search algorithms for peptide identification. Further validation was done by focusing on the peptides that are unique to C. botulinum strains and absent in phylogenetically related Clostridium species. From a list of 143 peptides, 8 candidate biomarkers were reliably identified as conserved across C. botulinum strains. To avoid discarding other unique peptides, a confidence scale has been implemented in the pipeline giving priority to unique peptides that are identified by a union of algorithms. Conclusions This study demonstrates that implementing a coherent pipeline which includes intensive bioinformatics validation steps is vital for discovery of robust biomarkers. It also emphasises the importance of proteomics based methods in biomarker discovery.

  8. Using Factor Analysis to Identify Topic Preferences Within MBA Courses

    Directory of Open Access Journals (Sweden)

    Earl Chrysler

    2003-02-01

    Full Text Available This study demonstrates the role of a principal components factor analysis in conducting a gap analysis as to the desired characteristics of business alumni. Typically, gap analyses merely compare the emphases that should be given to areas of inquiry with perceptions of actual emphases. As a result, the focus is upon depth of coverage. A neglected area in need of investigation is the breadth of topic dimensions and their differences between the normative (should offer) and the descriptive (actually offer). The implications of factor structures, as well as traditional gap analyses, are developed and discussed in the context of outcomes assessment.

  9. Bioinformatics analysis of rpoB gene in Mycobacterium tuberculosis

    Institute of Scientific and Technical Information of China (English)

    赵启明; 李萍

    2012-01-01

    In this paper, the rpoB gene and its protein sequence were analyzed using bioinformatic methods to predict physical and chemical properties, hydrophilicity/hydrophobicity, signal peptide, glycosylation and phosphorylation sites, and secondary and tertiary structure. The results showed that the RNA polymerase beta subunit is an unstable, acidic, hydrophilic protein rich in valine, glutamate and leucine, highly phosphorylated and without a signal peptide. α-helix and random coil are the primary secondary structure components of the beta subunit. A three-dimensional structure was built by homology modeling, and assessment with a Ramachandran plot indicated that the resulting model of the RNA polymerase beta subunit was reasonable. Analyzing the characteristics of the rpoB gene and its encoded protein is of great significance for studying the pathogenesis of Mycobacterium tuberculosis and the mechanism of rifampicin resistance.

  10. Cloning and Bioinformatic Analysis of Full-length actin Gene of Culex pipiens pallens

    Institute of Scientific and Technical Information of China (English)

    王晓宇; 刘虎岐

    2012-01-01

    Culex pipiens pallens is a major vector of multiple viruses and parasites, and actin is closely related to pesticide resistance. Specific amplification primers were designed from an EST fragment differentially expressed between resistant and susceptible Culex strains, and rapid amplification of cDNA ends (RACE) was used to amplify the full-length cDNA of this resistance-related gene from a resistant strain of Culex pipiens pallens, whose bioinformatic characteristics were then analyzed. The actin cDNA obtained from Culex pipiens pallens is 1708 bp long and encodes 377 amino acids. Bioinformatic analysis showed that the encoded protein is a membrane protein with one helix, one signal peptide cleavage site and twenty-seven phosphorylation sites. The full-length actin gene and its bioinformatic characterization lay the foundation for clarifying the actin-related resistance mechanism and for developing new pesticides.

  11. A Mathematical Optimization Problem in Bioinformatics

    Science.gov (United States)

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
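
    The dynamic-programming formulation mentioned in this abstract can be illustrated with a minimal global-alignment sketch in the style of Needleman-Wunsch. The article's own examples are not reproduced in this record, so the function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen purely for illustration.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment.

    Scoring values are illustrative assumptions, not taken from the article.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("ACGT", "AGT"))
```

    Filling the table takes time proportional to the product of the two sequence lengths, which is what makes the optimization tractable compared with enumerating every possible alignment.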

  12. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Science.gov (United States)

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  13. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Science.gov (United States)

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  14. Identifying Effective Psychological Treatments of Insomnia: A Meta-Analysis.

    Science.gov (United States)

    Murtagh, Douglas R. R.; Greenwood, Kenneth M.

    1995-01-01

    Clarified efficacy of psychological treatments for insomnia through a meta-analysis of 66 outcome studies representing 139 treatment groups. Psychological treatments produced considerable enhancement of both sleep patterns and the subjective experience of sleep. Participants who were clinically referred and who did not regularly use sedatives…

  15. Using Rasch Analysis to Identify Uncharacteristic Responses to Undergraduate Assessments

    Science.gov (United States)

    Edwards, Antony; Alcock, Lara

    2010-01-01

    Rasch Analysis is a statistical technique that is commonly used to analyse both test data and Likert survey data, to construct and evaluate question item banks, and to evaluate change in longitudinal studies. In this article, we introduce the dichotomous Rasch model, briefly discussing its assumptions. Then, using data collected in an…
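
    For orientation, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both in logits. The sketch below is a hedged illustration: the function names are hypothetical, and using a standardized residual to flag surprising responses is a common convention assumed here, not necessarily the procedure used in the article.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model (both arguments in logits)."""
    z = ability - difficulty
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardized_residual(response, ability, difficulty):
    """(observed - expected) / standard deviation for one 0/1 response.

    Large absolute values mark responses the model finds surprising.
    Assumes the probability is strictly between 0 and 1.
    """
    p = rasch_probability(ability, difficulty)
    return (response - p) / math.sqrt(p * (1.0 - p))


# A strong student (ability 2.0) missing an easy item (difficulty -1.0)
print(standardized_residual(0, 2.0, -1.0))
```

    A person whose ability equals the item difficulty has a probability of exactly 0.5, which is the property that places persons and items on a common scale.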

  16. Automation of Bioinformatics Workflows using CloVR, a Cloud Virtual Resource

    Science.gov (United States)

    Vangala, Mahesh

    2013-01-01

    Exponential growth of biological data, mainly due to revolutionary developments in NGS technologies in the past couple of years, has created a multitude of challenges in downstream data analysis using bioinformatics approaches. To handle such a tsunami of data, bioinformatics analysis must be carried out in an automated and parallel fashion. A successful analysis often requires more than a few computational steps, and bootstrapping these individual steps (scripts) into components and the components into pipelines certainly makes bioinformatics a reproducible and manageable segment of scientific research. CloVR (http://clovr.org) is one such flexible framework that facilitates the abstraction of bioinformatics workflows into executable pipelines. CloVR comes packaged with various built-in bioinformatics pipelines that can make use of multicore processing power when run on servers and/or cloud. CloVR is amenable to building custom pipelines based on individual laboratory requirements. CloVR is available as a single executable virtual image file that comes bundled with pre-installed and pre-configured bioinformatics tools and packages and thus circumvents cumbersome installation difficulties. CloVR is highly portable and can be run on traditional desktop/laptop computers, central servers and cloud compute farms. In conclusion, CloVR provides built-in automated analysis pipelines for microbial genomics with scope to develop and integrate custom workflows that make use of parallel processing power when run on compute clusters, thereby addressing the bioinformatics challenges of NGS data.

  17. BIOINFORMATICS FOR UNDERGRADUATES OF LIFE SCIENCE COURSES

    Directory of Open Access Journals (Sweden)

    J.F. De Mesquita

    2007-05-01

    Full Text Available In recent years, Bioinformatics has emerged as an important research tool. The ability to mine large databases for relevant information has become essential for different life science fields. On the other hand, providing education in bioinformatics to undergraduates is challenging from this multidisciplinary perspective. Therefore, it is important to introduce undergraduate students to the available information and current methodologies in Bioinformatics. Here we report the results of a course using a computer-assisted and problem-based learning model. The syllabus comprised theoretical lectures covering different topics within bioinformatics and practical activities. For the latter, we developed a set of step-by-step tutorials based on case studies. The course was offered to undergraduate students of biological and biomedical courses. At the end of the course, the students were able to build up a step-by-step tutorial covering a bioinformatics issue.

  18. Temperature-based Instanton Analysis: Identifying Vulnerability in Transmission Networks

    Energy Technology Data Exchange (ETDEWEB)

    Kersulis, Jonas [Univ. of Michigan, Ann Arbor, MI (United States); Hiskens, Ian [Univ. of Michigan, Ann Arbor, MI (United States); Chertkov, Michael [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Backhaus, Scott N. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bienstock, Daniel [Columbia Univ., New York, NY (United States)

    2015-04-08

    A time-coupled instanton method for characterizing transmission network vulnerability to wind generation fluctuation is presented. To extend prior instanton work to multiple-time-step analysis, line constraints are specified in terms of temperature rather than current. An optimization formulation is developed to express the minimum wind forecast deviation such that at least one line is driven to its thermal limit. Results are shown for an IEEE RTS-96 system with several wind-farms.

  19. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    Energy Technology Data Exchange (ETDEWEB)

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.

    2011-08-01

    into courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.

  20. Compartmental analysis of dynamic nuclear medicine data: models and identifiability

    Science.gov (United States)

    Delbary, Fabrice; Garbarino, Sara; Vivaldi, Valentina

    2016-12-01

    Compartmental models based on tracer mass balance are extensively used in clinical and pre-clinical nuclear medicine in order to obtain quantitative information on tracer metabolism in the biological tissue. This paper is the first of a series of two that deal with the problem of tracer coefficient estimation via compartmental modelling in an inverse problem framework. Specifically, here we discuss the identifiability problem for a general n-dimension compartmental system and provide uniqueness results in the case of two-compartment and three-compartment compartmental models. The second paper will utilize this framework in order to show how nonlinear regularization schemes can be applied to obtain numerical estimates of the tracer coefficients in the case of nuclear medicine data corresponding to brain, liver and kidney physiology.
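
    As a rough illustration of the kind of model discussed here, the sketch below simulates a two-compartment tracer system driven by a known input concentration. The specific equations (exchange rates k1 to k4 between blood, a free compartment and a bound compartment), the function name and the parameter values are assumptions made for this example; the paper's general n-dimensional formulation is not reproduced in this record.

```python
def simulate_two_compartment(k1, k2, k3, k4, input_fn, t_end, dt=0.1):
    """Integrate an assumed two-compartment tracer model with classical RK4.

    dC1/dt = k1*Ca(t) - (k2 + k3)*C1 + k4*C2   (free tracer)
    dC2/dt = k3*C1 - k4*C2                     (bound tracer)
    Returns (C1, C2) at time t_end, starting from zero concentrations.
    """
    def deriv(t, c1, c2):
        ca = input_fn(t)  # input (blood) concentration, assumed known
        return (k1 * ca - (k2 + k3) * c1 + k4 * c2, k3 * c1 - k4 * c2)

    c1 = c2 = 0.0
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        a1, a2 = deriv(t, c1, c2)
        b1, b2 = deriv(t + dt / 2, c1 + dt / 2 * a1, c2 + dt / 2 * a2)
        g1, g2 = deriv(t + dt / 2, c1 + dt / 2 * b1, c2 + dt / 2 * b2)
        d1, d2 = deriv(t + dt, c1 + dt * g1, c2 + dt * g2)
        c1 += dt / 6 * (a1 + 2 * b1 + 2 * g1 + d1)
        c2 += dt / 6 * (a2 + 2 * b2 + 2 * g2 + d2)
    return c1, c2


# Constant input: the system settles at C1 = k1/k2 and C2 = (k3/k4)*(k1/k2)
print(simulate_two_compartment(0.5, 0.2, 0.1, 0.05, lambda t: 1.0, 1000.0))
```

    The forward simulation is only half of the picture: the identifiability question asks whether the rate constants can be recovered uniquely from measured curves, which this sketch does not attempt.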

  1. Bioinformatic analysis on the microRNA profiling of pancreatic cancer cell line Panc-1

    Institute of Scientific and Technical Information of China (English)

    单振兴; 周小艳; 李天亮; 韩金祥; 崔亚洲

    2011-01-01

    OBJECTIVE: To perform bioinformatic analysis on the differential microRNA expression profile of pancreatic cancer cells in order to illustrate, at a global level, the role of microRNA in carcinogenesis and progression in pancreatic cancer. METHODS: A microRNA microarray containing 924 probes was used to examine pancreatic cancer Panc-1 cells, with 3T3 fibroblasts as a control, to obtain the Panc-1-specific microRNA expression profile. Gene Ontology, pathway and transcription factor binding site (TFBS) analyses were then performed on the target genes of the up-regulated and down-regulated microRNAs, and a microRNA-target gene interaction network was constructed. RESULTS: Compared with the microRNA expression profile of 3T3 fibroblasts, nine microRNAs were up-regulated in Panc-1 cells and 20 microRNAs were down-regulated. TargetScan and miRanda software predicted 1166 microRNA target genes up-regulated in Panc-1 cells and 212 target genes down-regulated. These target genes were significantly enriched in three Gene Ontology terms: DNA metabolism, cell-cell signaling and cytosol. The target genes were involved in 50 signaling pathways, of which six were enriched at P<0.05. TFBS analysis suggested that CEBP-β, NF-kB and p53, among others, may regulate both the up-regulated and the down-regulated microRNAs. Analysis of the microRNA-target gene interaction network showed that genes such as HIF-1A have high connectivity. CONCLUSION: Analyzing the microRNA expression profile of pancreatic cancer cells with bioinformatic methods can provide new ideas for further understanding the pathogenesis of pancreatic cancer.

  2. Robust enzyme design: bioinformatic tools for improved protein stability.

    Science.gov (United States)

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.

  3. The Screening of Genes Sensitive to Long-Term, Low-Level Microwave Exposure and Bioinformatic Analysis of Potential Correlations to Learning and Memory

    Institute of Scientific and Technical Information of China (English)

    ZHAO Ya Li; LI Ying Xian; MA Hong Bo; LI Dong; LI Hai Liang; JIANG Rui; KAN Guang Han; YANG Zhen Zhong; HUANG Zeng Xin

    2015-01-01

    Objective To gain a better understanding of gene expression changes in the brain following microwave exposure in mice. This study hopes to reveal mechanisms contributing to microwave-induced learning and memory dysfunction. Methods Mice were exposed to whole body 2100 MHz microwaves with specific absorption rates (SARs) of 0.45 W/kg, 1.8 W/kg, and 3.6 W/kg for 1 hour daily for 8 weeks. Differentially expressed genes in the brains were screened using high-density oligonucleotide arrays, with genes showing more significant differences further confirmed by RT-PCR. Results The gene chip results demonstrated that 41 genes (0.45 W/kg group), 29 genes (1.8 W/kg group), and 219 genes (3.6 W/kg group) were differentially expressed. GO analysis revealed that these differentially expressed genes were primarily involved in metabolic processes, cellular metabolic processes, regulation of biological processes, macromolecular metabolic processes, biosynthetic processes, cellular protein metabolic processes, transport, developmental processes, cellular component organization, etc. KEGG pathway analysis showed that these genes are mainly involved in pathways related to ribosome, Alzheimer's disease, Parkinson's disease, long-term potentiation, Huntington's disease, and Neurotrophin signaling. Construction of a protein interaction network identified several important regulatory genes including synbindin (sbdn), Crystallin (CryaB), PPP1CA, Ywhaq, Psap, Psmb1, Pcbp2, etc., which play important roles in the processes of learning and memory. Conclusion Long-term, low-level microwave exposure may inhibit learning and memory by affecting protein and energy metabolic processes and signaling pathways relating to neurological functions or diseases.

  4. CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

    Directory of Open Access Journals (Sweden)

    Borozan Ivan

    2012-08-01

    Full Text Available Abstract Background It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. Results Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. Conclusions To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro.
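
    The genome coverage metric mentioned in this abstract can be illustrated with a minimal sketch. This is not CaPSID's code or API: the function name `coverage_stats`, the half-open interval representation of aligned reads, and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def coverage_stats(reads: Iterable[Tuple[int, int]], genome_length: int) -> Tuple[float, float]:
    """Return (breadth, mean_depth) for reads given as half-open [start, end) intervals.

    breadth    = fraction of genome positions covered by at least one read
    mean_depth = average number of reads overlapping a position
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    depth = [0] * genome_length
    for start, end in reads:
        # Clip each read to the genome so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_length, sum(depth) / genome_length


# Hypothetical toy data: three reads aligned to a 10-base genome.
toy_reads = [(0, 4), (2, 6), (8, 10)]
breadth, mean_depth = coverage_stats(toy_reads, 10)
print(f"breadth={breadth:.2f} mean_depth={mean_depth:.2f}")
```

    A per-position depth array is the simplest correct approach for a sketch; for real genomes a sweep over sorted interval endpoints would use far less memory.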

  5. Bioinformatics Analysis on Cannabinoid Receptors 1 of Swine

    Institute of Scientific and Technical Information of China (English)

    魏星灿; 贾青; 陶隽; 胡慧艳

    2013-01-01

    In this study, the phylogenetic relationship of the coding sequences (CDS) of the CB1 gene between swine and 21 other species, and the physicochemical and structural properties of the CB1-encoded protein in swine, were analyzed with bioinformatics methods. The results showed that the homology of the CB1 gene was high and that the gene has been subject to purifying selection during evolution. The swine CB1 protein was a hydrophobic transmembrane protein consisting of 472 amino acid residues, without a signal peptide. Its primary structure contained 23 phosphorylation sites and 6 glycosylation sites; its secondary structure was made up of 47.67% α-helix, 39.62% random coil, and 12.71% extended strand; its tertiary structure was composed of 7 α-helices and random coil. The results indicate that CB1 may be a housekeeping gene in mammals and that the 7 connected α-helices are the active sites of CB1 in swine.
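
    Protein hydrophobicity of the kind reported here is commonly summarized as the grand average of hydropathy (GRAVY): the mean Kyte-Doolittle hydropathy value over all residues, where a positive value suggests a hydrophobic protein. The abstract does not say which tool or scale the authors used, so the sketch below is only an assumption about one standard way to compute such a value; the function name `gravy` and the test peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy of a protein sequence (GRAVY)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here means the sequence contains a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical toy peptides, not the swine CB1 sequence.
print(gravy("LLVVII") > 0)   # hydrophobic stretch
print(gravy("DDEEKK") < 0)   # charged, hydrophilic stretch
```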

  6. Tibetan Sheep LPL Gene Clone and Bioinformatic Analysis

    Institute of Scientific and Technical Information of China (English)

    高思; 徐亚欧; 毛亮; 邵欢欢; 杨虎林; 舒浩国

    2011-01-01

    [Objective] The aim was to study in depth the relationship between the genetic regulation of meat performance in Tibetan sheep and nutrient metabolism. [Method] The LPL coding gene of Tibetan sheep was cloned by reverse-transcription PCR (RT-PCR) and T-A cloning, and then analyzed with bioinformatics software. [Result] The LPL coding gene of Tibetan sheep was 1437 bp long and encoded 478 amino acids. Multiple sequence alignment of Tibetan sheep with sheep, goat, cattle, yak, pig, dog, cat, baboon, orangutan, human, Norway rat and rattus showed that the sequence identity of the LPL gene was 84.6%-99.6%, and that of the amino acid sequence was 88.8%-99.0%. Moreover, 6 nucleotide differences were found between the LPL genes of Tibetan sheep and common sheep. One of these was synonymous, so the encoded amino acid was identical in the two; the other five each changed the encoded amino acid. [Conclusion] The study can provide a reference for understanding the evolutionary relationships of the LPL gene and its mechanism of action.
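
    The quantities reported in this abstract (coding length versus protein length, sequence identity, synonymous versus amino-acid-changing differences) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' software: the function names and the two toy coding sequences are hypothetical, the sequences are assumed to be already aligned without gaps, and differences are counted per codon, which matches a per-site count only when each codon carries at most one change.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def classify_codon_differences(cds_a: str, cds_b: str) -> tuple:
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("coding sequences must be equal length, a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Length check: 1437 nt is 479 codons, i.e. 478 amino acids plus one stop codon.
print(1437 // 3 - 1)

# Hypothetical toy coding sequences (not the LPL sequences).
seq_a = "ATGCTTGAAAAATAA"  # M L E K stop
seq_b = "ATGCTCGACAAATAA"  # M L D K stop
print(round(percent_identity(seq_a, seq_b), 2))
print(classify_codon_differences(seq_a, seq_b))
```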

  7. Bioinformatics Analysis of WRKY Transcription Factors with Resistance to Disease in Plant

    Institute of Scientific and Technical Information of China (English)

    路裕; 周振华; 胡尚连; 曹颖; 卢学琴

    2014-01-01

    The phylogenetic tree, conserved motifs, and protein tertiary structures of 22 disease-resistance WRKY transcription factors registered in GenBank from Arabidopsis thaliana, Petroselinum crispum, Capsicum annuum L., Oryza sativa L., and Populus tomentosa were analyzed by bioinformatics methods. The results showed that the 22 disease-resistance WRKY transcription factors were divided into two main groups and seven sub-groups. Transcription factors in both main groups had conserved motifs 1 and 2. In the first main group of the phylogenetic tree, AtWRKY4 and AtWRKY25 of Arabidopsis thaliana, PcWRKY1 of Petroselinum crispum, and CaWRKY-a of Capsicum annuum had similar protein tertiary structures, and all had conserved motifs 9 and 12. AtWRKY33 and AtWRKY48 of Arabidopsis thaliana had similar protein tertiary structures, and both had conserved motif 18. CaWRKY1 of Capsicum annuum and PtWRKY23 of Populus tomentosa, which belonged to the fourth sub-group, had similar protein tertiary structures.

  8. Bioinformatics Analysis of SBP Gene Family in Apple

    Institute of Scientific and Technical Information of China (English)

    刘更森; 慕茜; 戴洪义; 上官凌飞; 张玉刚

    2011-01-01

    This article first analyzed the phylogeny of 42 SBP protein sequences and the genomic localization of SBP genes in apple by bioinformatics methods, then predicted and analyzed their amino acid composition, physicochemical characteristics, and secondary and tertiary structures, and also analyzed the relationship between the SBP gene families of apple and Arabidopsis thaliana. The results indicated that the 42 apple protein sequences and 16 SBP protein sequences from Arabidopsis thaliana could be divided into 7 subfamilies, showing that SBP genes are highly conserved between apple and Arabidopsis thaliana. The 42 SBP genes were distributed on 12 chromosomes. There were some differences in the number of amino acids and in the hydrophobicity of the amino acid sequences among subfamilies. Secondary structure prediction showed that the 42 amino acid sequences were composed mainly of random coil and α-helix, and the tertiary structures of all 42 amino acid sequences were similar.

  9. Sequence Analysis of Hypothetical Proteins from Helicobacter pylori 26695 to Identify Potential Virulence Factors

    Science.gov (United States)

    Naqvi, Ahmad Abu Turab; Anjum, Farah; Khan, Faez Iqbal; Islam, Asimul; Ahmad, Faizan

    2016-01-01

    Helicobacter pylori is a Gram-negative bacterium that is responsible for gastritis in humans. Its spiral flagellated body helps in locomotion and colonization in the host environment. It is capable of living in the highly acidic environment of the stomach with the help of acid adaptive genes. The genome of H. pylori 26695 strain contains 1,555 coding genes that encode 1,445 proteins. Out of these, 340 proteins are characterized as hypothetical proteins (HP). This study involves extensive analysis of the HPs using an established pipeline which comprises various bioinformatics tools and databases to find out probable functions of the HPs and identification of virulence factors. After extensive analysis of all the 340 HPs, we found that 104 HPs show characteristic similarities to proteins with known functions. Thus, on the basis of such similarities, we assigned probable functions to 104 HPs with high confidence and precision. All the predicted HPs contain representative members of diverse functional classes of proteins such as enzymes, transporters, binding proteins, regulatory proteins, proteins involved in cellular processes and other proteins with miscellaneous functions. Therefore, we classified 104 HPs into the aforementioned functional groups. During the virulence factor analysis of the HPs, we found that 11 HPs show significant virulence. The identification of virulence proteins with the help of their predicted functions may pave the way for drug target estimation and the development of effective drugs to counter the activity of those proteins. PMID:27729842

  10. Statistical modelling in biostatistics and bioinformatics selected papers

    CERN Document Server

    Peng, Defen

    2014-01-01

    This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...

  11. Bioinformatics, Expression and Functional Analysis of microRNAs in Response to Low Temperature in Hemerocallis fulva (L.) L.

    Institute of Scientific and Technical Information of China (English)

    安凤霞; 卢宝伟; 梁鸣; 唐焕伟; 李富恒

    2014-01-01

    MicroRNAs (miRNAs), endogenous small non-coding single-stranded RNAs of 16-29 nt, play a prominent role in growth, development, and responses to environmental stresses in plants. The miRNAs responding to low temperature in Hemerocallis fulva roots were identified using a deep-sequencing technique in combination with bioinformatics prediction. A total of 14 843 184 and 16 072 575 RNA sequences were obtained under normal (10 °C) and low (-25 °C) temperature conditions, representing 14 064 385 and 15 309 725 types of small RNA (sRNA), respectively. The sRNAs showed a normal distribution. Comparison against GenBank and Rfam showed that rRNA and tRNA accounted for a large proportion of the non-coding RNA. A total of 799 994 sRNAs of 67 411 types were annotated under low temperature, and 1 055 466 sRNAs of 66 524 types were annotated under normal temperature. miR393, miR397 and miR396 were up-regulated and miR319 was down-regulated at low temperature. This research provides rich data for illuminating the regulatory mechanism of protein synthesis and for screening the key regulatory genes in response to low temperature.

  12. Computational biology and bioinformatics in Nigeria.

    Science.gov (United States)

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  13. Computational biology and bioinformatics in Nigeria.

    Directory of Open Access Journals (Sweden)

    Segun A Fatumo

    2014-04-01

    Full Text Available Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  14. Meconium microbiome analysis identifies bacteria correlated with premature birth.

    Directory of Open Access Journals (Sweden)

    Alexandria N Ardissone

    Full Text Available Preterm birth is the second leading cause of death in children under the age of five years worldwide, but the etiology of many cases remains enigmatic. The dogma that the fetus resides in a sterile environment is being challenged by recent findings and the question has arisen whether microbes that colonize the fetus may be related to preterm birth. It has been posited that meconium reflects the in-utero microbial environment. In this study, correlations between fetal intestinal bacteria from meconium and gestational age were examined in order to suggest underlying mechanisms that may contribute to preterm birth. Meconium from 52 infants ranging in gestational age from 23 to 41 weeks was collected, the DNA extracted, and 16S rRNA analysis performed. Resulting taxa of microbes were correlated to clinical variables and also compared to previous studies of amniotic fluid and other human microbiome niches. Increased detection of bacterial 16S rRNA in meconium of infants of <33 weeks gestational age was observed. Approximately 61·1% of reads sequenced were classified to genera that have been reported in amniotic fluid. Gestational age had the largest influence on microbial community structure (R = 0·161; p = 0·029), while mode of delivery (C-section versus vaginal delivery) had an effect as well (R = 0·100; p = 0·044). Enterobacter, Enterococcus, Lactobacillus, Photorhabdus, and Tannerella were negatively correlated with gestational age and have been reported to incite inflammatory responses, suggesting a causative role in premature birth. This provides the first evidence to support the hypothesis that the fetal intestinal microbiome derived from swallowed amniotic fluid may be involved in the inflammatory response that leads to premature birth.

  15. BioWarehouse: a bioinformatics database warehouse toolkit

    Directory of Open Access Journals (Sweden)

    Stringer-Calvert David WJ

    2006-03-01

    Full Text Available Abstract Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the
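
    The kind of multi-database query described in this abstract (what fraction of enzyme activities have no sequence in any loaded database) might look roughly like the sketch below. Everything in it is an assumption for illustration: the table and column names are hypothetical and are not the BioWarehouse schema, the rows are placeholder data, and SQLite is used only so that the example is self-contained, whereas the abstract names MySQL and Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical warehouse tables: one row per enzyme activity,
    -- and one row per sequence loaded from any source database.
    CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
    CREATE TABLE protein_sequence (
        id INTEGER PRIMARY KEY,
        ec_number TEXT,
        source_db TEXT
    );
    """
)
# Placeholder identifiers, not real EC numbers or real database contents.
conn.executemany(
    "INSERT INTO enzyme_activity VALUES (?)",
    [("EC_A",), ("EC_B",), ("EC_C",), ("EC_D",)],
)
conn.executemany(
    "INSERT INTO protein_sequence (ec_number, source_db) VALUES (?, ?)",
    [("EC_A", "db1"), ("EC_A", "db2"), ("EC_B", "db1"), ("EC_D", "db2")],
)

# LEFT JOIN keeps every activity; a NULL on the right side means that
# no loaded database supplied a sequence for that activity.
QUERY = """
SELECT 1.0 * SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END) / COUNT(*)
FROM enzyme_activity AS e
LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
  ON s.ec_number = e.ec_number
"""
fraction_without_sequence = conn.execute(QUERY).fetchone()[0]
print(f"{fraction_without_sequence:.0%} of activities have no sequence")
```

    The point of the sketch is that, once sources share one schema, a question spanning several databases reduces to a single join.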

  16. Translational Bioinformatics and Clinical Research (Biomedical) Informatics.

    Science.gov (United States)

    Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T

    2016-03-01

    Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.

  17. When cloud computing meets bioinformatics: a review.

    Science.gov (United States)

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
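
    The MapReduce pattern mentioned in this abstract can be sketched on a single machine with k-mer counting, a common bioinformatics workload. This is an assumption-laden illustration, not code from the review: the function names and toy reads are hypothetical, and a real deployment would run the map and reduce phases on a cluster framework rather than in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    mapped = (pair for read in reads for pair in map_kmers(read, k))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


# Hypothetical toy reads.
toy_reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(toy_reads, 3)
print(sorted(counts.items()))
```

    Because each read is mapped independently and each key is reduced independently, both phases can be distributed across machines, which is what makes the pattern attractive for large sequencing datasets.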

  18. Bioinformatics Analysis of Endophilin B1 Gene and Protein

    Institute of Scientific and Technical Information of China (English)

    王铸; 何霞; 冯发深; 关琳琳; 徐霖; 张定梅; 汪杨; 罗燕芬; 曹开源

    2012-01-01

    Objective: To analyze the structure of the human Endophilin B1 gene and protein using bioinformatics methods, in order to provide a theoretical basis for further study of its function and the regulatory mechanisms in which it is involved. Methods: Endophilin B1 gene and protein sequences were retrieved from GenBank. Differences in the gene among species, and the subcellular location, secondary structure, functional domains, and epitopes of the protein, were analyzed using bioinformatics tools. Results: The gene encodes a protein of 365 amino acids with two functional domains, BAR and SH3. Its theoretical molecular weight is 40796.3 and its theoretical isoelectric point is 5.78. In the secondary structure, α-helix (H) accounted for 56.44%, β-sheet (E) for 5.48%, and random coil for 38.08%. The Endophilin B1 protein contains four potential N-linked glycosylation sites, five potential casein kinase II phosphorylation sites, seven myristoylation sites, three PKC phosphorylation sites, and two tyrosine kinase phosphorylation sites. The epitopes of the protein were further analyzed with DNAstar software. Conclusion: The structural and functional information predicted by bioinformatics can provide a basis for further research on the Endophilin B1 protein.
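
    Potential N-linked glycosylation sites of the kind counted in this abstract are often found by scanning for the sequence pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The abstract does not state which tool was used, so the sketch below is only an assumption about a pattern-based approach; the function name and the toy peptide are hypothetical, and a pattern match marks a candidate site, not a confirmed modification.

```python
import re
from typing import List

# Lookahead so that overlapping candidate sites are all reported.
N_GLYC_PATTERN = re.compile(r"N(?=[^P][ST][^P])")


def find_n_glycosylation_sites(sequence: str) -> List[int]:
    """Return 1-based positions of asparagines matching N-{P}-[ST]-{P}."""
    return [m.start() + 1 for m in N_GLYC_PATTERN.finditer(sequence.upper())]


# Hypothetical toy peptide (not the Endophilin B1 sequence):
# N2 matches (N-G-S-A), N6 fails (followed by P), N10 matches (N-K-T-A).
toy_peptide = "MNGSANPTQNKTA"
sites = find_n_glycosylation_sites(toy_peptide)
print(sites)
```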

  19. Bioinformatic analysis of insect serotonin receptor proteins

    Institute of Scientific and Technical Information of China (English)

    常菊花; 何月平

    2015-01-01

    [Objectives] Insect serotonin (5-hydroxytryptamine, 5-HT) receptors are known to comprise five subtypes. This study aimed to systematically analyze the protein structures and evolutionary relationships of insect serotonin receptor subtypes. [Methods] First, 23 serotonin receptor protein sequences with established subtypes, from seven insect species reported in the literature, were analyzed bioinformatically. Second, putative insect serotonin receptor protein sequences in the NCBI database were analyzed by multiple sequence alignment and phylogenetic tree construction. [Results] Of 47 putative insect serotonin receptor protein sequences, 40 were confirmed as insect serotonin receptors. The other 7 sequences each had 7 transmembrane regions and belong to the G protein-coupled receptor superfamily, but it is uncertain whether they are serotonin receptors. [Conclusion] The phylogenetic analysis indirectly supports the accuracy of the subtype annotations confirmed here and shows that 5-HT receptor sequences from insects of the same taxonomic order are closely related. This study provides a basis for analyzing the structure and function of insect serotonin receptors.

  20. A Bioinformatic Analysis on Caffeine Synthase in Plants

    Institute of Scientific and Technical Information of China (English)

    孔祥瑞; 杨军; 王让剑

    2014-01-01

    The amino acid sequences of caffeine synthases from Camellia sinensis, Theobroma cacao, Camellia japonica, and other plants registered in GenBank were aligned and analyzed with bioinformatic tools, and the following properties were predicted: isoelectric point, subcellular localization, signal peptide, transmembrane topological structure, conserved functional domains and motifs, and secondary and tertiary protein structure. The results showed that plant caffeine synthases were located mainly in the cytoplasm and nucleus, contained phosphorylation, acylation, and glycosylation sites, and could be divided into three types based on gene sequences and conserved domains. Type I and type II were all-α water-soluble enzyme proteins; the secondary structure of type III was rich in random coil, and type III very likely has a signal peptide. None of the three types had transmembrane helices. Tertiary structure prediction indicated that type I and type II proteins were very similar, both composed of α-helices and horizontal β-sheet layers, whereas in type III the α-helices were located at the lateral ends and were connected by vertical β-sheet layers.

  1. New Link in Bioinformatics Services Value Chain: Position, Organization and Business Model

    Directory of Open Access Journals (Sweden)

    Mladen Čudanov

    2012-11-01

    Full Text Available This paper presents developments in the bioinformatics services industry value chain, based on the cloud computing paradigm. As genome sequencing costs per megabase drop exponentially, the industry needs to adapt. The paper has two parts: a theoretical analysis and a practical example, the company Seven Bridges Genomics. We focus on explaining the organizational, business and financial aspects of the new business model in bioinformatics services, rather than the technical side of the problem. In light of that, we present a twofold business model fit for core bioinformatics research and Information and Communication Technology (ICT) support in the new environment, with a higher level of capital utilization and better resistance to business risks.

  2. Bioinformatics Analysis of Glutathione S-transferase Gene of Taenia saginata

    Institute of Scientific and Technical Information of China (English)

    王宇; 黄江; 戴佳琳; 廖兴江

    2012-01-01

    Objective: To analyze the gene structure of glutathione S-transferase (GST) of Taenia saginata and to predict the structure and function of its encoded protein. Methods: Bioinformatics analysis tools from websites such as NCBI and ExPASy, combined with other analysis software, were used. Results: The full length of the gene was 908 bp. Its coding region spanned 135-771 bp and encoded 212 amino acids. The encoded protein did not contain any subcellular localization sequence. The identity and similarity of the gene with Taenia solium GST were 93% and 96%, respectively. Three major epitopes of GST, at 33-53 aa, 62-68 aa, and 179-184 aa, were predicted to lie on the surface of the GST spatial structure, far from each other. Conclusions: The GST gene was screened from a cDNA library of adult Taenia saginata. GST is predicted to be a cytosolic protein and has good prospects for application as an immunodiagnostic antigen.

  3. Phosphoproteomics and bioinformatics analyses of spinal cord proteins in rats with morphine tolerance.

    Directory of Open Access Journals (Sweden)

    Wen-Jinn Liaw

    Full Text Available INTRODUCTION: Morphine is the most effective pain-relieving drug, but it can cause unwanted side effects. Direct neuraxial administration of morphine to the spinal cord not only can provide effective, reliable pain relief but also can prevent the development of supraspinal side effects. However, repeated neuraxial administration of morphine may still lead to morphine tolerance. METHODS: To better understand the mechanism that causes morphine tolerance, we induced tolerance in rats at the spinal cord level by giving them twice-daily injections of morphine (20 µg/10 µL) for 4 days. We confirmed tolerance by measuring paw withdrawal latencies and the maximal possible analgesic effect of morphine on day 5. We then carried out phosphoproteomic analysis to investigate the global phosphorylation of spinal proteins associated with morphine tolerance. Finally, pull-down assays were used to identify phosphorylated types and sites of 14-3-3 proteins, and bioinformatics was applied to predict biological networks affected by the morphine-regulated proteins. RESULTS: Our proteomics data showed that repeated morphine treatment altered phosphorylation of 10 proteins in the spinal cord. Pull-down assays identified 2 serine/threonine phosphorylated sites in 14-3-3 proteins. Bioinformatics further revealed that morphine affected cytoskeletal reorganization, neuroplasticity, protein folding and modulation, signal transduction and biomolecular metabolism. CONCLUSIONS: Repeated morphine administration may affect multiple biological networks by altering protein phosphorylation. These data may provide insight into the mechanism that underlies the development of morphine tolerance.

  4. SNP linkage analysis and whole exome sequencing identify a novel POU4F3 mutation in autosomal dominant late-onset nonsyndromic hearing loss (DFNA15).

    Directory of Open Access Journals (Sweden)

    Hee-Jin Kim

    Full Text Available Autosomal dominant non-syndromic hearing loss (AD-NSHL) is one of the most common genetic diseases in humans and is well known for its considerable genetic heterogeneity. In this study, we utilized whole exome sequencing (WES) and linkage analysis for direct genetic diagnosis in AD-NSHL. The Korean family had typical AD-NSHL running over 6 generations. Linkage analysis was performed using a genome-wide single nucleotide polymorphism (SNP) chip and pinpointed a genomic region on 5q31 with a significant linkage signal. Sequential filtering of variants obtained from WES, application of the linkage region, bioinformatic analyses, and Sanger sequencing validation identified a novel missense mutation, Arg326Lys (c.977G>A), in the POU homeodomain of the POU4F3 gene as the candidate disease-causing mutation in the family. POU4F3 is a known disease gene causing AD-NSHL (DFNA15), described in 5 unrelated families until now, each with a unique mutation. Arg326Lys was the first missense mutation affecting the 3rd alpha helix of the POU homeodomain, which harbors a bipartite nuclear localization signal sequence. The phenotype findings in our family further supported the previously noted intrafamilial and interfamilial variability of DFNA15. This study demonstrated that WES in combination with linkage analysis utilizing bi-allelic SNP markers successfully identified the disease locus and causative mutation in AD-NSHL.
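
    The correspondence between c.977G>A and residue 326 in the abstract above can be checked with simple codon arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the reference codon of POU4F3 is not given in the abstract, so both arginine codons that a G>A change at the second base can turn into lysine (AGA and AGG) are shown.

```python
def codon_of_cds_position(pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    if pos < 1:
        raise ValueError("CDS positions are 1-based")
    codon_number = (pos - 1) // 3 + 1
    position_in_codon = (pos - 1) % 3 + 1
    return codon_number, position_in_codon


# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"AGA": "Arg", "AGG": "Arg", "AAA": "Lys", "AAG": "Lys"}


def substitute(codon: str, position_in_codon: int, new_base: str) -> str:
    """Return the codon with one base replaced (position is 1-based)."""
    bases = list(codon)
    bases[position_in_codon - 1] = new_base
    return "".join(bases)


codon_number, offset = codon_of_cds_position(977)
print(codon_number, offset)  # 326 2

# Assumption: the actual reference codon is unknown here; try both candidates.
for ref in ("AGA", "AGG"):
    alt = substitute(ref, offset, "A")
    print(ref, CODON_TABLE[ref], "->", alt, CODON_TABLE[alt])
```

    Position 977 falls on the second base of codon 326, which is consistent with the reported Arg326Lys change.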

  5. Lost in the space of bioinformatic tools: a constantly updated survival guide for genetic epidemiology. The GenEpi Toolbox.

    Science.gov (United States)

    Coassin, Stefan; Brandstätter, Anita; Kronenberg, Florian

    2010-04-01

    Genome-wide association studies (GWASs) led to impressive advances in the elucidation of genetic factors underlying complex phenotypes and diseases. However, the ability of GWAS to identify new susceptibility loci in a hypothesis-free approach requires tools to quickly retrieve comprehensive information about a genomic region and analyze the potential effects of coding and non-coding SNPs in a candidate gene region. Furthermore, once a candidate region is chosen for resequencing and fine-mapping studies, the identification of several rare mutations is likely and requires strong bioinformatic support to properly evaluate and prioritize the mutations found for further analysis. Due to the variety of regulatory layers that can be affected by a mutation, a comprehensive in-silico evaluation of candidate SNPs can be a demanding and very time-consuming task. Although many bioinformatic tools that significantly simplify this task have been made available in recent years, their utility is often still unknown to researchers not intensively involved in bioinformatics. We present a comprehensive guide to 64 tools and databases for bioinformatically analyzing gene regions of interest to predict SNP effects. In addition, we discuss tools to perform data mining of large genetic regions, predict the presence of regulatory elements, make in-silico evaluations of SNP effects and address issues ranging from interactome analysis to graphically annotated protein sequences. Finally, we exemplify the use of these tools by applying them to hits of a recently performed GWAS. Taken together, the discussed tools are summarized and constantly updated in the web-based "GenEpi Toolbox" (http://genepi_toolbox.i-med.ac.at) and can help to get a glimpse of the potential functional relevance of both large genetic regions and single nucleotide mutations, which might help to prioritize the next steps.

  6. Some statistics in bioinformatics: the fifth Armitage Lecture.

    Science.gov (United States)

    Solomon, Patricia J

    2009-10-15

    The spirit and content of the 2007 Armitage Lecture are presented in this paper. To begin, two areas of Peter Armitage's early work are distinguished: his pioneering research on sequential methods intended for use in medical trials and the comparison of survival curves. Their influence on much later work is highlighted and motivates the proposal of several statistical 'truths' that are presented in the paper. The illustration of these truths demonstrates biology's new morphology and its dominance over statistics in this century. An overview of a recent proteomics ovarian cancer study is given as a warning of what can happen when bioinformatics meets epidemiology badly, in particular, when the study design is poor. A statistical bioinformatics success story is outlined, in which gene profiling is helping to identify novel genes and networks involved in mouse embryonic stem cell development. Some concluding thoughts are given.

  7. Identification of plasma lipid biomarkers for prostate cancer by lipidomics and bioinformatics.

    Directory of Open Access Journals (Sweden)

    Xinchun Zhou

    Full Text Available BACKGROUND: Lipids have critical functions in cellular energy storage, structure and signaling. Many individual lipid molecules have been associated with the evolution of prostate cancer; however, none of them has been approved for use as a biomarker. The aim of this study is to identify lipid molecules from hundreds of plasma apparent lipid species as biomarkers for diagnosis of prostate cancer. METHODOLOGY/PRINCIPAL FINDINGS: Using lipidomics, lipid profiling of 390 individual apparent lipid species was performed on 141 plasma samples from 105 patients with prostate cancer and 36 male controls. High-throughput data generated from lipidomics were analyzed using bioinformatic and statistical methods. From 390 apparent lipid species, 35 species were demonstrated to have potential in differentiation of prostate cancer. Within the 35 species, 12 were identified as individual plasma lipid biomarkers for diagnosis of prostate cancer with a sensitivity above 80%, specificity above 50% and accuracy above 80%. Using the top 15 of the 35 potential biomarkers together increased predictive power dramatically in diagnosis of prostate cancer, with a sensitivity of 93.6%, specificity of 90.1% and accuracy of 97.3%. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) demonstrated that patient and control populations were visually separated by the identified lipid biomarkers. RandomForest and 10-fold cross validation analyses demonstrated that the identified lipid biomarkers were able to predict unknown populations accurately, and this was not influenced by the patient's age and race. Three out of 13 lipid classes, phosphatidylethanolamine (PE), ether-linked phosphatidylethanolamine (ePE) and ether-linked phosphatidylcholine (ePC), could be considered as biomarkers in diagnosis of prostate cancer. CONCLUSIONS/SIGNIFICANCE: Using lipidomics and bioinformatic and statistical methods, we have identified a few out of hundreds of plasma apparent lipid molecular
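
    For readers unfamiliar with the diagnostic measures quoted in the abstract above, the following minimal sketch shows the standard definitions of sensitivity, specificity and accuracy from a confusion matrix. The counts are hypothetical and chosen only to match the cohort size (105 patients, 36 controls); they are not the study's results, and the function name is an assumption.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of patients correctly flagged
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 105 patients (tp + fn) and 36 controls (tn + fp).
sens, spec, acc = diagnostic_metrics(tp=98, fn=7, tn=32, fp=4)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

    With these definitions, accuracy is an average of sensitivity and specificity weighted by the numbers of patients and controls, so it always lies between the two.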

  8. Concepts and introduction to RNA bioinformatics

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.; Ruzzo, Walter L.

    2014-01-01

    RNA bioinformatics and computational RNA biology have emerged from implementing methods for predicting the secondary structure of single sequences. The field has evolved to exploit multiple sequences to take evolutionary information into account, such as compensating (and structure preserving) base...... for interactions between RNA and proteins. Here, we introduce the basic concepts of predicting RNA secondary structure relevant to the further analyses of RNA sequences. We also provide pointers to methods addressing various aspects of RNA bioinformatics and computational RNA biology....

  9. Agile parallel bioinformatics workflow management using Pwrake.

    OpenAIRE

    2011-01-01

    Abstract Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environm...

  10. Analysis of Maize Crop Leaf using Multivariate Image Analysis for Identifying Soil Deficiency

    Directory of Open Access Journals (Sweden)

    S. Sridevy

    2014-11-01

    Full Text Available Image processing analysis for identifying soil deficiency has become an active area of research. Changes in leaf color are used to analyze and identify deficiencies of soil nutrients such as nitrogen (N), phosphorus (P) and potassium (K) by digital color image analysis. This study focuses on image analysis of the maize crop leaf using multivariate image analysis. In the proposed approach, the input RGB image is first converted to HSV, because RGB is ideal for color generation whereas HSV is better suited to color perception. Green pixels are then masked and removed using a specific threshold value by applying histogram equalization. This masking is done through a customized filter that exclusively filters the green color of the leaf. After the filtering step, only the deficient part of the leaf is considered, and a histogram is generated for it. Then, Multivariate Image Analysis using Independent Component Analysis (ICA) is carried out to extract a reference eigenspace from a matrix built by unfolding color data from the deficient part. Test images are also unfolded and projected onto the reference eigenspace, and the result is a score matrix that is used to compute nutrient deficiency based on the T2 statistic. In addition, a multi-resolution scheme based on downscaling is used to speed up the process. Finally, based on the training samples, the soil deficiency is identified from the color of the maize crop leaf.
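
    The first two steps described above (RGB to HSV conversion, then masking of green pixels) can be sketched as follows. This is not the authors' filter: the hue, saturation and value thresholds are assumptions chosen for illustration, and the pixel list stands in for a real image.

```python
import colorsys


def is_green(rgb, hue_range=(0.20, 0.45), min_saturation=0.25, min_value=0.15):
    """Decide whether an (R, G, B) pixel with 0-255 channels looks green in HSV space.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= h <= hue_range[1] and s >= min_saturation and v >= min_value


def mask_green(pixels):
    """Drop green pixels, keeping candidate deficiency-colored pixels."""
    return [p for p in pixels if not is_green(p)]


# Hypothetical leaf pixels: two healthy greens, one yellowish, one brownish.
leaf = [(34, 139, 34), (210, 180, 60), (120, 200, 80), (150, 90, 40)]
print(mask_green(leaf))  # [(210, 180, 60), (150, 90, 40)]
```

    Thresholding on hue rather than on raw RGB values is what makes the HSV conversion useful here: a single hue interval covers light and dark greens alike.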

  11. Bioinformatics Analysis on the Structure and Function of Malate Dehydrogenase Gene of Taenia solium

    Institute of Scientific and Technical Information of China (English)

    蓝磊; 廖兴江; 黄江; 戴佳琳

    2012-01-01

    Objective: To analyze and predict the structure and characteristics of Taenia solium malate dehydrogenase (MDH), so as to guide experimental research on its biological function. Methods: Tools for analyzing gene and protein sequence and structure information from the National Center for Biotechnology Information (USA) and the Expert Protein Analysis System (ExPASy) of the Swiss Institute of Bioinformatics, combined with the Pcgene and Vector NTI suite bioinformatics software packages, were used to identify the MDH gene and its coding region from a full-length Taenia solium cDNA plasmid library, and to analyze and predict the physicochemical properties, post-translational modification sites, functional domains, subcellular localization, topological structure, secondary structure and three-dimensional conformation of the encoded protein. Results: The gene was full length and encoded 332 amino acids. Among GenBank sequences it showed the highest homology to Echinococcus granulosus MDH, and its theoretical molecular weight was 36459.2 Da. The encoded protein was predicted to have no transmembrane region and no disulfide bonds, and to be relatively stable. Its closest evolutionary relationship was with trematode MDH. Conclusion: Using bioinformatics methods, the full-length Taenia solium cDNA sequence was screened from the adult Taenia solium cDNA library, and information on its structure and function was predicted.

  12. Profile of microRNA expression in tumor-associated macrophages and bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    雷宇; 刘彦信; 葛晔华; 史娟; 郑德先

    2012-01-01

    Objective: To investigate the profile of microRNA expression in tumor-associated macrophages (TAM). Methods: A xenograft mouse model was established with the mouse breast cancer cell line 4T1. TAM were isolated from the tumor tissue. The microRNA expression profile was detected using a microRNA chip assay. The chip results were validated by real-time PCR and analyzed by bioinformatics. Mouse peritoneal macrophages (PEC) were used as the negative control. Results: Compared with the negative control, the expression of 59 microRNAs changed significantly in TAM. Among these, 23 microRNAs were up-regulated and 36 were down-regulated. Real-time PCR verified the expression of miR-146a, miR-222, miR-31 and miR-877, and the results were in line with the chip experiment. These microRNAs participate in the regulation of various signaling pathways. Conclusions: The microRNA expression profile and bioinformatics analysis suggest that microRNAs play an important role in the regulation of TAM differentiation.

  13. Bioinformatics: Cheap and robust method to explore biomaterial from Indonesia biodiversity

    Science.gov (United States)

    Widodo

    2015-02-01

    Indonesia has enormous biodiversity, which may contain many biomaterials for pharmaceutical application. The potential of these resources should be explored to discover new drugs for human welfare. However, bioactivity screening using conventional methods is very expensive and time-consuming. Therefore, we developed a bioinformatics-based methodology for screening the potential of natural resources. The method rests on the fact that organisms in the same taxon will have similar genes, metabolism and secondary metabolites. We use bioinformatics to explore the potential of biomaterials from Indonesian biodiversity by comparing species with well-known taxa containing the active compound, using published papers or chemical databases. We then analyze the drug-likeness, bioactivity and target proteins of the active compound based on its molecular structure. The interactions of the target protein with other proteins in the cell are examined to determine the mechanism of action of the active compound at the cellular level, as well as to predict its side effects and toxicity. Using this method, we succeeded in screening anti-cancer, immunomodulatory and anti-inflammatory agents from Indonesian biodiversity. For example, we found an anti-cancer agent from a marine invertebrate. The anti-cancer activity was explored based on compounds isolated from the marine invertebrate, as reported in published articles and databases; we then identified the protein target and performed molecular pathway analysis. The data suggested that the active compound of the invertebrate is able to kill cancer cells. Further, we collected and extracted the active compound from the invertebrate, and then examined its activity on cancer cells (MCF7). The MTT result showed that the methanol extract of the marine invertebrate was highly potent in killing MCF7 cells. Therefore, we concluded that bioinformatics is a cheap and robust way to explore bioactive compounds from Indonesian biodiversity as a source of drugs and another

  14. Cloning, Expression and Bioinformatics Analysis of Porcine CatSperB and CatSperG Genes

    Institute of Scientific and Technical Information of China (English)

    宋成义; 周家庆; 冯晓军; 谢雨琇; 李庆平; 吴晗; 高波; 王霄燕

    2012-01-01

    fluorescence quantitative RT-PCR. [Result] In silico transcripts of 3 508 bp for CatSperB and 3 715 bp for CatSperG were identified; they contain ORFs of 3 330 bp and 3 483 bp, respectively, and the sequences were confirmed by TA cloning. The coding sequences (CDS) of porcine CatSperB and CatSperG shared more than 80% sequence similarity with those of human, cattle, horse, dog and other animals. CatSperB is a stable 125.79 kD protein, while CatSperG is an unstable 133.40 kD protein. Both CatSperB and CatSperG contain seven conserved transmembrane domains; a coiled-coil motif was also identified in the C terminus of CatSperG, but not in CatSperB. Porcine CatSperB and CatSperG displayed a higher degree of homology with the orthologs of cattle, dog and horse, and lower homology with those of human and mouse. RT-PCR analysis showed that CatSperB and CatSperG were detected mainly in testis, but CatSperB was also expressed in other tissues. The mRNA expression of CatSperB and CatSperG increased significantly around the stages of spermatogenesis (Day 60), puberty (Day 90) and sexual maturity (Day 150) (P<0.05). [Conclusion] The cDNA clones of CatSperB and CatSperG and a series of bioinformatics parameters of these proteins were obtained. Seven conserved transmembrane domains in CatSperB and CatSperG and the homology among species were revealed. Furthermore, this study showed that CatSperB and CatSperG are expressed mainly in testis, and that the change in mRNA expression paralleled the sexual development of the boar.

  15. 2nd Colombian Congress on Computational Biology and Bioinformatics

    CERN Document Server

    Cristancho, Marco; Isaza, Gustavo; Pinzón, Andrés; Rodríguez, Juan

    2014-01-01

    This volume compiles accepted contributions for the 2nd Edition of the Colombian Computational Biology and Bioinformatics Congress CCBCOL, after a rigorous review process in which 54 papers were accepted for publication from 119 submitted contributions. Bioinformatics and Computational Biology are areas of knowledge that have emerged due to advances that have taken place in the Biological Sciences and its integration with Information Sciences. The expansion of projects involving the study of genomes has led the way in the production of vast amounts of sequence data which needs to be organized, analyzed and stored to understand phenomena associated with living organisms related to their evolution, behavior in different ecosystems, and the development of applications that can be derived from this analysis.

  16. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  17. Bioinformatics Analysis for Isocitrate Dehydrogenase 1 Mutation in Acute Myeloid Leukemia

    Institute of Scientific and Technical Information of China (English)

    周匡果; 黄亮; 周剑峰

    2012-01-01

    Objective: To understand the molecular pathogenesis of isocitrate dehydrogenase 1 (IDH1) mutation in the subgroup of cytogenetically normal acute myeloid leukemia (CN-AML), and to provide novel means for the clinical diagnosis and treatment of AML. Methods: Gene expression profiles of AML with IDH1 mutation were obtained from the GEO database, and a set of bioinformatics tools, such as BRB-Array Tools, GSEA, MILANO, GAD and GATHER, were used for data mining. Results: After combining the results of two independent sample sets, 12 differentially expressed genes were identified, which were mainly involved in biological processes such as cell adhesion and immune defense response. IDH1-related genes played essential roles in important signaling pathways such as apoptosis, Toll-like and Jak-Stat. GSEA analysis suggested that IDH1-related genes might mainly affect lipid metabolism, myeloid cell differentiation and hypoxia regulation. Conclusion: Bioinformatics analysis can effectively process gene chip data. The pathogenesis of IDH1 mutation involves abnormal expression of multiple genes, and these data may benefit further investigations of the pathogenesis of CN-AML with IDH1 mutation.

  18. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    Science.gov (United States)

    2016-01-01

    Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons. PMID:27698666
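
    The idea of looking for clusters of binding sites within ~200-nucleotide intervals of a 1500-nucleotide upstream region can be sketched with a simple window scan. The abstract does not state the clustering criterion, so the minimum site count, the greedy non-overlapping scan, and the site positions below are all assumptions for illustration.

```python
from bisect import bisect_left


def find_site_clusters(positions, window=200, min_sites=3, region_length=1500):
    """Greedily report non-overlapping windows holding at least min_sites sites.

    positions: 0-based offsets of predicted binding sites in the upstream region.
    Returns a list of (first_site, last_site, site_count) tuples.
    """
    sites = sorted(p for p in positions if 0 <= p < region_length)
    clusters = []
    i = 0
    while i < len(sites):
        start = sites[i]
        j = bisect_left(sites, start + window)  # index of first site outside the window
        if j - i >= min_sites:
            clusters.append((start, sites[j - 1], j - i))
            i = j  # continue after this cluster so windows do not overlap
        else:
            i += 1
    return clusters


# Hypothetical predicted site positions within a 1500-nt upstream region.
hypothetical_sites = [40, 95, 180, 610, 900, 960, 1010, 1075, 1400]
print(find_site_clusters(hypothetical_sites))  # [(40, 180, 3), (900, 1075, 4)]
```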

  19. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Full Text Available Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.

  20. Comparative Genomic Analysis of Asian Haemorrhagic Septicaemia-Associated Strains of Pasteurella multocida Identifies More than 90 Haemorrhagic Septicaemia-Specific Genes.

    Directory of Open Access Journals (Sweden)

    Ahmed M Moustafa

    Full Text Available Pasteurella multocida is the primary causative agent of a range of economically important diseases in animals, including haemorrhagic septicaemia (HS), a rapidly fatal disease of ungulates. There is limited information available on the diversity of P. multocida strains that cause HS. Therefore, we determined draft genome sequences of ten disease-causing isolates and two vaccine strains and compared these genomes using a range of bioinformatic analyses. The draft genomes of the 12 HS strains were between 2,298,035 and 2,410,300 bp in length. Comparison of these genomes with the North American HS strain, M1404, and other available P. multocida genomes (Pm70, 3480, 36950 and HN06) identified a core set of 1,824 genes. A set of 96 genes was present in all HS isolates and vaccine strains examined in this study, but absent from Pm70, 3480, 36950 and HN06. Moreover, 59 genes were shared only by the Asian B:2 strains. In two Pakistani isolates, genes with high similarity to genes in the integrative and conjugative element, ICEPmu1 from strain 36950 were identified along with a range of other antimicrobial resistance genes. Phylogenetic analysis indicated that the HS strains formed clades based on their country of isolation. Future analysis of the 96 genes unique to the HS isolates will aid the identification of HS-specific virulence attributes and facilitate the development of disease-specific diagnostic tests.
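
    The core and HS-specific gene sets described above are, in essence, set intersections and differences over per-genome gene lists. The sketch below illustrates that logic only; the genome labels and gene identifiers are made up, and real analyses first need ortholog clustering to decide which genes count as "the same" across genomes.

```python
def core_and_group_specific(group_genomes, other_genomes):
    """Return (core genes, genes in every group genome but in no other genome).

    Both arguments map a genome label to the set of gene identifiers it carries.
    """
    all_gene_sets = list(group_genomes.values()) + list(other_genomes.values())
    core = set.intersection(*all_gene_sets)
    shared_by_group = set.intersection(*group_genomes.values())
    present_elsewhere = set.union(*other_genomes.values())
    return core, shared_by_group - present_elsewhere


# Hypothetical gene content, for illustration only.
hs_strains = {
    "HS_isolate_A": {"g1", "g2", "g3", "g4"},
    "HS_isolate_B": {"g1", "g2", "g3", "g5"},
}
non_hs_strains = {
    "reference_X": {"g1", "g2", "g6"},
    "reference_Y": {"g1", "g2", "g7"},
}
core, hs_specific = core_and_group_specific(hs_strains, non_hs_strains)
print(sorted(core), sorted(hs_specific))  # ['g1', 'g2'] ['g3']
```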

  1. Cloning and Bioinformatics Analysis on CDS of CYGB Gene in Yak

    Institute of Scientific and Technical Information of China (English)

    孙雪婧; 杜晓华; 杨孝朴; 罗玉柱; 刘霞

    2014-01-01

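    ...the role of factor regulation. The yak CYGB amino acid sequence shared 100%, 98.9%, 97.8%, 95.3%, 93.7%, 78.8%, 98.4%, 95.8% and 96.8% homology with the CYGB amino acid sequences of cattle, sheep, dog, mouse, brown rat, red junglefowl, monkey, chimpanzee and human, respectively. Homology among species was high, and the phylogeny was consistent with the relatedness of the species, indicating that the CYGB coding region is relatively conserved during evolution. The full-length 573 bp CDS of the yak CYGB gene was obtained by RT-PCR, TA cloning and nucleic acid sequencing, and its nucleotide sequence, the amino acid sequence of the encoded protein, and the protein's structure and function were analyzed. Yak CYGB is a soluble acidic protein of 190 amino acid residues that plays an important role in energy metabolism and cofactor biosynthesis. The CYGB coding region is strongly conserved over long-term biological evolution. The successful cloning and analysis of this gene provide a theoretical basis for revealing the genetic characteristics of the yak CYGB gene. [Objective] In order to enrich the basic data on the yak CYGB gene, the CDS region of the yak CYGB gene was cloned and analyzed by bioinformatics methods. [Method] Total RNA of yak hippocampus tissue was extracted and reverse transcribed into cDNA by RT-PCR. Specific primers were designed according to the cDNA sequence of the cattle CYGB gene in GenBank (GenBank accession No.: DV874786.1) with the online software Primer 3.0. The CDS region and parts of the 5′UTR and 3′UTR of the yak CYGB gene were cloned from yak hippocampus total RNA by PCR amplification, TA cloning and nucleic acid sequencing. The primary structure, secondary structure, tertiary structure, physicochemical properties and homology were analyzed, and a phylogenetic tree of CYGB was constructed, using online software such as ProtParam, PredictProtein and SWISS-MODEL and the Lasergene7.1 software package. The three-dimensional structure was modified and output with PyMol. The protein subcellular localization was predicted with the online subcellular localization tool PSORT II Prediction, and the protein function was predicted by Protfun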
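
    The two figures reported for yak CYGB, a 573 bp CDS and a 190-residue protein, can be reconciled with a one-line calculation. The sketch below assumes that the reported CDS length includes the stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def residues_encoded(cds_length_bp: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acid residues encoded by a CDS of the given length."""
    if cds_length_bp <= 0 or cds_length_bp % 3 != 0:
        raise ValueError("a complete CDS length must be a positive multiple of 3")
    codons = cds_length_bp // 3
    # The stop codon terminates translation and does not add a residue.
    return codons - 1 if includes_stop_codon else codons


print(residues_encoded(573))  # 190
```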

  2. Website for avian flu information and bioinformatics

    Institute of Scientific and Technical Information of China (English)

    GAO George Fu

    2009-01-01

    Highly pathogenic influenza A virus H5N1 has spread worldwide and raised public concern. This has increased the output of influenza virus sequence data as well as research publications and other reports. In order to fight H5N1 avian flu in a comprehensive way, we designed and began setting up the Website for Avian Flu Information (http://www.avian-flu.info) in 2004. Beyond the available influenza virus database, the website aims to integrate diverse information for both researchers and the public. From 2004 to 2009, we collected information on all aspects, i.e. reports of outbreaks, scientific publications and editorials, prevention policies, medicines and vaccines, and clinical practice and diagnosis. Except for publications, all information is in Chinese. As of April 15, 2009, there were more than 2000 cumulative news entries and nearly 5000 research papers. Using curated data from the Influenza Virus Resource, we have set up an influenza virus sequence database and a bioinformatic platform providing basic functions for influenza virus sequence analysis. We will focus on the collection of experimental data and results as well as the integration of data from the geological information system and avian influenza epidemiology.

  3. Website for avian flu information and bioinformatics

    Institute of Scientific and Technical Information of China (English)

    LIU Di; LIU Quan-He; WU Lin-Huan; LIU Bin; WU Jun; LAO Yi-Mei; LI Xiao-Jing; GAO George Fu; MA Jun-Cai

    2009-01-01

    Highly pathogenic influenza A virus H5N1 has spread worldwide and raised public concern. This has increased the output of influenza virus sequence data as well as research publications and other reports. In order to fight H5N1 avian flu in a comprehensive way, we designed and began setting up the Website for Avian Flu Information (http://www.avian-flu.info) in 2004. Beyond the available influenza virus database, the website aims to integrate diverse information for both researchers and the public. From 2004 to 2009, we collected information on all aspects, i.e. reports of outbreaks, scientific publications and editorials, prevention policies, medicines and vaccines, and clinical practice and diagnosis. Except for publications, all information is in Chinese. As of April 15, 2009, there were more than 2000 cumulative news entries and nearly 5000 research papers. Using curated data from the Influenza Virus Resource, we have set up an influenza virus sequence database and a bioinformatic platform providing basic functions for influenza virus sequence analysis. We will focus on the collection of experimental data and results as well as the integration of data from the geological information system and avian influenza epidemiology.

  4. Fundamentals of bioinformatics and computational biology methods and exercises in matlab

    CERN Document Server

    Singh, Gautam B

    2015-01-01

    This book offers comprehensive coverage of all the core topics of bioinformatics, and includes practical examples completed using the MATLAB bioinformatics toolbox™. It is primarily intended as a textbook for engineering and computer science students attending advanced undergraduate and graduate courses in bioinformatics and computational biology. The book develops bioinformatics concepts from the ground up, starting with an introductory chapter on molecular biology and genetics. This chapter will enable physical science students to fully understand and appreciate the ultimate goals of applying the principles of information technology to challenges in biological data management, sequence analysis, and systems biology. The first part of the book also includes a survey of existing biological databases, tools that have become essential in today’s biotechnology research. The second part of the book covers methodologies for retrieving biological information, including fundamental algorithms for sequence compar...

  5. [Post-translational modification (PTM) bioinformatics in China: progresses and perspectives].

    Science.gov (United States)

    Zexian, Liu; Yudong, Cai; Xuejiang, Guo; Ao, Li; Tingting, Li; Jianding, Qiu; Jian, Ren; Shaoping, Shi; Jiangning, Song; Minghui, Wang; Lu, Xie; Yu, Xue; Ziding, Zhang; Xingming, Zhao

    2015-07-01

    Post-translational modifications (PTMs) are essential for regulating conformational changes, activities and functions of proteins, and are involved in almost all cellular pathways and processes. Identification of protein PTMs is the basis for understanding cellular and molecular mechanisms. In contrast with labor-intensive and time-consuming experiments, PTM prediction using various bioinformatics approaches can provide accurate, convenient, and efficient strategies and generate valuable information for further experimental consideration. In this review, we summarize the current progress made by Chinese bioinformaticians in the field of PTM bioinformatics, including the design and improvement of computational algorithms for predicting PTM substrates and sites, design and maintenance of online and offline tools, establishment of PTM-related databases and resources, and bioinformatics analysis of PTM proteomics data. By comparing similar studies in China and other countries, we demonstrate both advantages and limitations of current PTM bioinformatics as well as perspectives for future studies in China.

  6. Applying bioinformatics to proteomics: is machine learning the answer to biomarker discovery for PD and MSA?

    Science.gov (United States)

    Mattison, Hayley A; Stewart, Tessandra; Zhang, Jing

    2012-11-01

    Bioinformatics tools are increasingly being applied to proteomic data to facilitate the identification of biomarkers and classification of patients. In the June 2012 issue, Ishigami et al. used principal component analysis (PCA) to extract features and a support vector machine (SVM) to differentiate and classify cerebrospinal fluid (CSF) samples from two small cohorts of patients diagnosed with either Parkinson's disease (PD) or multiple system atrophy (MSA) based on differences in the patterns of peaks generated with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). PCA accurately segregated patients with PD and MSA from controls when the cohorts were combined, but did not perform well when segregating PD from MSA. On the other hand, SVM, a machine learning classification model, correctly classified the samples from patients with early PD or MSA, and the peak at m/z 6250 was identified as a strong contributor to the ability of SVM to distinguish the proteomic profiles of either cohort when trained on one cohort. This study, while preliminary, provides promising results for the application of bioinformatics tools to proteomic data, an approach that may eventually facilitate the ability of clinicians to differentiate and diagnose closely related parkinsonian disorders.

  7. Cloning and Bioinformatics Analysis of Lfcin Gene of Datong Yak

    Institute of Scientific and Technical Information of China (English)

    裴杰; 阎萍; 姬国红; 冯瑞林; 梁春年; 郭宪; 曾玉峰; 包鹏甲; 褚敏

    2009-01-01

    [Objective] This study aimed to clone the Lfcin gene from Datong yak, so as to provide a reference for applying this gene in the feed and breeding industries. [Method] Using PCR, the lactoferricin (Lfcin)-encoding gene was obtained from the genome of Datong yak; it was then cloned into the pGEM-T Easy vector and sequenced; the sequencing results were subsequently aligned with the dairy cow sequences available in GenBank. Moreover, amino acid sequences of Lfcin from various species including yak, dairy cow, human and mouse were used for sequence alignment and phylogenetic analysis. [Result] The second exon of lactoferrin (LF) from Datong yak, which is 778 bp in length, was obtained, within which the coding region of the Lfcin gene is 75 bp (25 amino acid residues); sequence analysis showed a difference of eleven bases between Datong yak and dairy cow; Lfcin proteins from various species shared high homology, and those from Datong yak and dairy cow were completely identical; phylogenetic analysis showed that the cladogram based on Lfcin was consistent with the evolutionary relationships of the species. [Conclusion] This study laid a foundation for the prokaryotic or eukaryotic expression of the Lfcin gene and for further understanding the activity of the Lfcin protein.

  8. p3d – Python module for structural bioinformatics

    Directory of Open Access Journals (Sweden)

    Fufezan Christian

    2009-08-01

    Full Text Available Abstract Background High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. Results p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of (a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, (b) set theory and (c) functions that allow combining (a) and (b) and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. Conclusion p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.
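
    The combination of a space-partitioning tree with set operations described in this abstract can be illustrated with a minimal sketch. This is not p3d's API: the names `build_tree` and `within`, the dictionary layout, and the atom labels and coordinates are all hypothetical, and a k-d tree (one simple kind of binary space partitioning) stands in for whatever tree p3d implements.

```python
from math import dist

def build_tree(points, depth=0):
    """Recursively split (name, (x, y, z)) points on x, y, z in turn (a k-d tree)."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_tree(points[:mid], depth + 1),
        "right": build_tree(points[mid + 1:], depth + 1),
    }

def within(node, centre, radius, found=None):
    """Return the set of point names lying within `radius` of `centre`."""
    if found is None:
        found = set()
    if node is None:
        return found
    name, xyz = node["point"]
    if dist(xyz, centre) <= radius:
        found.add(name)
    delta = centre[node["axis"]] - xyz[node["axis"]]
    # Always descend into the half-space containing the centre; visit the
    # other half only if the query sphere crosses the splitting plane.
    near, far = ("left", "right") if delta < 0 else ("right", "left")
    within(node[near], centre, radius, found)
    if abs(delta) <= radius:
        within(node[far], centre, radius, found)
    return found

# Hypothetical atoms, not taken from any real PDB file.
atoms = [
    ("CA_1", (0.0, 0.0, 0.0)),
    ("CB_1", (1.5, 0.0, 0.0)),
    ("O_2", (0.0, 3.0, 0.0)),
    ("N_3", (8.0, 8.0, 8.0)),
]
tree = build_tree(atoms)
near_origin = within(tree, (0.0, 0.0, 0.0), 2.0)
near_oxygen = within(tree, (0.0, 3.0, 0.0), 3.5)
# Set difference: atoms close to O_2 but not close to the origin.
print(sorted(near_oxygen - near_origin))
```

    Returning plain Python sets from the spatial query is what makes the second ingredient cheap: union, intersection and difference then combine queries without any further geometry.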

  9. Identifying coordinative structure using principal component analysis based on coherence derived from linear systems analysis.

    Science.gov (United States)

    Wang, Xinguang; O'Dwyer, Nicholas; Halaki, Mark; Smith, Richard

    2013-01-01

    Principal component analysis is a powerful and popular technique for capturing redundancy in muscle activity and kinematic patterns. A primary limitation of the correlations or covariances between signals on which this analysis is based is that they do not account for dynamic relations between signals, yet such relations-such as that between neural drive and muscle tension-are widespread in the sensorimotor system. Low correlations may thus be obtained and signals may appear independent despite a dynamic linear relation between them. To address this limitation, linear systems analysis can be used to calculate the matrix of overall coherences between signals, which measures the strength of the relation between signals taking dynamic relations into account. Using ankle, knee, and hip sagittal-plane angles from 6 healthy subjects during ~50% of total variance in the data set, while with overall coherence matrices the first component accounted for > 95% of total variance. The results demonstrate that the dimensionality of the coordinative structure can be overestimated using conventional correlation, whereas a more parsimonious structure is identified with overall coherence.
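
    How the share of variance captured by the first component depends on the strength of the pairwise relations can be sketched as follows. The matrices below are illustrative equicorrelated 3 x 3 examples chosen by hand, not the study's data, and `first_component_share` is a hypothetical helper that uses plain power iteration rather than any particular PCA library.

```python
def first_component_share(matrix, iterations=200):
    """Fraction of total variance explained by the first principal component
    of a symmetric correlation-like matrix, via power iteration."""
    n = len(matrix)
    # Slightly uneven start vector so it is unlikely to be orthogonal
    # to the dominant eigenvector.
    vec = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the dominant eigenvalue for a unit vector.
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vec[i] * mv[i] for i in range(n))
    trace = sum(matrix[i][i] for i in range(n))
    return eigenvalue / trace

def equicorrelated(r, n=3):
    """n x n matrix with ones on the diagonal and r everywhere else."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]

weak = first_component_share(equicorrelated(0.25))     # weak pairwise relations
strong = first_component_share(equicorrelated(0.925))  # strong pairwise relations
print(round(weak, 3), round(strong, 3))
```

    For an equicorrelated 3 x 3 matrix the largest eigenvalue is 1 + 2r, so the share is (1 + 2r)/3: weak pairwise relations spread variance across components, while strong ones concentrate it in the first.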

  10. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

    Directory of Open Access Journals (Sweden)

    Yeh Cheng-Yu

    2009-12-01

    Full Text Available Abstract Background Prostate cancer is a leading cancer worldwide and is characterized by its aggressive metastasis. Reflecting its clinical heterogeneity, prostate cancer displays different stages and grades related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease processes, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data with transcription factor analysis and conditional independence testing, to explore the potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the expression values. We applied web services technology to wrap the bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predicted the transcription factors that regulate gene expression. We adopted microarray datasets consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results revealed possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and promote cell death in the cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with high grade cancer compared with low grade cancer. Enhancer of Zeste Homolog 2 (EZH2) regulated by RUNX1 and STAT3 is correlated to the pathological stage
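
    The K-nearest-neighbors treatment of missing microarray values mentioned in this abstract can be sketched as below. The abstract does not give the authors' distance measure or value of K, so this is an assumption-laden illustration: `knn_impute` is a hypothetical helper, missing values are encoded as `None`, distance is root-mean-square difference over the columns two genes share, and the expression values are invented.

```python
from math import sqrt

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.
    Rows are genes, columns are samples."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # RMS distance so rows with different overlap stay comparable.
                d = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[col]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][col] = sum(nearest) / len(nearest)
    return filled

# Invented expression matrix: gene 0 is missing its third sample.
expression = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
imputed = knn_impute(expression, k=2)
print(imputed[0])
```

    Here the two genes most similar to gene 0 supply the estimate, and the distant fourth gene is ignored, which is the point of using neighbors rather than a column-wide mean.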

  11. The GMOD Drupal Bioinformatic Server Framework

    Science.gov (United States)

    Papanicolaou, Alexie; Heckel, David G.

    2010-01-01

    Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988

  12. Comparative bioinformatics and experimental analysis of the intergenic regulatory regions of Bacillus cereus hbl and nhe enterotoxin operons and the impact of CodY on virulence heterogeneity

    Directory of Open Access Journals (Sweden)

    Maria-Elisabeth Böhm

    2016-05-01

    Full Text Available Bacillus cereus is a food contaminant with greatly varying enteropathogenic potential. Almost all known strains harbor the genes for at least one of the three enterotoxins Nhe, Hbl and CytK. While some strains show no cytotoxicity, others have caused outbreaks, in rare cases even with lethal outcome. The reason for these differences in cytotoxicity is unknown. To gain insight into the origin of enterotoxin expression heterogeneity in different strains, the architecture and role of 5’ intergenic regions (5’IGRs) upstream of the nhe and hbl operons was investigated. In silico comparison of 142 strains of all seven phylogenetic groups of B. cereus sensu lato proved the presence of long 5’IGRs upstream of the nheABC and hblCDAB operons, which harbor recognition sites for several transcriptional regulators, including the virulence regulator PlcR, redox regulators ResD and Fnr, the nutrient-sensitive regulator CodY as well as the master regulator for biofilm formation SinR. By determining transcription start sites, unusually long 5’ untranslated regions (5’UTRs) upstream of the nhe and hbl start codons were identified, which are not present upstream of cytK-1 and cytK-2. Promoter fusions lacking various parts of the nhe and hbl 5’UTR in B. cereus INRA C3 showed that the entire 331 bp 5’UTR of nhe is necessary for full promoter activity, while the presence of the complete 606 bp hbl 5’UTR lowers promoter activity. Repression was caused by a 268 bp sequence directly upstream of the hbl transcription start. Luciferase activity of reporter strains containing nhe and hbl 5’IGR lux fusions provided evidence that toxin gene transcription is upregulated by the depletion of free amino acids. Electrophoretic mobility shift assays showed that the branched-chain amino acid sensing regulator CodY binds to both nhe and hbl 5’UTR downstream of the promoter, potentially acting as a nutrient-responsive roadblock repressor of toxin gene transcription

  13. Comparative Bioinformatics and Experimental Analysis of the Intergenic Regulatory Regions of Bacillus cereus hbl and nhe Enterotoxin Operons and the Impact of CodY on Virulence Heterogeneity.

    Science.gov (United States)

    Böhm, Maria-Elisabeth; Krey, Viktoria M; Jeßberger, Nadja; Frenzel, Elrike; Scherer, Siegfried

    2016-01-01

    Bacillus cereus is a food contaminant with greatly varying enteropathogenic potential. Almost all known strains harbor the genes for at least one of the three enterotoxins Nhe, Hbl, and CytK. While some strains show no cytotoxicity, others have caused outbreaks, in rare cases even with lethal outcome. The reason for these differences in cytotoxicity is unknown. To gain insight into the origin of enterotoxin expression heterogeneity in different strains, the architecture and role of 5' intergenic regions (5' IGRs) upstream of the nhe and hbl operons was investigated. In silico comparison of 142 strains of all seven phylogenetic groups of B. cereus sensu lato proved the presence of long 5' IGRs upstream of the nheABC and hblCDAB operons, which harbor recognition sites for several transcriptional regulators, including the virulence regulator PlcR, redox regulators ResD and Fnr, the nutrient-sensitive regulator CodY as well as the master regulator for biofilm formation SinR. By determining transcription start sites, unusually long 5' untranslated regions (5' UTRs) upstream of the nhe and hbl start codons were identified, which are not present upstream of cytK-1 and cytK-2. Promoter fusions lacking various parts of the nhe and hbl 5' UTR in B. cereus INRA C3 showed that the entire 331 bp 5' UTR of nhe is necessary for full promoter activity, while the presence of the complete 606 bp hbl 5' UTR lowers promoter activity. Repression was caused by a 268 bp sequence directly upstream of the hbl transcription start. Luciferase activity of reporter strains containing nhe and hbl 5' IGR lux fusions provided evidence that toxin gene transcription is upregulated by the depletion of free amino acids. Electrophoretic mobility shift assays showed that the branched-chain amino acid sensing regulator CodY binds to both nhe and hbl 5' UTR downstream of the promoter, potentially acting as a nutrient-responsive roadblock repressor of toxin gene transcription. PlcR binding sites are

  14. Cloning and Bioinformatics Analysis of TSP1 and TSP6 Gene of Echinococcus granulosus

    Institute of Scientific and Technical Information of China (English)

    刘田莉; 孟庆玲; 乔军; 陈诚; 马玉; 胡政香; 才学鹏; 陈创夫

    2015-01-01

    To study the function of two important antigen genes, tetraspanin 1-TSP1 (TSP1) and tetraspanin 1-TSP6 (TSP6), primers were designed from the Echinococcus granulosus genome database in GenBank, and the open reading frame (ORF) sequences of TSP1 and TSP6 were amplified by RT-PCR from hydatid protoscoleces. The products were cloned into the pMD19-T vector, sequenced, and subjected to bioinformatics analysis. The TSP1 cDNA contains 792 nucleotides; the deduced protein consists of 263 amino acids and has three N-glycosylation sites and two N-acylation sites. The gene sequence showed about 98.99% identity with the reported TSP1 (EG 11043), and the deduced amino acid sequence showed about 98.48% identity. The TSP6 cDNA contains 666 nucleotides; the deduced protein consists of 221 amino acids and has five N-acylation sites and one tyrosine kinase phosphorylation site. The gene sequence showed about 98.18% identity with the reported TSP6 (EG 00715), and the deduced amino acid sequence showed about 85.07% identity. Bioinformatics analysis of the E. granulosus TSP1 and TSP6 genes with molecular biology software was used to predict the structure and epitopes of the encoded protein antigens, laying a foundation for vaccine development.
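
    The nucleotide and amino acid counts reported here are consistent with each other, which a short check makes explicit. The sketch assumes the stated cDNA length is a complete ORF that includes one terminal stop codon; `protein_length` is a hypothetical helper name.

```python
def protein_length(orf_nt):
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides,
    assuming the ORF length includes a single stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1

# Lengths reported in the abstract for TSP1 and TSP6.
print(protein_length(792), protein_length(666))
```

    Under that assumption, 792 nt gives 264 codons and 263 residues, and 666 nt gives 222 codons and 221 residues, matching the abstract.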

  15. Bioinformatics Analysis of Bovine ASCL2 Gene

    Institute of Scientific and Technical Information of China (English)

    王梦楠; 吴茜红; 赵姝君; 王敬姣; 李冬杰; 李世杰

    2013-01-01

    In humans and mice, ASCL2 is a maternally expressed imprinted gene that plays an important role in early embryonic and placental development. The imprinting status of the bovine ASCL2 gene and the molecular mechanism of its imprinting have not been studied. This study used bioinformatics methods to analyze and predict the molecular evolution, promoter and CpG island regions, and higher-order protein structure of bovine ASCL2, laying a foundation for further work on the biological function of the gene and its molecular regulation. Evolutionary analysis of ASCL2 mRNA sequences from 21 mammalian species showed that all genetic distances among the species were below 0.536 and that cattle and pig shared the smallest distance (0.106), consistent with the phylogenetic tree. CG content was analyzed with the CpG Plot online software, which indicated three CpG islands in the 5 kb sequence upstream of the bovine gene. Promoter prediction with online software combined with transcription factor analysis showed that the promoter most likely lies within a CpG island region 4725~4775 bp upstream of the 5' end of the gene; this region contains many potential transcription factor binding sites and a TATA box at 4734 bp. Online protein analysis showed that ASCL2 encodes a helix-loop-helix transcription factor with three types of secondary structure: α-helix, β-turn and random coil.
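
    The kind of scan a CpG island predictor performs can be sketched with a sliding window. This is not the CpG Plot implementation and its defaults are not reproduced here; the thresholds follow the commonly cited Gardiner-Garden and Frommer criteria (window of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6), and `cpg_stats`, `cpg_windows` and the toy sequence are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n  # CpG count expected if C and G were independent
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio

def cpg_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start positions of windows that satisfy both CpG island criteria."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append(start)
    return hits

# Toy sequence: AT-rich flanks around a CG-rich core (not real bovine DNA).
toy = "AT" * 100 + "CG" * 100 + "AT" * 100
hits = cpg_windows(toy)
print(hits[0], hits[-1])
```

    A real predictor would go on to merge overlapping qualifying windows into islands and report their coordinates; the sketch stops at the per-window test.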

  16. Bioinformatics Analysis of DELLA Proteins in Rosaceous Plants

    Institute of Scientific and Technical Information of China (English)

    宋伟; 李鼎立; 王然; 原永兵; 刘成连; 马春晖

    2013-01-01

    To explore the structural features and phylogenetic relationships of DELLA proteins in rosaceous plants, 19 DELLA proteins from rosaceous plants such as apple (Malus), pear (Pyrus) and rose (Rosa) were analyzed using the ExPASy, PSORT and PROSITE databases, TMHMM, SignalP 4.0 Server, CDD, DNAMAN and MEGA version 5.3. The results showed that the 19 DELLA proteins differed little in amino acid composition and physicochemical properties, and that all were non-transmembrane hydrophilic proteins without signal peptides. Homology among the different DELLA proteins was high, above 74.38%. The proteins had two conserved domains, DELLA and GRAS, and the functional site was GRAS. Homology at the N-terminus of the 19 DELLA proteins was low, but typical DELLA protein motifs such as TVHYNP, VHIID and RVER were present. Phylogenetic tree analysis showed that DELLA proteins were closely related within Pyrus and within Rosa, and more distantly related within Malus. The study provides a theoretical basis for research on the genetic evolution of rosaceous plants.

  17. Expression and functional analysis of two osmotin (PR5) isoforms with differential antifungal activity from Piper colubrinum: prediction of structure-function relationship by bioinformatics approach.

    Science.gov (United States)

    Mani, Tomson; Sivakumar, K C; Manjula, S

    2012-11-01

    Osmotin, a pathogenesis-related antifungal protein, is relevant in induced plant immunity and belongs to the thaumatin-like group of proteins (TLPs). This article describes comparative structural and functional analysis of the two osmotin isoforms cloned from Phytophthora-resistant wild Piper colubrinum. The two isoforms differ mainly by an internal deletion of 50 amino acid residues which separates them into two size categories (16.4 kDa-PcOSM1 and 21.5 kDa-PcOSM2) with pI values 5.6 and 8.3, respectively. Recombinant proteins were expressed in E. coli and antifungal activity assays of the purified proteins demonstrated significant inhibitory activity of the larger osmotin isoform (PcOSM2) on Phytophthora capsici and Fusarium oxysporum, and a markedly reduced antifungal potential of the smaller isoform (PcOSM1). Homology modelling of the proteins indicated structural alterations in their three-dimensional architecture. Tertiary structure of PcOSM2 conformed to the known structure of osmotin, with domain I comprising of 12 β-sheets, an α-helical domain II and a domain III composed of 2 β-sheets. PcOSM1 (smaller isoform) exhibited a distorted, indistinguishable domain III and loss of 4 β-sheets in domain I. Interestingly, an interdomain acidic cleft between domains I and II, containing an optimally placed endoglucanase catalytic pair composed of Glu-Asp residues, which is characteristic of antifungal PR5 proteins, was present in both isoforms. It is well accepted that the presence of an acidic cleft correlates with antifungal activity due to the presence of endoglucanase catalytic property, and hence the present observation of significantly reduced antifungal capacity of PcOSM1 despite the presence of a strong acidic cleft, is suggestive of the possible roles played by other structural features like domain I or/and III, in deciding the antifungal potential of osmotin.

  18. Bioinformatics analysis of Caveolin-1 in toad and its expression in muscle tissues

    Institute of Scientific and Technical Information of China (English)

    张叶军; 解文放; 李洪艳; 崔玉影; 邹伟

    2015-01-01

    The Caveolin-1 protein sequence of Xenopus laevis was obtained from GenBank, and its structure and function were predicted with several bioinformatics analysis programs. The results showed that Caveolin-1 of toads and mammals have similar structures and functions. Total protein was extracted from different muscle tissues of adult Bufo gargarizans, and Caveolin-1 was detected by Western blot using a mouse anti-human Caveolin-1 polyclonal antibody. This suggests that Caveolin-1 has been highly conserved over the course of evolution and maintains its function as a signal transduction center.

  19. Bioinformatics Analysis of Cysteine Protease OTUBAIN-like Gene in Pinus radiata

    Institute of Scientific and Technical Information of China (English)

    吴建忠

    2015-01-01

    Bioinformatics analysis of the cysteine protease OTUBAIN-like gene of Pinus radiata was conducted using genomics databases to predict the physicochemical properties, sequence characteristics, structure and biological function of the encoded protein. The results showed that the protein encoded by the gene contains 294 amino acids, has no transmembrane structure, and contains a conserved Peptidase_C65 domain. It is predicted to play a key role in stress and immune responses during protein translation, synthesis and metabolism. This study lays a foundation for further functional study of the OTUBAIN-like gene of the Pinus radiata cysteine protease.

  20. Bioinformatic Analysis of UXS Gene Family in Arabidopsis and Rice

    Institute of Scientific and Technical Information of China (English)

    潘玉欣; 王巍杰; 胡金山

    2011-01-01

    In this study, six UDP-glucuronate decarboxylase (UXS) genes each from Arabidopsis and rice were analyzed in terms of gene structure, conserved motifs, gene expression and phylogeny. The results showed that all 12 UXS genes had introns. All the UXS genes were expressed in roots, leaves and calluses, except for OsUXS1, which was expressed only in calluses. The 12 UXS proteins were hydrophilic proteins with a wide range of hydrophilic areas and had highly conserved structures containing the family domains 3Beta-HSD and NAD-binding. The 12 UXS proteins belonged to two sub-families, and structures and functions were similar within a sub-family. The comprehensive analysis revealed that UXS is a multi-gene family whose genes are widely expressed and structurally conserved.

  1. Bioinformatics analyses of Shigella CRISPR structure and spacer classification.

    Science.gov (United States)

    Wang, Pengfei; Zhang, Bing; Duan, Guangcai; Wang, Yingfang; Hong, Lijuan; Wang, Linlin; Guo, Xiangjiao; Xi, Yuanlin; Yang, Haiyan

    2016-03-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are inheritable genetic elements of a variety of archaea and bacteria and indicative of the bacterial ecological adaptation, conferring acquired immunity against invading foreign nucleic acids. Shigella is an important pathogen for anthroponosis. This study aimed to analyze the features of Shigella CRISPR structure and classify the spacers through bioinformatics approach. Among 107 Shigella, 434 CRISPR structure loci were identified with two to seven loci in different strains. CRISPR-Q1, CRISPR-Q4 and CRISPR-Q5 were widely distributed in Shigella strains. Comparison of the first and last repeats of CRISPR1, CRISPR2 and CRISPR3 revealed several base variants and different stem-loop structures. A total of 259 cas genes were found among these 107 Shigella strains. The cas gene deletions were discovered in 88 strains. However, there is one strain that does not contain cas gene. Intact clusters of cas genes were found in 19 strains. From comprehensive analysis of sequence signature and BLAST and CRISPRTarget score, the 708 spacers were classified into three subtypes: Type I, Type II and Type III. Of them, Type I spacer referred to those linked with one gene segment, Type II spacer linked with two or more different gene segments, and Type III spacer undefined. This study examined the diversity of CRISPR/cas system in Shigella strains, demonstrated the main features of CRISPR structure and spacer classification, which provided critical information for elucidation of the mechanisms of spacer formation and exploration of the role the spacers play in the function of the CRISPR/cas system.
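
    The three-way spacer classification described in this abstract reduces to counting the distinct gene segments a spacer is linked with. The sketch below assumes that "undefined" (Type III) means no linked segment was found, which the abstract does not state explicitly; `classify_spacer` and the segment names are hypothetical, and the BLAST and CRISPRTarget scoring that produces the links is not modelled.

```python
def classify_spacer(matched_segments):
    """Classify a spacer by the number of distinct gene segments it matches.
    Assumption: a spacer with no match is treated as Type III (undefined)."""
    distinct = set(matched_segments)
    if len(distinct) == 1:
        return "Type I"
    if len(distinct) >= 2:
        return "Type II"
    return "Type III"

# Hypothetical match lists for three spacers.
print(classify_spacer(["phage_gene_A"]))
print(classify_spacer(["phage_gene_A", "plasmid_gene_B"]))
print(classify_spacer([]))
```

    Repeated hits to the same segment are collapsed by the set, so a spacer matching one segment several times still counts as Type I.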

  2. Macrobrachium rosenbergii mannose binding lectin: synthesis of MrMBL-N20 and MrMBL-C16 peptides and their antimicrobial characterization, bioinformatics and relative gene expression analysis.

    Science.gov (United States)

    Arockiaraj, Jesu; Chaurasia, Mukesh Kumar; Kumaresan, Venkatesh; Palanisamy, Rajesh; Harikrishnan, Ramasamy; Pasupuleti, Mukesh; Kasi, Marimuthu

    2015-04-01

    Mannose-binding lectin (MBL), an antimicrobial protein, is an important component of innate immune system which recognizes repetitive sugar groups on the surface of bacteria and viruses leading to activation of the complement system. In this study, we reported a complete molecular characterization of cDNA encoded for MBL from freshwater prawn Macrobrachium rosenbergii (Mr). Two short peptides (MrMBL-N20: (20)AWNTYDYMKREHSLVKPYQG(39) and MrMBL-C16: (307)GGLFYVKHKEQQRKRF(322)) were synthesized from the MrMBL polypeptide. The purity of the MrMBL-N20 (89%) and MrMBL-C16 (93%) peptides were confirmed by MS analysis (MALDI-ToF). The purified peptides were used for further antimicrobial characterization including minimum inhibitory concentration (MIC) assay, kinetics of bactericidal efficiency and analysis of hemolytic capacity. The peptides exhibited antimicrobial activity towards all the Gram-negative bacteria taken for analysis, whereas they showed the activity towards only a few selected Gram-positive bacteria. MrMBL-C16 peptides produced the highest inhibition towards both the Gram-negative and Gram-positive bacteria compared to the MrMBL-N20. Both peptides do not produce any inhibition against Bacillus sps. The kinetics of bactericidal efficiency showed that the peptides drastically reduced the number of surviving bacterial colonies after 24 h incubation. The results of hemolytic activity showed that both peptides produced strong activity at higher concentration. However, MrMBL-C16 peptide produced the highest activity compared to the MrMBL-N20 peptide. Overall, the results indicated that the peptides can be used as bactericidal agents. The MrMBL protein sequence was characterized using various bioinformatics tools including phylogenetic analysis and structure prediction. We also reported the MrMBL gene expression pattern upon viral and bacterial infection in M. rosenbergii gills. It could be concluded that the prawn MBL may be one of the important molecule which

  3. Bioinformatics Analysis of ALAD Gene in Seven Plants

    Institute of Scientific and Technical Information of China (English)

    龙芳; 李绍鹏; 李茂富

    2013-01-01

    ..., Medicago truncatula, Vitis vinifera, and Spinacia oleracea were analyzed by bioinformatics, including the composition of nucleic acid and amino acid sequences, leader peptides, signal peptides, transmembrane topology, hydrophobicity or hydrophilicity, secondary structure, tertiary structure and functional domains of the proteins. A phylogenetic tree was constructed for the δ-aminolevulinic acid dehydratase protein family. The results showed that the open reading frame (ORF) of the samples is about 1,290 bp, the molecular weight is about 47 kD, and the pI is 5.5~7.0, indicating that δ-aminolevulinic acid dehydratase is neutral to slightly acidic. The most abundant amino acid residues are Ala, Leu, Val, Arg, Ser, Gly, Pro and Asp. The ALAD proteins of these plants showed distinct hydrophobic and hydrophilic regions and a chloroplast transit peptide; a small amount of transmembrane topology may exist, and there is no signal peptide. The main secondary structures of the proteins are random coil and alpha helix. All these proteins have an active site, Schiff base residues, an aspartate-rich active site metal binding site and an allosteric magnesium binding site. The nucleotide homology comparison indicated that Arabidopsis thaliana shares high homology with the other plants. Evolutionary analysis demonstrated that these proteins can be classified into six clusters. This work provides a basis for further study of the function and structural characteristics of δ-aminolevulinic acid dehydratase.

  4. Cloning, Tissue Specific Expression and Bioinformatics Analysis of Goat Lysosomal α-AMA Gene

    Institute of Scientific and Technical Information of China (English)

    孔祥雅; 李义; 程敏; 荆新堂; 李勤凡

    2012-01-01

    This study analyzed the tissue expression profile and bioinformatic characteristics of the goat lysosomal α-mannosidase (α-AMA) gene. Primers were designed from the bovine lysosomal α-AMA gene sequence and used to amplify the goat α-AMA gene by PCR; the tissue-specific expression profile was analyzed by fluorescent quantitative RT-PCR (qRT-PCR), and bioinformatic predictions were made. The goat α-AMA gene was obtained for the first time. It contains a complete coding sequence (CDS) of 3,000 bp encoding a deduced protein of 999 amino acid residues, of which the first 50 form a signal peptide. The CDS nucleotide sequence and the deduced amino acid sequence were most similar to those of bovine α-AMA, at 95.93% and 94.79%, respectively. qRT-PCR showed that α-AMA was expressed at different levels in all goat tissues examined, with higher expression in the lung, liver and cerebellum. α-AMA was predicted to belong to glycosyl hydrolase family 38, with two conserved domains and nine N-glycosylation sites. There were thirty-five phosphorylation sites, one phosphorylation site of ...... Homology modeling of goat α-AMA with SWISS-MODEL showed good reliability. This study provides a theoretical basis for investigating the enzyme's mechanism of action and for developing antidotes to locoweed poisoning.
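    The reported CDS length and protein length can be related by simple codon arithmetic. The sketch below assumes that the 3,000 bp figure includes the terminal stop codon; that assumption, and the function name, are illustrative and not stated in the abstract.

```python
def protein_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of encoded residues for a CDS of cds_bp nucleotides.

    Assumes a complete reading frame, so cds_bp must be a multiple of 3.
    When includes_stop is True the final codon is a stop and encodes nothing.
    """
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


print(protein_length(3000))
```

    Under that assumption, 3,000 bp corresponds to 1,000 codons, one of which is the stop codon, which agrees with the 999 residues reported.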

  5. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  6. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr
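    As a minimal sketch of what scoring an alignment involves, the function below computes the score of the best global alignment of two sequences by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; the thesis concerns hardware acceleration, which this software-only example does not attempt to reproduce.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (linear gap penalty)."""
    # prev[j] is the best score for aligning the prefix a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
```

    Only two rows of the table are kept in memory, which is enough when the score is needed but the alignment itself is not.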

  7. A bioinformatics approach to marker development

    NARCIS (Netherlands)

    Tang, J.

    2008-01-01

    The thesis focuses on two bioinformatics research topics: the development of tools for efficient and reliable identification of single nucleotide polymorphisms (SNPs) and polymorphic simple sequence repeats (SSRs) from expressed sequence tags (ESTs) (Chapters 2, 3 and 4), and the subsequent imple
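    To make the SSR part concrete, here is a small sketch that scans a nucleotide sequence for perfect tandem repeats with a regular expression. It is not the tool developed in the thesis; the motif length range (2 to 6 nt), the default threshold of four repeats and the function name are assumptions for illustration.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Motifs of 2-6 nt are considered; start is a 0-based index into seq.
    """
    # The motif is captured once, then must recur at least min_repeats - 1 times.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGCATATATATCC"))
```

    A real pipeline would also need to handle sequencing errors, imperfect repeats and redundancy between ESTs, none of which this sketch addresses.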

  8. Implementing bioinformatic workflows within the bioextract server

    Science.gov (United States)

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  9. Bioinformatics in Undergraduate Education: Practical Examples

    Science.gov (United States)

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  10. "Extreme Programming" in a Bioinformatics Class

    Science.gov (United States)

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP).…

  11. Bioinformatics: A History of Evolution "In Silico"

    Science.gov (United States)

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  12. Evolution of web services in bioinformatics

    NARCIS (Netherlands)

    Neerincx, P.B.T.; Leunissen, J.A.M.

    2005-01-01

    Bioinformaticians have developed large collections of tools to make sense of the rapidly growing pool of molecular biological data. Biological systems tend to be complex and in order to understand them, it is often necessary to link many data sets and use more than one tool. Therefore, bioinformatic

  13. Bioinformatics Analysis of Non-structural Protein 2 of PRRSV

    Institute of Scientific and Technical Information of China (English)

    孙荡; 高歌; 周胜; 鲍梦雅; 茅翔

    2012-01-01

    [Objective] The research aimed to provide a theoretical basis for constructing antagonistic small peptides against PRRSV and for designing related vaccines for clinical use. [Method] Bioinformatics analysis was performed on the amino acid sequence of non-structural protein 2 (Nsp2) of a PRRSV strain isolated in China (FJ 175688.1), obtained from the NCBI website. Its composition, physical and chemical properties, transmembrane domains, secondary structure, glycosylation sites, phosphorylation sites and B cell epitopes were predicted. [Result] Nsp2 contained 7 transmembrane regions, 4 glycosylation sites and 76 phosphorylation sites. Random coil was the most abundant secondary structure element (57.64%). The comprehensive analysis showed that Nsp2 has a large number of epitopes. [Conclusion] The bioinformatics analysis of Nsp2 could provide a theoretical basis for PRRSV vaccine design.

  14. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  15. Genome bioinformatic analysis of nonsynonymous SNPs

    Directory of Open Access Journals (Sweden)

    Todd John A

    2007-08-01

    Abstract Background Genome-wide association studies of common diseases for common, low penetrance causal variants are underway. A proportion of these will alter protein sequences, the most common of which is the non-synonymous single nucleotide polymorphism (nsSNP). It would be an advantage if the functional effects of an nsSNP on protein structure and function could be predicted, both for the final identification process of a causal variant in a disease-associated chromosome region, and in further functional analyses of the nsSNP and its disease-associated protein. Results In the present report we have compared and contrasted structure- and sequence-based methods of prediction to over 5500 genes carrying nearly 24,000 nsSNPs, by employing an automatic comparative modelling procedure to build models for the genes. The nsSNP information came from two sources, the OMIM database which are rare (minor allele frequency, MAF, 0.05, have no known link to a disease. For over 40% of the nsSNPs, structure-based methods predicted which of these sequence changes are likely to either disrupt the structure of the protein or interfere with the function or interactions of the protein. For the remaining 60%, we generated sequence-based predictions. Conclusion We show that, in general, the prediction tools are able to distinguish disease-causing mutations from those mutations which are thought to have a neutral effect. We give examples of mutations in genes that are predicted to be deleterious and may have a role in disease. Contrary to previous reports, we also show that rare mutations are consistently predicted to be deleterious as often as commonly occurring nsSNPs.

  16. Assessing Reliability of Cellulose Hydrolysis Models to Support Biofuel Process Design – Identifiability and Uncertainty Analysis

    DEFF Research Database (Denmark)

    Sin, Gürkan; Meyer, Anne S.; Gernaey, Krist

    2010-01-01

    The reliability of cellulose hydrolysis models is studied using the NREL model. An identifiability analysis revealed that only 6 out of 26 parameters are identifiable from the available data (typical hydrolysis experiments). Attempting to identify a higher number of parameters (as done in the ori...... to analyze the uncertainty of model predictions. This allows judging the fitness of the model to the purpose under uncertainty. Hence we recommend uncertainty analysis as a proactive solution when faced with model uncertainty, which is the case for biofuel process development research....

  17. The bioinformatics of microarrays to study cancer: Advantages and disadvantages

    Science.gov (United States)

    Rodríguez-Segura, M. A.; Godina-Nava, J. J.; Villa-Treviño, S.

    2012-10-01

    Microarrays are devices designed to analyze the simultaneous expression of thousands of genes. However, the process adds noise to the information at each stage of the study. Bioinformatics tools are needed to analyze these thousands of data points. The traditional analysis begins by normalizing the data, but the results obtained depend strongly on how the study is conducted. The need to develop new strategies for analyzing microarrays is shown. Liver tissue taken from an animal model of chemically induced cancer is used as an example.

  18. Cloning and Bioinformatic Analysis of MdSCR Gene of GRAS Gene Family in Columnar Apple

    Institute of Scientific and Technical Information of China (English)

    韩菲; 戴洪义; 张玉刚

    2012-01-01

    New shoots of a columnar line from the progeny of the 'Fuji × Telamon' cross were used as material, and the MdSCR gene of the columnar apple GRAS gene family was obtained by homology-based cloning. The open reading frame (ORF) is 1,458 bp long and encodes 486 amino acids; the predicted protein has a molecular mass of 54.894 kD and an isoelectric point (pI) of 5.96, and it has the typical domain of the GRAS gene family. MdSCR shares 67.8% amino acid sequence similarity with a GRAS transcription factor of Populus trichocarpa, and more than 58.5% similarity with SCARECROW-like amino acid sequences of Selaginella moellendorffii, grape (Vitis vinifera) and soybean (Glycine max). Amino acid cluster analysis showed that apple clusters most closely with Populus trichocarpa and Selaginella moellendorffii, followed by grape, and most distantly with Pinus sylvestris. Bioinformatics analysis showed that the secondary structure of the MdSCR protein comprises 208 α-helix, 54 extended-strand and 224 random-coil residues.

  19. Biochemical, Transcriptional, and Bioinformatic Analysis of Lipid Droplets from Seeds of Date Palm (Phoenix dactylifera L.) and Their Use as Potent Sequestration Agents against the Toxic Pollutant, 2,3,7,8-Tetrachlorinated Dibenzo-p-Dioxin.

    Science.gov (United States)

    Hanano, Abdulsamie; Almousally, Ibrahem; Shaban, Mouhnad; Rahman, Farzana; Blee, Elizabeth; Murphy, Denis J

    2016-01-01

    Contamination of aquatic environments with dioxins, the most toxic group of persistent organic pollutants (POPs), is a major ecological issue. Dioxins are highly lipophilic and bioaccumulate in fatty tissues of marine organisms used for seafood where they constitute a potential risk for human health. Lipid droplets (LDs) purified from date palm, Phoenix dactylifera, seeds were characterized and their capacity to extract dioxins from aquatic systems was assessed. The bioaffinity of date palm LDs toward 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), the most toxic congener of dioxins, was determined. Fractionated LDs were spheroidal with mean diameters of 2.5 µm, enclosing an oil-rich core of 392.5 mg mL⁻¹. Isolated LDs did not aggregate and/or coalesce unless placed in acidic media and were strongly associated with three major groups of polypeptides of relative mass 32-37, 20-24, and 16-18 kDa. These masses correspond to the LD-associated proteins, oleosins, caleosins, and steroleosins, respectively. Efficient partitioning of TCDD into LDs occurred with a coefficient of log K LB/w,TCDD = 7.528 ± 0.024; it was optimal at neutral pH and was dependent on the presence of the oil-rich core, but was independent of the presence of LD-associated proteins. Bioinformatic analysis of the date palm genome revealed nine oleosin-like, five caleosin-like, and five steroleosin-like sequences, with predicted structures having putative lipid-binding domains that match their LD stabilizing roles and use as bio-based encapsulation systems. Transcriptomic analysis of date palm seedlings exposed to TCDD showed strong up-regulation of several caleosin and steroleosin genes, consistent with increased LD formation. The results suggest that the plant LDs could be used in ecological remediation strategies to remove POPs from aquatic environments. Recent reports suggest that several fungal and algal species also use LDs to sequester both external and internally derived hydrophobic toxins, which

  20. Alternative to Ritt's Pseudodivision for finding the input-output equations in algebraic structural identifiability analysis

    CERN Document Server

    Meshkat, Nicolette; DiStefano, Joseph J

    2012-01-01

    Differential algebra approaches to structural identifiability analysis of a dynamic system model in many instances heavily depend upon Ritt's pseudodivision at an early step in analysis. The pseudodivision algorithm is used to find the characteristic set, of which a subset, the input-output equations, is used for identifiability analysis. A simpler algorithm is proposed for this step, using Gröbner bases, along with a proof of the method that includes a reduced upper bound on derivative requirements. Efficacy of the new algorithm is illustrated with two biosystem model examples.
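    To illustrate what an input-output equation is, consider a toy one-state linear model. This model is an assumption chosen for illustration and is not one of the two biosystem examples in the paper; the symbols x (state), u (input), y (output) and the parameters k, b, c are likewise hypothetical. Eliminating the unobserved state x leaves an equation in y, u and their derivatives only:

```latex
\begin{aligned}
\dot{x}(t) &= -k\,x(t) + b\,u(t), \qquad y(t) = c\,x(t) \\
\dot{y}(t) &= c\,\dot{x}(t) = -k\,y(t) + b c\,u(t) \\
0 &= \dot{y}(t) + k\,y(t) - b c\,u(t)
\end{aligned}
```

    The coefficients of the last line are what the data can determine: k and the product bc are identifiable in this toy case, while b and c individually are not. For larger models the elimination step is what pseudodivision or a Gröbner basis computation carries out.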

  1. Sequence Determination and Bioinformatics Analysis of ESTs from Chinese Gloydius shedaoensis shedaoensis Venom Gland

    Institute of Scientific and Technical Information of China (English)

    郭春梅; 孙明忠; 郑体花; 任一鑫; 刘淑清

    2012-01-01

    Previously, we constructed a cDNA library (GSSG-cDNA) of the venom gland (GSSG) of Chinese Gloydius shedaoensis shedaoensis (GSS). In the current work, a total of 216 clones were randomly picked from the positive recombinants of the GSSG-cDNA library and analyzed by single-pass expressed sequence tag (EST) sequencing from the 5' end, generating 211 high-quality ESTs. Bioinformatic sequence alignment indicated that 84 ESTs could be annotated as genes with known function, 29 ESTs as genes with unknown function, and the remaining 98 ESTs as novel genes, accounting for 39.8%, 13.7% and 46.5% of the 211 ESTs, respectively. Taken together, partial EST sequences of GSSG were obtained, providing a basis for cloning and expressing the genes of active protein components from GSS and for studying their biological functions.

  2. Component-Based Approach for Educating Students in Bioinformatics

    Science.gov (United States)

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  3. Bioinformatic Analysis and Subcellular Localization of Solanum lycopersicum ARF2

    Institute of Scientific and Technical Information of China (English)

    冯媛媛; 侯佩; 李颖楠; 刘永胜

    2012-01-01

    Auxin response factors (ARFs) are important transcription factors involved in the auxin signal transduction pathway. To elucidate the function of tomato (Solanum lycopersicum) ARF2, we isolated the SlARF2 gene, analyzed its molecular features, and observed the subcellular localization of ARF2 in transgenic tomato plants. The physicochemical properties and molecular features of ARF2 were predicted by bioinformatic approaches, including analysis of physical and chemical properties, hydrophobicity, domains, the phylogenetic tree and subcellular localization. The full-length SlARF2 gene was amplified from tomato fruit cDNA by RT-PCR, and a binary expression vector, pBA-ARF2-YFP, carrying ARF2 fused to the yellow fluorescent protein (YFP) coding sequence was constructed. The recombinant vector was transformed into wild-type tomato by Agrobacterium-mediated transformation, the resulting T1 transgenic seeds were germinated, and root tips were examined by fluorescence microscopy to observe the distribution of the fusion protein in living cells. Bioinformatic analysis showed that SlARF2 is a soluble protein rich in Ser, Leu, Gly and Pro that has the typical domains of the ARF family; its amino acid sequence shares 70.08%, 66.94% and 60.87% homology with those of grape, cassava and Arabidopsis thaliana, respectively. Restriction digestion and sequencing confirmed that the pBA-ARF2-YFP fusion expression vector was constructed successfully, and PCR analysis showed that the fusion protein was expressed in the transgenic plants. Fluorescence microscopy showed that ARF2 is localized in the nucleus. These results indicate that the transcription factor SlARF2 is localized in the nucleus and plays an important role in tomato fruit development and ripening.

  4. Comparative QTL mapping of resistance to sugarcane mosaic virus in maize based on bioinformatics

    Institute of Scientific and Technical Information of China (English)

    Xiangling LÜ; Xinhai LI; Chuanxiao XIE; Zhuanfang HAO; Hailian JI; Liyu SHI; Shihuang ZHANG

    2008-01-01

    The development of genomics and bioinformatics offers new tools for comparative gene mapping. In this paper, an integrated QTL map for sugarcane mosaic virus (SCMV) resistance in maize was constructed by compiling a total of 81 available QTL, using the genetic map IBM2 2005 Neighbors as reference. These 81 QTL were scattered over 7 maize chromosomes, and most of them were clustered on chromosomes 3 and 6. By meta-analysis, we identified one "consensus QTL" on chromosome 3 covering a genetic distance of 6.44 cM, and two on chromosome 6 covering genetic distances of 16 cM and 27.48 cM, respectively. Four positional candidate resistance genes were identified within the "consensus QTL" on chromosome 3 via a comparative genomics strategy. These results suggest that combining meta-analysis within a species with sequence homology comparison in a related model plant is an efficient approach to identifying major QTL and their candidate gene(s) for target traits. The results of this study provide useful information for identifying and cloning the major gene(s) conferring resistance to SCMV in maize.

  5. Prediction and Bioinformatics Analysis of Human Gene Expression Profiling Regulated by Amifostine

    Institute of Scientific and Technical Information of China (English)

    杨波; 脱朝伟; 蔡力力; 迟小华; 卢学春; 张峰; 脱帅; 朱宏丽; 刘丽宏; 严江伟

    2011-01-01

    The objective of this study was to perform a bioinformatics analysis of the characteristics of the gene expression profile regulated by amifostine and to predict its novel potential biological functions, so as to provide direction and methods for further study of the pharmacological actions of amifostine. Amifostine was used as a keyword to search internet-based free gene expression databases, including GEO, the Affymetrix gene chip database, GenBank, SAGE, GeneCard, InterPro, ProtoNet, UniProt and BLOCKS, and the retrieved amifostine-regulated gene expression profiling data were subjected to validity testing, differential gene expression analysis, functional clustering and gene annotation. Only one amifostine-regulated gene expression profiling dataset was retrieved, from the GEO database (accession: GSE3212). Through validity testing and differential expression analysis, a significant difference (p < 0.01) was found in only 2.14% of the whole genome (460/192000). Gene annotation showed that 139 of the 460 genes were known genes, of which 77 were up-regulated and 62 were down-regulated. Of the 139 genes, 13 were newly expressed following amifostine treatment of K562 cells, whereas expression of 5 genes was completely inhibited. Functional clustering divided the 139 genes into 11 categories, with biological functions involved in hematopoietic and immunologic regulation, apoptosis and the cell cycle. It is concluded that bioinformatics methods can be applied to the analysis of gene expression profiles regulated by amifostine. Amifostine has a regulatory effect on the human gene expression profile, mainly in biological processes including hematopoiesis, immunologic regulation, apoptosis and the cell cycle. The effect of amifostine on human gene expression needs to be further verified experimentally.

  6. Bioinformatics analysis of microRNAs differently expressed in major depression disorder

    Institute of Scientific and Technical Information of China (English)

    范惠民; 吴文波; 牛威; 孙欣羊; 仲爱芳; 赵林; 张理义

    2015-01-01

    Objective To predict the target genes and functions of hsa-miR-26b, hsa-miR-1972, hsa-miR-4485, hsa-miR-4498 and hsa-miR-4743 by bioinformatics analysis, and to provide a theoretical basis for further research. Methods The targets of the five microRNAs were predicted with TargetScan, miRDB and DIANA-microT-CDS; for each microRNA the predictions of the three tools were intersected, the union of these intersections was taken, and the result was analyzed by gene ontology (GO) and pathway enrichment analysis using FunNet. Results A total of 734 predicted targets were obtained from the three online databases. GO analysis showed that the biological processes regulated by the differentially expressed microRNAs covered diverse terms, some of which (e.g., central nervous system development, neuron differentiation, axonogenesis, synaptic transmission, learning and memory) are directly related to the central nervous system and brain functions. Pathway analysis showed significant enrichment in several pathways related to neuronal and brain function, such as axon guidance, glutamatergic synapse, and the Wnt, mTOR and VEGF signaling pathways. Among the five microRNAs, hsa-miR-26b, hsa-miR-1972 and hsa-miR-4498 might have more important regulatory functions. Conclusion Bioinformatic analysis indicates that hsa-miR-26b, hsa-miR-1972, hsa-miR-4485, hsa-miR-4498 and hsa-miR-4743 are closely related to the mechanism and pathogenesis of major depressive disorder.
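    The target-selection step described in the methods (keep, for each microRNA, only the genes predicted by every tool, then pool the per-microRNA results) maps directly onto set operations. The sketch below uses placeholder microRNA, tool and gene names; all of them, and the function name, are hypothetical and do not come from the study.

```python
from functools import reduce


def consensus_targets(predictions):
    """predictions: {mirna: {tool: set_of_genes}} -> (per_mirna, pooled).

    per_mirna keeps only genes predicted by every tool for that microRNA;
    pooled is the union of those consensus sets across microRNAs.
    """
    per_mirna = {
        mirna: reduce(set.intersection, by_tool.values())
        for mirna, by_tool in predictions.items()
    }
    pooled = set().union(*per_mirna.values())
    return per_mirna, pooled


# Hypothetical toy input: two microRNAs, three prediction tools.
toy = {
    "mirna_1": {
        "tool_a": {"GENE_A", "GENE_B", "GENE_C"},
        "tool_b": {"GENE_A", "GENE_B"},
        "tool_c": {"GENE_B", "GENE_A", "GENE_D"},
    },
    "mirna_2": {
        "tool_a": {"GENE_B", "GENE_E"},
        "tool_b": {"GENE_E"},
        "tool_c": {"GENE_E", "GENE_F"},
    },
}
per_mirna, pooled = consensus_targets(toy)
print(sorted(pooled))
```

    Requiring agreement between tools trades sensitivity for specificity; the pooled set is what would then be passed to an enrichment analysis.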

  7. Isolation, Identification and Bioinformatics Analysis of CAT Protein Related with Hepatotoxicity by Copper Nanoparticles in Rats

    Institute of Scientific and Technical Information of China (English)

    董书伟; 高昭辉; 申小云; 薛慧文; 荔霞

    2012-01-01

    [Objective] To isolate and identify catalase (CAT), a protein related to the hepatotoxicity of copper nanoparticles in rats, to explore the role of CAT in this toxicity, and to provide a basis for revealing the mechanism of copper nanoparticle hepatotoxicity. [Method] Differentially expressed proteins related to the hepatotoxicity of copper nanoparticles were screened in the rat liver proteome by 2-DE and PDQuest 8.0 software as part of a comparative proteomics strategy, identified by MALDI-TOF-TOF MS, and then analyzed by bioinformatics. [Result] The down-regulated differential protein spots 6602 and 7702 were associated with hepatotoxicity, and both were identified as CAT. The protein is stable and somewhat hydrophilic, has no signal peptide, is located in the cytoplasm, and is probably a non-secreted protein. It contains functional sites including the catalase active site 64FDRERIPERVVHAKGAG80 and the catalase heme-ligand site 354RLFAYPDTH362. Random coils, α-helices and extended strands are its main secondary structure elements, and its tertiary structure was predicted. Homology analysis showed that rat CAT is highly homologous to CAT from eight other species, and a phylogenetic tree of CAT proteins was constructed. [Conclusion] Copper nanoparticles down-regulate CAT protein expression in rat liver and cause oxidative stress injury to hepatocytes, which may be one of the routes by which they exert toxicity.

  8. Comparative analysis of a cryptic thienamycin-like gene cluster identified in Streptomyces flavogriseus by genome mining.

    Science.gov (United States)

    Blanco, Gloria

    2012-06-01

    In silico database searches allowed the identification in the S. flavogriseus ATCC 33331 genome of a carbapenem gene cluster highly related to the S. cattleya thienamycin one. This is the second cluster found for a complex highly substituted carbapenem. Comparative analysis revealed that both gene clusters display a high degree of synteny in gene organization and in protein conservation. Although the cluster appears to be silent under our laboratory conditions, the putative metabolic product was predicted from bioinformatics analyses using sequence comparison tools. These data, together with previous reports concerning epithienamycins production by S. flavogriseus strains, suggest that the cluster metabolic product might be a thienamycin-like carbapenem, possibly the epimeric epithienamycin. This finding might help in understanding the biosynthetic pathway to thienamycin and other highly substituted carbapenems. It also provides another example of genome mining in Streptomyces sequenced genomes as a powerful approach for novel antibiotic discovery.

  9. Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders

    Directory of Open Access Journals (Sweden)

    Martin Hofmann-Apitius

    2015-12-01

    Since the decoding of the Human Genome, techniques from bioinformatics, statistics, and machine learning have been instrumental in uncovering patterns in increasing amounts and types of different data produced by technical profiling technologies applied to clinical samples, animal models, and cellular systems. Yet progress on unravelling the biological mechanisms that causally drive diseases has been limited, in part due to the inherent complexity of biological systems. Whereas we have witnessed progress in the areas of cancer, cardiovascular and metabolic diseases, the area of neurodegenerative diseases has proved to be very challenging. This is in part because the aetiology of neurodegenerative diseases such as Alzheimer's disease or Parkinson's disease is unknown, rendering it very difficult to discern early causal events. Here we describe a panel of bioinformatics and modeling approaches that have recently been developed to identify candidate mechanisms of neurodegenerative diseases based on publicly available data and knowledge. We identify two complementary strategies: data mining techniques using genetic data as a starting point to be further enriched using other data types, or alternatively encoding prior knowledge about disease mechanisms in a model-based framework supporting reasoning and enrichment analysis. Our review illustrates the challenges entailed in integrating heterogeneous, multiscale and multimodal information in the area of neurology in general and neurodegeneration in particular. We conclude that progress would be accelerated by increased efforts to systematically collect multiple data types over time from each individual suffering from neurodegenerative disease. The work presented here has been driven by project AETIONOMY, a project funded in the course of the Innovative Medicines Initiative (IMI), which is a public-private partnership of the European Federation of Pharmaceutical Industry Associations

  10. Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders.

    Science.gov (United States)

    Hofmann-Apitius, Martin; Ball, Gordon; Gebel, Stephan; Bagewadi, Shweta; de Bono, Bernard; Schneider, Reinhard; Page, Matt; Kodamullil, Alpha Tom; Younesi, Erfan; Ebeling, Christian; Tegnér, Jesper; Canard, Luc

    2015-12-07

    Since the decoding of the human genome, techniques from bioinformatics, statistics, and machine learning have been instrumental in uncovering patterns in the increasing amounts and types of data produced by profiling technologies applied to clinical samples, animal models, and cellular systems. Yet progress in unravelling the biological mechanisms that causally drive diseases has been limited, in part due to the inherent complexity of biological systems. Whereas we have witnessed progress in the areas of cancer, cardiovascular, and metabolic diseases, the area of neurodegenerative diseases has proved to be very challenging. This is in part because the aetiology of neurodegenerative diseases such as Alzheimer's disease or Parkinson's disease is unknown, rendering it very difficult to discern early causal events. Here we describe a panel of bioinformatics and modeling approaches that have recently been developed to identify candidate mechanisms of neurodegenerative diseases based on publicly available data and knowledge. We identify two complementary strategies: data mining techniques that use genetic data as a starting point to be further enriched with other data types, or, alternatively, encoding prior knowledge about disease mechanisms in a model-based framework supporting reasoning and enrichment analysis. Our review illustrates the challenges entailed in integrating heterogeneous, multiscale, and multimodal information in the area of neurology in general and neurodegeneration in particular. We conclude that progress would be accelerated by increasing efforts to systematically collect multiple data types over time from each individual suffering from neurodegenerative disease. The work presented here has been driven by project AETIONOMY, a project funded in the course of the Innovative Medicines Initiative (IMI), which is a public-private partnership of the European Federation of Pharmaceutical Industry Associations (EFPIA) and the European

  11. Analyses of Brucella pathogenesis, host immunity, and vaccine targets using systems biology and bioinformatics.

    Science.gov (United States)

    He, Yongqun

    2012-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Of the 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using cutting-edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics studies have been conducted to analyze gene expression profiles of wild-type Brucella strains and mutants under different laboratory conditions. High-throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high-throughput serum antibody screening technology. Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of the community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning.

  12. Cloning and bioinformatics analysis of transferrin gene of Musca domestica (Housefly)

    Institute of Scientific and Technical Information of China (English)

    龚晓林; 张洁; 李显航; 刘红美

    2012-01-01

    To study the function of the transferrin gene of Musca domestica (housefly), the full-length cDNA sequence was obtained and its protein sequence was subjected to bioinformatics analysis. Online analysis programs and related software tools were used to analyze the open reading frame (ORF) of the transferrin gene, to characterize the physicochemical properties and domains of the encoded protein, and to predict its spatial structure and function. The transferrin protein consists of 622 amino acids, with a molecular weight of 70.58 kDa and a theoretical isoelectric point of 5.33. The protein is predicted to be stable, with a transmembrane region and a signal peptide, and it belongs to the TR-FER conserved domain family. It is predicted to localize to the nucleus, and its secondary structure is dominated by random coils. The ProtFun result indicates that the protein has enzymatic activity.

  13. SNPs Detection and Bioinformatics Analysis on 5' Regulatory Region of the Porcine ATGL Gene

    Institute of Scientific and Technical Information of China (English)

    华绪川; 张立凡; 蒋晓玲; 翟继鹏; 徐宁迎; 张金枝

    2011-01-01

    Adipose triglyceride lipase (ATGL) is the rate-limiting hydrolase in fat mobilization in adipose tissue; it mainly catalyzes the hydrolysis of triglycerides to diglycerides. In this study, a 1.2 kb fragment of the 5' regulatory region of the ATGL gene was screened for SNPs and analyzed bioinformatically in five pig breeds: Jinhua, Chalu black, Duroc, Large Yorkshire, and Pietrain. Two completely linked mutations, g-845G→C and g-854T→C, were found in the region. Sequence analysis indicated that the region may contain a promoter and that both mutations could create or destroy potential transcription factor binding sites. The g-845G→C locus was genotyped in the five breeds by PCR-RFLP, and chi-square analysis showed that the distribution of the three genotypes differed highly significantly among breeds (P<0.01), suggesting that differences in fat traits among pig breeds may be related to mutations in the 5' regulatory region of the ATGL gene.

  14. Bioinformatics and Expression Analysis of Cinnamoyl-CoA Reductase Gene from Salvia miltiorrhiza Bunge

    Institute of Scientific and Technical Information of China (English)

    陈尘; 王政军; 曹鑫林; 王喆之

    2011-01-01

    Analysis of the transcriptome database of Salvia miltiorrhiza Bunge identified a sequence with high homology to the cinnamoyl-CoA reductase (CCR) gene. The cDNA sequence of the gene was cloned and named SmCCR-2 (GenBank accession number: JF784010). SmCCR-2 contains a complete 966 bp open reading frame encoding a 321-amino-acid peptide. The encoded protein belongs to the NADB_Rossmann superfamily and contains the conserved NWYCY motif. Bioinformatics analysis showed that the predicted molecular weight of SmCCR-2 is 35.80 kD, which is consistent with the result of SDS-PAGE. SmCCR-2 is predicted to encode a hydrophilic, stable, neutral protein with a transmembrane domain. Quantitative RT-PCR showed that the gene is expressed in all organs examined, at different levels, with the highest expression in the stem. In addition, SmCCR-2 expression can be induced by pathogens, indicating that the gene may be involved in plant defense.
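
    The relation between the ORF length and the peptide length reported in this abstract (966 bp, 321 residues) can be checked with a little arithmetic. The sketch below is only an illustration and is not taken from the paper: it assumes that the reported ORF length includes the terminal stop codon, and the function name residues_from_orf is hypothetical.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumption (not stated in the abstract): a complete ORF runs from the
    start codon through the stop codon, and the stop codon encodes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3  # each codon is three bases
    return codons - 1 if includes_stop else codons


# 966 bp -> 322 codons -> 321 residues once the stop codon is excluded
print(residues_from_orf(966))
```

    Under that assumption the two figures in the abstract agree: 966 / 3 gives 322 codons, one of which is the stop codon.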

  15. Cloning and bioinformatics analysis of SmGAST from Salvia miltiorrhiza Bunge

    Institute of Scientific and Technical Information of China (English)

    强毅; 王喆之

    2011-01-01

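    A BLAST search of the Salvia miltiorrhiza EST database identified a sequence (CV163373) with high homology to the GAST gene family. The gene was cloned from S. miltiorrhiza by RT-PCR and named SmGAST. Its cDNA sequence is 406 bp long and contains a 303 bp ORF encoding 100 amino acid residues; it is presumed to be a new member of the GAST gene family. Bioinformatics analysis showed that the protein encoded by SmGAST has a molecular weight of 10.5473 kD and a theoretical isoelectric point of 9.01, carries a signal peptide, and is localized extracellularly ...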