WorldWideScience

Sample records for multigenic sequence analysis

  1. PseudoMLSA: a database for multigenic sequence analysis of Pseudomonas species

    Directory of Open Access Journals (Sweden)

    Lalucat Jorge

    2010-04-01

    Full Text Available Abstract. Background: The genus Pseudomonas comprises more than 100 species of environmental, clinical, agricultural, and biotechnological interest. Although the recommended method for discriminating bacterial species is DNA-DNA hybridisation, alternative techniques based on multigenic sequence analysis are becoming common practice in bacterial species discrimination studies. Since there is no general criterion for determining which genes are most useful for species resolution, the number of strains and genes analysed is increasing continuously. As a result, sequences of different genes are dispersed throughout several databases. This sequence information needs to be collected in a common database in order to be useful for future identification-based projects. Description: The PseudoMLSA Database is a comprehensive database of multiple gene sequences from strains of Pseudomonas species. The core of the database is composed of selected gene sequences from all Pseudomonas type strains validly assigned to the genus through 2008. The database is intended to be useful for MultiLocus Sequence Analysis (MLSA) procedures, for the identification and characterisation of any Pseudomonas bacterial isolate. The sequences are available for download via a direct connection to the National Center for Biotechnology Information (NCBI). Additionally, the database includes an online BLAST interface for flexible nucleotide queries and similarity searches with the user's datasets, and provides a user-friendly output for easily parsing, navigating, and analysing BLAST results. Conclusions: The PseudoMLSA database amasses strain and sequence information of validly described Pseudomonas species, and allows free querying of the database via a user-friendly, web-based interface available at http://www.uib.es/microbiologiaBD/Welcome.html. The web-based platform enables easy retrieval at strain or gene sequence information level; including references to published peer
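
As a rough illustration of the MLSA idea described in this abstract, the sketch below concatenates per-gene sequences for each strain and compares strains by pairwise identity. The strain names, the choice of loci (16S, gyrB, rpoD) and the sequences are invented placeholders, not PseudoMLSA records, and the fragments are assumed to be already aligned; a real workflow would align each locus first.

```python
# Toy MLSA sketch: concatenate loci per strain, then compare strains.
# All names and sequences below are hypothetical, for illustration only.
LOCI = ["16S", "gyrB", "rpoD"]

strains = {
    "strain_A": {"16S": "ACGTACGT", "gyrB": "TTGACCGA", "rpoD": "GGCATCAA"},
    "strain_B": {"16S": "ACGTACGT", "gyrB": "TTGACTGA", "rpoD": "GGCATCAG"},
    "strain_C": {"16S": "ACGTTCGT", "gyrB": "TAGACTGA", "rpoD": "GACATCAG"},
}


def concatenate(loci_seqs, order=LOCI):
    """Join the per-locus sequences of one strain in a fixed locus order."""
    return "".join(loci_seqs[locus] for locus in order)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


concatenated = {name: concatenate(seqs) for name, seqs in strains.items()}

for first in concatenated:
    for second in concatenated:
        if first < second:
            score = identity(concatenated[first], concatenated[second])
            print(f"{first} vs {second}: {score:.3f}")
```

Using a fixed locus order keeps every concatenated sequence comparable position by position, which is the property the pairwise comparison relies on.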

  2. DNA multigene sequencing of topotypic specimens of the fascioliasis vector Lymnaea diaphana and phylogenetic analysis of the genus Pectinidens (Gastropoda)

    Directory of Open Access Journals (Sweden)

    Maria Dolores Bargues

    2012-02-01

    Full Text Available Freshwater lymnaeid snails are crucial in defining transmission and epidemiology of fascioliasis. In South America, human endemic areas are related to high altitudes in Andean regions. The species Lymnaea diaphana has, however, been involved in low altitude areas of Chile, Argentina and Peru where human infection also occurs. Complete nuclear ribosomal DNA 18S, internal transcribed spacer (ITS-2 and ITS-1) and fragments of mitochondrial DNA 16S and cytochrome c oxidase (cox1) genes of L. diaphana specimens from its type locality yielded sequences 1,848, 495, 520, 424 and 672 bp long. Comparisons with New and Old World Galba/Fossaria, Palaearctic stagnicolines, Nearctic stagnicolines, Old World Radix and Pseudosuccinea led to the conclusions that (i) L. diaphana shows sequences very different from all other lymnaeids, (ii) each marker allows its differentiation, except the cox1 amino acid sequence, and (iii) L. diaphana is not a fossarine lymnaeid, but rather an archaic relict form derived from the oldest North American stagnicoline ancestors. Phylogeny and large genetic distances support the genus Pectinidens as the first stagnicoline representative in the southern hemisphere, including colonization of extreme world regions, such as southernmost Patagonia, a long time ago. The phylogenetic link of L. diaphana with the stagnicoline group may shed light on the aforementioned peculiar low altitude epidemiological scenario of fascioliasis.

  3. Cross-study analysis of genomic data defines the ciliate multigenic epiplasmin family: strategies for functional analysis in Paramecium tetraurelia

    Directory of Open Access Journals (Sweden)

    Ravet Viviane

    2009-06-01

    Full Text Available Abstract. Background: The sub-membranous skeleton of the ciliate Paramecium, the epiplasm, is composed of hundreds of epiplasmic scales centered on basal bodies, and presents a complex set of proteins, epiplasmins, which belong to a multigenic family. The repeated duplications observed in the P. tetraurelia genome present an interesting model of the organization and evolution of a multigenic family within a single cell. Results: To study this multigenic family, we used phylogenetic, structural, and analytical transcriptional approaches. The phylogenetic method defines 5 groups of epiplasmins in the multigenic family. A refined analysis by Hydrophobic Cluster Analysis (HCA) identifies structural characteristics of 51 epiplasmins, defining five separate groups, and three classes. Depending on the sequential arrangement of their structural domains, the epiplasmins are defined as symmetric, asymmetric or atypical. The EST data aid in this classification and in the identification of putative regulatory sequences such as TATA or CAAT boxes. When specific RNAi experiments were conducted using sequences from either symmetric or asymmetric classes, phenotypes were drastic. Local effects show either disrupted or ill-shaped epiplasmic scales. In either case, this results in aborted cell division. Using structural features, we show that 4 epiplasmins are also present in another ciliate, Tetrahymena thermophila. Their affiliation with the distinctive structural groups of Paramecium epiplasmins demonstrates an interspecific multigenic family. Conclusion: The epiplasmin multigenic family illustrates the history of genomic duplication in Paramecium. This study provides a framework which can guide functional analysis of epiplasmins, the major components of the membrane skeleton in ciliates. We show that this set of proteins carries important developmental information in Paramecium, since maintenance of epiplasm organization is crucial for cell morphogenesis.

  4. Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

    Science.gov (United States)

    Galián, José A; Rosato, Marcela; Rosselló, Josep A

    2014-03-01

    Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.

  5. Utilization of multigene panels in hereditary cancer predisposition testing: analysis of more than 2,000 patients

    OpenAIRE

    LaDuca, Holly; Stuenkel, A J; Dolinsky, Jill S.; Keiles, Steven; Tandy, Stephany; Pesaran, Tina; Chen, Elaine; Gau, Chia-Ling; Palmaer, Erika; Shoaepour, Kamelia; Shah, Divya; Speare, Virginia; Gandomi, Stephanie; Chao, Elizabeth

    2014-01-01

    Purpose: The aim of this study was to determine the clinical and molecular characteristics of 2,079 patients who underwent hereditary cancer multigene panel testing. Methods: Panels included comprehensive analysis of 14–22 cancer susceptibility genes (BRCA1 and BRCA2 not included), depending on the panel ordered (BreastNext, OvaNext, ColoNext, or CancerNext). Next-generation sequencing and deletion/duplication analyses were performed for all genes except EPCAM (deletion/duplication analysis o...

  6. Consistency and reproducibility of next-generation sequencing and other multigene mutational assays: A worldwide ring trial study on quantitative cytological molecular reference specimens.

    Science.gov (United States)

    Malapelle, Umberto; Mayo-de-Las-Casas, Clara; Molina-Vila, Miguel A; Rosell, Rafael; Savic, Spasenija; Bihl, Michel; Bubendorf, Lukas; Salto-Tellez, Manuel; de Biase, Dario; Tallini, Giovanni; Hwang, David H; Sholl, Lynette M; Luthra, Rajyalakshmi; Weynand, Birgit; Vander Borght, Sara; Missiaglia, Edoardo; Bongiovanni, Massimo; Stieber, Daniel; Vielh, Philippe; Schmitt, Fernando; Rappa, Alessandra; Barberis, Massimo; Pepe, Francesco; Pisapia, Pasquale; Serra, Nicola; Vigliar, Elena; Bellevicine, Claudio; Fassan, Matteo; Rugge, Massimo; de Andrea, Carlos E; Lozano, Maria D; Basolo, Fulvio; Fontanini, Gabriella; Nikiforov, Yuri E; Kamel-Reid, Suzanne; da Cunha Santos, Gilda; Nikiforova, Marina N; Roy-Chowdhuri, Sinchita; Troncone, Giancarlo

    2017-08-01

    Molecular testing of cytological lung cancer specimens includes, beyond epidermal growth factor receptor (EGFR), emerging predictive/prognostic genomic biomarkers such as Kirsten rat sarcoma viral oncogene homolog (KRAS), neuroblastoma RAS viral [v-ras] oncogene homolog (NRAS), B-Raf proto-oncogene, serine/threonine kinase (BRAF), and phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit α (PIK3CA). Next-generation sequencing (NGS) and other multigene mutational assays are suitable for cytological specimens, including smears. However, the current literature reflects single-institution studies rather than multicenter experiences. Quantitative cytological molecular reference slides were produced with cell lines designed to harbor concurrent mutations in the EGFR, KRAS, NRAS, BRAF, and PIK3CA genes at various allelic ratios, including low allele frequencies (AFs; 1%). This interlaboratory ring trial study included 14 institutions across the world that performed multigene mutational assays, from tissue extraction to data analysis, on these reference slides, with each laboratory using its own mutation analysis platform and methodology. All laboratories using NGS (n = 11) successfully detected the study's set of mutations with minimal variations in the means and standard errors of variant fractions at dilution points of 10% (P = .171) and 5% (P = .063) despite the use of different sequencing platforms (Illumina, Ion Torrent/Proton, and Roche). However, when mutations at a low AF of 1% were analyzed, the concordance of the NGS results was low, and this reflected the use of different thresholds for variant calling among the institutions. In contrast, laboratories using matrix-assisted laser desorption/ionization-time of flight (n = 2) showed lower concordance in terms of mutation detection and mutant AF quantification. Quantitative molecular reference slides are a useful tool for monitoring the performance of different multigene mutational
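
The abstract attributes the low concordance at 1% allele frequency to different variant-calling thresholds among institutions. The sketch below shows how that can happen, under stated assumptions: the read counts and the three laboratory thresholds are hypothetical values chosen for illustration, not figures from the ring trial, and real callers apply many additional filters.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def call_variant(alt_reads, total_reads, min_vaf):
    """Report a variant only if its allele fraction reaches the threshold."""
    return variant_allele_fraction(alt_reads, total_reads) >= min_vaf


# Hypothetical (alt_reads, total_reads) for one mutation at each dilution.
observations = {"10%": (98, 1000), "5%": (51, 1000), "1%": (11, 1000)}

# Hypothetical per-laboratory minimum allele fractions for reporting.
lab_thresholds = {"lab_1": 0.005, "lab_2": 0.01, "lab_3": 0.02}

calls = {
    dilution: {
        lab: call_variant(alt, total, threshold)
        for lab, threshold in lab_thresholds.items()
    }
    for dilution, (alt, total) in observations.items()
}

for dilution, per_lab in calls.items():
    print(dilution, per_lab)
```

With these made-up numbers all three laboratories agree at the 10% and 5% dilution points, while at 1% the laboratory with the strictest threshold does not report the variant.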

  7. Multigene panel next generation sequencing in a patient with cherry red macular spot: Identification of two novel mutations in NEU1 gene causing sialidosis type I associated with mild to unspecific biochemical and enzymatic findings

    Directory of Open Access Journals (Sweden)

    Ulrike Mütze

    2017-03-01

    Discussion: Sialidosis should be suspected in patients with cherry red macular spots, even with non-significant urinary sialic acid excretion. Multigene panel next generation sequencing can establish a definite diagnosis, allowing for counseling of the patient and family.

  8. Molecular typing of lung adenocarcinoma on cytological samples using a multigene next generation sequencing panel.

    Directory of Open Access Journals (Sweden)

    Aldo Scarpa

    Full Text Available Identification of driver mutations in lung adenocarcinoma has led to development of targeted agents that are already approved for clinical use or are in clinical trials. Therefore, the number of biomarkers that will need to be assessed is expected to increase rapidly. This calls for the implementation of methods probing the mutational status of multiple genes for inoperable cases, for which limited cytological or biopsy material is available. Cytology specimens from 38 lung adenocarcinomas were subjected to the simultaneous assessment of 504 mutational hotspots of 22 lung cancer-associated genes using 10 nanograms of DNA and Ion Torrent PGM next-generation sequencing. Thirty-six cases were successfully sequenced (95%). In 24/36 cases (67%) at least one mutated gene was observed, including EGFR, KRAS, PIK3CA, BRAF, TP53, PTEN, MET, SMAD4, FGFR3, STK11, MAP2K1. EGFR and KRAS mutations, respectively found in 6/36 (16%) and 10/36 (28%) cases, were mutually exclusive. Nine samples (25%) showed concurrent alterations in different genes. The next-generation sequencing test used is superior to current standard methodologies, as it interrogates multiple genes and requires limited amounts of DNA. Its applicability to routine cytology samples might allow a significant increase in the fraction of lung cancer patients eligible for personalized therapy.
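
The proportions quoted in this abstract can be recomputed from the stated counts. The short sketch below does only that arithmetic; the labels are paraphrases of the abstract, and no data beyond the quoted counts are assumed.

```python
# (numerator, denominator) pairs taken from the counts quoted in the abstract.
counts = {
    "successfully sequenced": (36, 38),
    "at least one mutated gene": (24, 36),
    "EGFR mutated": (6, 36),
    "KRAS mutated": (10, 36),
    "concurrent alterations": (9, 36),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


for label, (numerator, denominator) in counts.items():
    value = percent(numerator, denominator)
    print(f"{label}: {numerator}/{denominator} = {value:.1f}%")
```

Four of the five figures match the abstract after rounding (95%, 67%, 28%, 25%). The exception is 6/36, which is 16.7%, so the quoted 16% appears to be truncated rather than rounded.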

  9. Prevalence of pathogenic germline variants detected by multigene sequencing in unselected Japanese patients with ovarian cancer.

    Science.gov (United States)

    Hirasawa, Akira; Imoto, Issei; Naruto, Takuya; Akahane, Tomoko; Yamagami, Wataru; Nomura, Hiroyuki; Masuda, Kiyoshi; Susumu, Nobuyuki; Tsuda, Hitoshi; Aoki, Daisuke

    2017-12-22

    Pathogenic germline BRCA1, BRCA2 (BRCA1/2), and several other gene variants predispose women to primary ovarian, fallopian tube, and peritoneal carcinoma (OC), although variant frequency and relevance information is scarce in Japanese women with OC. Using targeted panel sequencing, we screened 230 unselected Japanese women with OC from our hospital-based cohort for pathogenic germline variants in 75 or 79 OC-associated genes. Pathogenic variants of 11 genes were identified in 41 (17.8%) women: 19 (8.3%; BRCA1), 8 (3.5%; BRCA2), 6 (2.6%; mismatch repair genes), 3 (1.3%; RAD51D), 2 (0.9%; ATM), 1 (0.4%; MRE11A), 1 (FANCC), and 1 (GABRA6). Carriers of pathogenic variants in BRCA1/2 or any other tested gene were more likely to be diagnosed younger, have first or second-degree relatives with OC, and have OC classified as high-grade serous carcinoma (HGSC). After adjustment for these variables, all 3 features were independent predictive factors for pathogenic variants in any tested genes, whereas only the latter two remained for variants in BRCA1/2. Our data indicate similar variant prevalence in Japanese patients with OC and other ethnic groups and suggest that HGSC and OC family history may facilitate genetic predisposition prediction in Japanese patients with OC and the referral of high-risk patients for genetic counseling and testing.
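
The per-gene prevalence figures in this abstract follow directly from the carrier counts and the cohort size of 230. The sketch below recomputes them; the grouping (for example "mismatch repair genes" as a single entry) mirrors the abstract, and nothing beyond the quoted counts is assumed.

```python
COHORT_SIZE = 230  # unselected women with OC screened, as stated in the abstract

# Carrier counts as quoted in the abstract, grouped the same way.
carriers = {
    "BRCA1": 19,
    "BRCA2": 8,
    "mismatch repair genes": 6,
    "RAD51D": 3,
    "ATM": 2,
    "MRE11A": 1,
    "FANCC": 1,
    "GABRA6": 1,
}

total_carriers = sum(carriers.values())

# Prevalence in percent, rounded to one decimal place as in the abstract.
prevalence = {
    gene: round(100 * count / COHORT_SIZE, 1) for gene, count in carriers.items()
}
overall = round(100 * total_carriers / COHORT_SIZE, 1)

for gene, value in prevalence.items():
    print(f"{gene}: {carriers[gene]}/{COHORT_SIZE} = {value}%")
print(f"any gene: {total_carriers}/{COHORT_SIZE} = {overall}%")
```

The counts sum to 41, and the recomputed percentages agree with those quoted (17.8% overall, 8.3% for BRCA1, 3.5% for BRCA2).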

  10. Multi-gene analysis reveals a lack of genetic divergence between Calanus agulhensis and C. sinicus (Copepoda; Calanoida).

    Directory of Open Access Journals (Sweden)

    Robert Kozol

    Full Text Available The discrimination and taxonomic identification of marine species continue to pose a challenge despite the growing number of diagnostic metrics and approaches. This study examined the genetic relationship between two sibling species of the genus Calanus (Crustacea; Copepoda; Calanidae), C. agulhensis and C. sinicus, using a multi-gene analysis. DNA sequences were determined for portions of the mitochondrial cytochrome c oxidase I (mtCOI), nuclear citrate synthase (CS), and large subunit (28S) rRNA genes for specimens collected from the Sea of Japan and North East (NE) Pacific Ocean for C. sinicus and from the Benguela Current and Agulhas Bank, off South Africa, for C. agulhensis. For mtCOI, C. sinicus and C. agulhensis showed similar levels of haplotype diversity (H(d) = 0.695 and 0.660, respectively) and nucleotide diversity (π = 0.003 and 0.002, respectively). Pairwise F(ST) distances for mtCOI were significant only between C. agulhensis collected from the Agulhas and two C. sinicus populations: the Sea of Japan (F(ST) = 0.152, p<0.01) and NE Pacific (F(ST) = 0.228, p<0.005). Between the species, F(ST) distances were low for both mtCOI (F(ST) = 0.083, p = 0.003) and CS (F(ST) = 0.050, p = 0.021). Large subunit (28S) rRNA showed no variation between the species. Our results provide evidence of the lack of genetic distinction of C. sinicus and C. agulhensis, raise questions of whether C. agulhensis warrants status as a distinct species, and indicate the clear need for more intensive and extensive ecological and genetic analysis.
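
Haplotype diversity and nucleotide diversity, the two mtCOI statistics reported in this abstract, are commonly estimated with Nei's formulas. The sketch below implements those textbook estimators on an invented four-sequence sample; the abstract does not say which software or estimator the authors used, so treating these as the relevant formulas is an assumption.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's estimator: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need two or more aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


# Hypothetical aligned sample: three haplotypes among four sequences.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]

hd = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
print(f"Hd = {hd:.3f}, pi = {pi:.3f}")
```

For this toy sample the haplotype frequencies are 0.5, 0.25 and 0.25, giving Hd = 5/6, and the six sequence pairs differ at six positions in total over ten sites, giving π = 0.1.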

  11. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides

  12. Assessment of the validity of a multigene analysis in the diagnostics of inflammatory bowel disease

    DEFF Research Database (Denmark)

    Bjerrum, J T; Nyberg, Caroline; Olsen, J

    2014-01-01

    OBJECTIVES: The findings of a previous multigene study indicated that the expression of a panel of seven specific genes had strong differential power regarding inflammatory bowel disease (IBD) versus non-IBD, as well as ulcerative colitis (UC) versus Crohn's disease (CD). This prospective...... confirmatory study based on an independent patient cohort from a national Danish IBD centre was conducted in an attempt to verify these earlier observations. DESIGN, SETTING AND PARTICIPANTS: A total of 119 patients were included in the study (CD, UC and controls). Three mucosal biopsies were retrieved from......, a reliable and simple diagnostic tool is still warranted for optimal diagnosis and treatment of patients with IBD, especially the subgroup with unclassified disease....

  13. Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila

    DEFF Research Database (Denmark)

    Clifton, Bryan D.; Sanz, Pablo Librado; Yeh, Shu-Dan

    2017-01-01

    Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typical...

  14. Evidence for 5S rDNA horizontal transfer in the toadfish Halobatrachus didactylus (Schneider, 1801) based on the analysis of three multigene families.

    Science.gov (United States)

    Merlo, Manuel A; Cross, Ismael; Palazón, José L; Ubeda-Manzanaro, María; Sarasquete, Carmen; Rebordinos, Laureana

    2012-10-07

    The Batrachoididae family is a group of marine teleosts that includes several species with more complicated physiological characteristics, such as their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family carried out in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, under processes of concerted evolution and birth-and-death evolution with purifying selection. Here we present results of the 5S rDNA and another two gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding the gene families. In addition we have also mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH). Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share the non-transcribed spacer (NTS) sequences with four other species of the family; therefore, it must have evolved in isolation. Amplification with the type β specific primers amplified a specific band in 9 specimens of H. didactylus and two of Sparus aurata. Both types showed regulatory regions and a secondary structure which mark them as functional genes. However, the U2 snRNA gene and the ITS-1 sequence showed one electrophoretic band and with one type of sequence. The U2 snRNA sequence was the most variable of the three multigene families studied. Results from two-colour FISH showed no co-localization of the gene coding from three multigene families and provided the first map of the chromosomes of the species. A highly significant finding was observed in the analysis of the 5S rDNA, since two such distant species as H. didactylus and Sparus aurata share a 5S rDNA type. This 5S rDNA type has been detected in other species belonging to the Batrachoidiformes and Perciformes orders, but not in the Pleuronectiformes and Clupeiformes orders. Two

  15. Evidence for 5S rDNA Horizontal Transfer in the toadfish Halobatrachus didactylus (Schneider, 1801) based on the analysis of three multigene families

    Directory of Open Access Journals (Sweden)

    Merlo Manuel A

    2012-10-01

    Full Text Available Abstract. Background: The Batrachoididae family is a group of marine teleosts that includes several species with more complicated physiological characteristics, such as their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family carried out in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, under processes of concerted evolution and birth-and-death evolution with purifying selection. Here we present results of the 5S rDNA and another two gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding the gene families. In addition we have also mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH). Results: Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share the non-transcribed spacer (NTS) sequences with four other species of the family; therefore, it must have evolved in isolation. Amplification with the type β specific primers amplified a specific band in 9 specimens of H. didactylus and two of Sparus aurata. Both types showed regulatory regions and a secondary structure which mark them as functional genes. However, the U2 snRNA gene and the ITS-1 sequence showed one electrophoretic band and with one type of sequence. The U2 snRNA sequence was the most variable of the three multigene families studied. Results from two-colour FISH showed no co-localization of the gene coding from three multigene families and provided the first map of the chromosomes of the species. Conclusions: A highly significant finding was observed in the analysis of the 5S rDNA, since two such distant species as H. didactylus and Sparus aurata share a 5S rDNA type. This 5S rDNA type has been detected in other species belonging to the Batrachoidiformes and Perciformes orders, but not

  16. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    Directory of Open Access Journals (Sweden)

    Lawton Jennifer

    2012-03-01

    Full Text Available Abstract. Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub

  17. Investigating a multigene prognostic assay based on significant pathways for Luminal A breast cancer through gene expression profile analysis.

    Science.gov (United States)

    Gao, Haiyan; Yang, Mei; Zhang, Xiaolan

    2018-04-01

    The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancer patients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using the limma package, and hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was developed based on the statistically significant pathways; its prognostic function was tested using a training set and verified using the gene expression data and survival data of Luminal A breast cancer patients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. Hierarchical clustering analysis verified that the DEGs may be used to effectively distinguish Luminal A samples with different prognoses. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity for both the training and test samples. In conclusion, the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.

  18. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    KAUST Repository

    Lawton, Jennifer

    2012-03-29

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein

  19. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    KAUST Repository

    Lawton, Jennifer; Brugat, Thibaut; Yan, Yam Xue; Reid, Adam James; Böhme, Ulrike; Otto, Thomas Dan; Pain, Arnab; Jackson, Andrew; Berriman, Matthew; Cunningham, Deirdre; Preiser, Peter; Langhorne, Jean

    2012-01-01

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein

  20. Genome-wide analysis of the grapevine stilbene synthase multigenic family: genomic organization and expression profiles upon biotic and abiotic stresses

    Directory of Open Access Journals (Sweden)

    Vannozzi Alessandro

    2012-08-01

    Full Text Available Abstract Background: Plant stilbenes are a small group of phenylpropanoids, which have been detected in at least 72 unrelated plant species and accumulate in response to biotic and abiotic stresses such as infection, wounding, UV-C exposure and treatment with chemicals. Stilbenes are formed via the phenylalanine/polymalonate route, the last step of which is catalyzed by the enzyme stilbene synthase (STS), a type III polyketide synthase (PKS). Stilbene synthases are closely related to chalcone synthases (CHS), the key enzymes of the flavonoid pathway, as illustrated by the fact that both enzymes share the same substrates. To date, STSs have been cloned from peanut, pine, sorghum and grapevine, the only stilbene-producing fruiting plant for which the entire genome has been sequenced. Apart from sorghum, STS genes appear to exist as a family of closely related genes in these other plant species. Results: In this study a complete characterization of the STS multigenic family in grapevine has been performed, commencing with the identification, annotation and phylogenetic analysis of all members and integration of this information with a comprehensive set of gene expression analyses including healthy tissues at different developmental stages and leaves exposed to both biotic (downy mildew infection) and abiotic (wounding and UV-C exposure) stresses. At least thirty-three full-length sequences encoding VvSTS genes were identified, which, based on predicted amino acid sequences, cluster in 3 principal groups designated A, B and C. The majority of VvSTS genes cluster in groups B and C and are located on chr16 whereas the few gene family members in group A are found on chr10. Microarray and mRNA-seq expression analyses revealed different patterns of transcript accumulation between the different groups of VvSTS family members and between VvSTSs and VvCHSs. Indeed, under certain conditions the transcriptional response of VvSTS and VvCHS genes appears to be

  1. Lynx web services for annotations and systems analysis of multi-gene disorders.

    Science.gov (United States)

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Multigene Sequence Analysis of Aster Yellows Phytoplasma Associated with Primrose Yellows

    Czech Academy of Sciences Publication Activity Database

    Fránová, Jana; Přibylová, Jaroslava; Koloniuk, Igor; Podrábská, K.; Špak, Josef

    2016-01-01

    Vol. 164, No. 3 (2016), pp. 166-176. ISSN 0931-1785. Institutional support: RVO:60077344. Keywords: Candidatus Phytoplasma asteris; pyrH-frr genes; Primula acaulis. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 0.853 (year: 2016).

  3. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  4. Lentiviral gene ontology (LeGO) vectors equipped with novel drug-selectable fluorescent proteins: new building blocks for cell marking and multi-gene analysis.

    Science.gov (United States)

    Weber, K; Mock, U; Petrowitz, B; Bartsch, U; Fehse, B

    2010-04-01

    Vector-encoded fluorescent proteins (FPs) facilitate unambiguous identification or sorting of gene-modified cells by fluorescence-activated cell sorting (FACS). Exploiting this feature, we have recently developed lentiviral gene ontology (LeGO) vectors (www.LentiGO-Vectors.de) for multi-gene analysis in different target cells. In this study, we extend the LeGO principle by introducing 10 different drug-selectable FPs created by fusing one of five selection markers (protecting against blasticidin, hygromycin, neomycin, puromycin and zeocin) and one of five FP genes (Cerulean, eGFP, Venus, dTomato and mCherry). All tested fusion proteins allowed both fluorescence-mediated detection and drug-mediated selection of LeGO-transduced cells. Newly generated codon-optimized hygromycin- and neomycin-resistance genes showed improved expression as compared with their ancestors. New LeGO constructs were produced at titers >10^6 per ml (for non-concentrated supernatants). We show efficient combinatorial marking and selection of various cells, including mesenchymal stem cells, simultaneously transduced with different LeGO constructs. Inclusion of the cytomegalovirus early enhancer/chicken beta-actin promoter into LeGO vectors facilitated robust transgene expression in and selection of neural stem cells and their differentiated progeny. We suppose that the new drug-selectable markers, which combine the advantages of FACS and drug selection, are well suited for numerous applications and vector systems. Their inclusion into LeGO vectors opens new possibilities for (stem) cell tracking and functional multi-gene analysis.

  5. Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns

    Science.gov (United States)

    2012-01-01

    Background: The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave the way for the development of functional genomics approaches. Results: Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that

  6. Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns

    Directory of Open Access Journals (Sweden)

    Barvkar Vitthal T

    2012-05-01

    Full Text Available Abstract Background: The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave the way for the development of functional genomics approaches. Results: Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot

  7. Molecular phylogeny and species separation of five morphologically similar Holosticha-complex ciliates (Protozoa, Ciliophora) using ARDRA riboprinting and multigene sequence data

    Science.gov (United States)

    Gao, Feng; Yi, Zhenzhen; Gong, Jun; Al-Rasheid, Khaled A. S.; Song, Weibo

    2010-05-01

    To separate and redefine the ambiguous Holosticha-complex, a confusing group of hypotrichous ciliates, six strains belonging to five morphospecies of three genera, Holosticha heterofoissneri, Anteholosticha sp. pop1, Anteholosticha sp. pop2, A. manca, A. gracilis and Nothoholosticha fasciola, were analyzed using 12 restriction enzymes on the basis of amplified ribosomal DNA restriction analysis. Nine of the 12 enzymes could digest the DNA products, four (Hinf I, Hind III, Msp I, Taq I) yielded species-specific restriction patterns, and Hind III and Taq I produced different patterns for the two Anteholosticha sp. populations. Distinctly different restriction digestion haplotypes and similarity indices can be used to separate the species. The secondary structures of the five species were predicted based on the ITS2 transcripts and there were several minor differences among species, while the two Anteholosticha sp. populations were identical. In addition, phylogenies based on the SSrRNA gene sequences were reconstructed using multiple algorithms, which grouped them generally into four clades and indicated that the genus Anteholosticha should be a convergent assemblage. The fact that Holosticha species clustered with the oligotrichs and choreotrichs, though with very low support values, indicated that the topology may be very divergent and unreliable when the number of sequence data used in the analyses is too low.
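    The restriction-pattern comparison described above can be illustrated with an in-silico digest. What follows is a minimal Python sketch under stated assumptions, not the authors' protocol: it cuts a linear toy sequence at the standard recognition sites of the four enzymes that gave species-specific patterns (Hinf I G^ANTC, Hind III A^AGCTT, Msp I C^CGG, Taq I T^CGA) and lists the resulting fragment lengths. The function name `fragment_lengths` and the toy amplicon are hypothetical and chosen only for illustration.

```python
import re

# Recognition site and cut offset (number of bases into the site after which
# the top strand is cut). "N" stands for any base.
ENZYMES = {
    "HinfI": ("GANTC", 1),
    "HindIII": ("AAGCTT", 1),
    "MspI": ("CCGG", 1),
    "TaqI": ("TCGA", 1),
}


def fragment_lengths(seq, site, cut_offset):
    """Return fragment lengths (longest first) for a linear sequence."""
    # A lookahead is used so that overlapping sites are all reported.
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical toy amplicon (not a real rDNA sequence).
amplicon = "GGGAAGCTTCCCCGGTTTTCGAAA"

for name, (site, offset) in ENZYMES.items():
    print(name, fragment_lengths(amplicon, site, offset))
```

    Two isolates whose amplicons yield the same list of fragment lengths for an enzyme would be indistinguishable with that enzyme, which is why only some of the tested enzymes are useful for species separation.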

  8. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  9. The actin multigene family of Paramecium tetraurelia

    Directory of Open Access Journals (Sweden)

    Wagner Erika

    2007-03-01

    Full Text Available Abstract Background: A Paramecium tetraurelia pilot genome project, the subsequent sequencing of a Megabase chromosome as well as the Paramecium genome project aimed at gaining insight into the genome of Paramecium. These cells display a most elaborate membrane trafficking system, with distinct, predictable pathways in which actin could participate. Previously we had localized actin in Paramecium; however, none of the efforts so far could prove the occurrence of actin in the cleavage furrow of a dividing cell, despite the fact that actin is unequivocally involved in cell division. This gave a first hint that Paramecium may possess actin isoforms with unusual characteristics. The genome project gave us the chance to search the whole Paramecium genome, and, thus, to identify and characterize probably all actin isoforms in Paramecium. Results: The ciliated protozoan, P. tetraurelia, contains an actin multigene family with at least 30 members encoding actin, actin-related and actin-like proteins. They group into twelve subfamilies; a large subfamily with 10 genes, seven pairs and one trio with >82% amino acid identity, as well as three single genes. The different subfamilies are very distinct from each other. In comparison to actins in other organisms, P. tetraurelia actins are highly divergent, with identities topping 80% and falling to 30%. We analyzed their structure at the nucleotide level regarding the number and position of introns. At the amino acid level, we scanned the sequences for the presence of actin consensus regions, for amino acids of the intermonomer interface in filaments, for residues contributing to ATP binding, and for known binding sites for myosin and actin-specific drugs. Several of those characteristics are lacking in several subfamilies. The divergence of P. tetraurelia actins and actin-related proteins between different P. tetraurelia subfamilies as well as with sequences of other organisms is well represented in a phylogenetic

  10. Mass spectrometric amino acid sequencing of a mixture of seed storage proteins (napin) from Brassica napus, products of a multigene family.

    OpenAIRE

    Gehrig, P M; Krzyzaniak, A; Barciszewski, J; Biemann, K

    1996-01-01

    The amino acid sequences of a number of closely related proteins ("napin") isolated from Brassica napus were determined by mass spectrometry without prior separation into individual components. Some of these proteins correspond to those previously deduced (napA, BngNAP1, and gNa), chiefly from DNA sequences. Others were found to differ to a varying extent (BngNAP1', BngNAP1A, BngNAP1B, BngNAP1C, gNa', and gNaA). The short chains of gNa and gNa' and of BngNAP1 and BngNAP1' differ by the replac...

  11. Rice Multi-Gene Analysis

    Indian Academy of Sciences (India)

    gdyang

    Click here for a legend that explains the icons and colors in the image below. Click here to ..... PARE Data. We display these signature images when the abundance view option is set to "Individual ..... These data comprise two libraries from the ...

  12. Dynamic evolution of toll-like receptor multigene families in echinoderms

    Directory of Open Access Journals (Sweden)

    Katherine M Buckley

    2012-06-01

    Full Text Available The genome of the purple sea urchin, Strongylocentrotus purpuratus, was the first to be sequenced from a long-lived large invertebrate. Analysis of this genome uncovered a surprisingly complex immune system in which the moderately sized sets of pattern recognition receptors that form the core of vertebrate innate immunity are encoded in large multigene families. The sea urchin genome contains 253 Toll-like receptor (TLR) genes, more than 200 Nod-like receptors and 1095 scavenger receptor cysteine-rich domains, a ten-fold expansion relative to vertebrates. Given their stereotypic structure and simple intron-exon architecture, the TLRs are the most tractable of these families for more detailed analysis. An immune defense role for these receptors is suggested by their sequence diversity and expression in immunologically active tissues, including phagocytes. This complexity of the sea urchin TLR multigene families largely derives from expansions that are independent of those in vertebrates and protostomes, although a small family of TLRs with structure similar to that of Drosophila Toll likely originated in an ancient eumetazoan ancestor. Several other invertebrate deuterostome genomes have been sequenced, including the cephalochordate, Branchiostoma floridae and the sea urchin Lytechinus variegatus, as well as partial sequences from two other sea urchin species. Here, we present an analysis of the invertebrate deuterostome TLRs with emphasis on the echinoderms. Representatives of most of the S. purpuratus TLR subfamilies and homologs of the protostome-like sequences are found in L. variegatus. The phylogeny of these genes within sea urchins highlights lineage-specific expansions at higher resolution than is evident at the phylum level. These analyses identify quickly evolving TLR subfamilies that are likely to have novel functions and other, more stable, subfamilies that may function similarly to those of vertebrates.

  13. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions.

  14. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  15. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo (喻祖国); Vo Anh; Gong Zhi-Min (龚志民); Long Shun-Chao (龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding from noncoding sequences, and how to classify organisms and infer their evolutionary relationships, are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.
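    To make the idea of treating DNA as a time series concrete, here is a minimal Python sketch that estimates a Hurst exponent by rescaled-range (R/S) analysis. It is an illustration under stated assumptions and not the authors' model: the purine/pyrimidine mapping (A, G to +1; C, T to -1) is one common convention, the abstract does not say which encoding the authors use, and the function names, window sizes and random test sequence are all hypothetical.

```python
import math
import random

PURINES = {"A", "G"}


def dna_to_steps(seq):
    """Assumed encoding: purines (A, G) -> +1, pyrimidines (C, T) -> -1."""
    return [1 if base in PURINES else -1 for base in seq.upper()]


def rescaled_range(window):
    """R/S of one window: range of cumulative mean-adjusted sums over the std."""
    n = len(window)
    mean = sum(window) / n
    cumulative, total = [], 0.0
    for x in window:
        total += x - mean
        cumulative.append(total)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return spread / std if std > 0 else None  # constant windows are skipped


def hurst_exponent(steps, sizes=(8, 16, 32, 64, 128)):
    """Slope of log(mean R/S) against log(window size), by least squares."""
    points = []
    for size in sizes:
        values = [rescaled_range(steps[i:i + size])
                  for i in range(0, len(steps) - size + 1, size)]
        values = [v for v in values if v]
        if values:
            points.append((math.log(size), math.log(sum(values) / len(values))))
    if len(points) < 2:
        raise ValueError("sequence too short for the chosen window sizes")
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    return (sum((x - mx) * (y - my) for x, y in points)
            / sum((x - mx) ** 2 for x, _ in points))


# Hypothetical test input: an uncorrelated random sequence, not real DNA.
rng = random.Random(0)
sequence = "".join(rng.choice("ACGT") for _ in range(2048))
print(f"Estimated Hurst exponent: {hurst_exponent(dna_to_steps(sequence)):.2f}")
```

    For an uncorrelated sequence the estimate is expected to lie near 0.5, with some upward bias at small window sizes; values well above 0.5 would suggest long-range correlation of the kind the abstract refers to.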

  16. Genetic Diversity and Differentiation of Colletotrichum spp. Isolates Associated with Leguminosae Using Multigene Loci, RAPD and ISSR

    Directory of Open Access Journals (Sweden)

    Farshid Mahmodi

    2014-03-01

    Full Text Available Genetic diversity and differentiation of 50 Colletotrichum spp. isolates from legume crops were studied through multigene loci, RAPD and ISSR analysis. DNA sequence comparisons of six genes (ITS, ACT, Tub2, CHS-1, GAPDH, and HIS3) verified the species identity of C. truncatum, C. dematium and C. gloeosporioides and identified C. capsici as a synonym of C. truncatum. Based on the matrix distance analysis of multigene sequences, the Colletotrichum species showed diverse degrees of intra- and interspecific divergence (0.0–1.4% and 15.5–19.9%, respectively). A multilocus molecular phylogenetic analysis clustered Colletotrichum spp. isolates into 3 well-defined clades, representing three distinct species: C. truncatum, C. dematium and C. gloeosporioides. The ISSR, RAPD and cluster analyses exhibited a high degree of variability among different isolates and permitted the grouping of isolates of Colletotrichum spp. into three distinct clusters. The genetic grouping of Colletotrichum spp. isolates was consistent with host specificity but not with geographical origin. The large population of C. truncatum showed greater amounts of genetic diversity than the smaller populations of C. dematium and C. gloeosporioides. Results of ISSR and RAPD markers were congruent, but the effective marker ratio and the number of private alleles were greater in ISSR markers.

  17. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  18. Multi-gene phylogenetic analysis reveals that shochu-fermenting Saccharomyces cerevisiae strains form a distinct sub-clade of the Japanese sake cluster.

    Science.gov (United States)

    Futagami, Taiki; Kadooka, Chihiro; Ando, Yoshinori; Okutsu, Kayu; Yoshizaki, Yumiko; Setoguchi, Shinji; Takamine, Kazunori; Kawai, Mikihiko; Tamaki, Hisanori

    2017-10-01

    Shochu is a traditional Japanese distilled spirit. The formation of the distinguishing flavour of shochu produced in individual distilleries is attributed to putative indigenous yeast strains. In this study, we performed the first (to our knowledge) phylogenetic classification of shochu strains based on nucleotide gene sequences. We performed phylogenetic classification of 21 putative indigenous shochu yeast strains isolated from 11 distilleries. All of these strains were shown or confirmed to be Saccharomyces cerevisiae, sharing species identification with 34 known S. cerevisiae strains (including commonly used shochu, sake, ale, whisky, bakery, bioethanol and laboratory yeast strains and a clinical isolate) that were tested in parallel. Our analysis used five genes that reflect genome-level phylogeny for the strain-level classification. In a first step, we demonstrated that partial regions of the ZAP1, THI7, PXL1, YRR1 and GLG1 genes were sufficient to reproduce previous sub-species classifications. In a second step, these five analysed regions from each of 25 strains (four commonly used shochu strains and the 21 putative indigenous shochu strains) were concatenated and used to generate a phylogenetic tree. Further analysis revealed that the putative indigenous shochu yeast strains form a monophyletic group that includes both the shochu yeasts and a subset of the sake group strains; this cluster is a sister group to other sake yeast strains, together comprising a sake-shochu group. Differences among shochu strains were small, suggesting that it may be possible to correlate subtle phenotypic differences among shochu flavours with specific differences in genome sequences. Copyright © 2017 John Wiley & Sons, Ltd.
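    The concatenation step in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which distance measure or tree-building method was used, so the uncorrected p-distance below is a stand-in, and the function names and the 4 bp toy alignments are hypothetical. Only the five gene names come from the abstract.

```python
# Gene order used for concatenation (names taken from the abstract).
GENES = ["ZAP1", "THI7", "PXL1", "YRR1", "GLG1"]


def concatenate(regions, gene_order=GENES):
    """Join the aligned regions of one strain in a fixed gene order."""
    return "".join(regions[gene] for gene in gene_order)


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in compared) / len(compared)


# Hypothetical toy alignment: 4 bp per gene region, two strains.
strain_a = {"ZAP1": "ACGT", "THI7": "GGCA", "PXL1": "TTAG", "YRR1": "CATG", "GLG1": "AACC"}
strain_b = {"ZAP1": "ACGT", "THI7": "GGTA", "PXL1": "TTAG", "YRR1": "CA-G", "GLG1": "AACC"}

distance = p_distance(concatenate(strain_a), concatenate(strain_b))
print(f"p-distance over concatenated regions: {distance:.4f}")
```

    Using the same gene order for every strain keeps homologous columns aligned after concatenation; a matrix of such pairwise distances is one possible input to a distance-based tree method.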

  19. Exome Sequencing Identifies a Novel LMNA Splice-Site Mutation and Multigenic Heterozygosity of Potential Modifiers in a Family with Sick Sinus Syndrome, Dilated Cardiomyopathy, and Sudden Cardiac Death.

    Directory of Open Access Journals (Sweden)

    Michael V Zaragoza

    The goals are to understand the primary genetic mechanisms that cause Sick Sinus Syndrome and to identify potential modifiers that may result in intrafamilial variability within a multigenerational family. The proband is a 63-year-old male with a family history of individuals (>10) with sinus node dysfunction, ventricular arrhythmia, cardiomyopathy, heart failure, and sudden death. We used exome sequencing of a single individual to identify a novel LMNA mutation and demonstrated the importance of Sanger validation and family studies when evaluating candidates. After initial single-gene studies were negative, we conducted exome sequencing for the proband which produced 9 gigabases of sequencing data. Bioinformatics analysis showed 94% of the reads mapped to the reference and identified 128,563 unique variants with 108,795 (85%) located in 16,319 genes of 19,056 target genes. We discovered multiple variants in known arrhythmia, cardiomyopathy, or ion channel associated genes that may serve as potential modifiers in disease expression. To identify candidate mutations, we focused on ~2,000 variants located in 237 genes of 283 known arrhythmia, cardiomyopathy, or ion channel associated genes. We filtered the candidates to 41 variants in 33 genes using zygosity, protein impact, database searches, and clinical association. Only 21 of 41 (51%) variants were validated by Sanger sequencing. We selected nine confirmed variants with minor allele frequencies G, a novel heterozygous splice-site mutation as the primary mutation with rare or novel variants in HCN4, MYBPC3, PKP4, TMPO, TTN, DMPK and KCNJ10 as potential modifiers and a mechanism consistent with haploinsufficiency.

  20. In silico analysis of the fucosylation-associated genome of the human blood fluke Schistosoma mansoni: cloning and characterization of the fucosyltransferase multigene family.

    Science.gov (United States)

    Peterson, Nathan A; Anderson, Tavis K; Yoshino, Timothy P

    2013-01-01

    Fucosylated glycans of the parasitic flatworm Schistosoma mansoni play key roles in its development and immunobiology. In the present study we used a genome-wide homology-based bioinformatics approach to search for genes that contribute to fucosylated glycan expression in S. mansoni, specifically the α2-, α3-, and α6-fucosyltransferases (FucTs), which transfer L-fucose from a GDP-L-fucose donor to an oligosaccharide acceptor. We identified and in silico characterized several novel schistosome FucT homologs, including six α3-FucTs and six α6-FucTs, as well as two protein O-FucTs that catalyze the unrelated transfer of L-fucose to serine and threonine residues of epidermal growth factor- and thrombospondin-type repeats. No α2-FucTs were observed. Primary sequence analyses identified key conserved FucT motifs as well as characteristic transmembrane domains, consistent with their putative roles as fucosyltransferases. Most genes exhibit alternative splicing, with multiple transcript variants generated. A phylogenetic analysis demonstrated that schistosome α3- and α6-FucTs form monophyletic clades within their respective gene families, suggesting multiple gene duplications following the separation of the schistosome lineage from the main evolutionary tree. Quantitative decreases in steady-state transcript levels of some FucTs during early larval development suggest a possible mechanism for differential expression of fucosylated glycans in schistosomes. This study systematically identifies the complete repertoire of FucT homologs in S. mansoni and provides fundamental information regarding their genomic organization, genetic variation, developmental expression, and evolutionary history.

  1. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a very deadly disease, and one form of it is leukemia, better known as blood cancer. Cancer cells can be detected by analysing DNA in a laboratory test. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm, which was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the most similar regions of a pair of sequences by assigning a negative value to unequal base pairs (mismatches) and a positive value to identical base pairs (matches). The maximum positive value marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non leukemia.
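
    The local alignment described above can be sketched in a few lines of Python. This is a minimal textbook version of Smith-Waterman with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -1) and the example strings are assumptions chosen for illustration, not the parameters or data used in the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTTT", "GGACGTGG")
```

    With these toy inputs the shared core ACGT is recovered with a score of 8 (four matches at 2 each). The traceback starts at the highest-scoring cell and stops at the first zero, which corresponds to the "end" and "start" of the alignment mentioned in the abstract.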

  2. Extensive lineage-specific gene duplication and evolution of the spiggin multi-gene family in stickleback

    Directory of Open Access Journals (Sweden)

    Nishida Mutsumi

    2007-11-01

    Background: The threespine stickleback (Gasterosteus aculeatus) has a characteristic reproductive mode; mature males build nests using a secreted glue-like protein called spiggin. Although recent studies reported multiple occurrences of genes that encode this glue-like protein spiggin in threespine and ninespine sticklebacks, it is still unclear how many genes compose the spiggin multi-gene family. Results: Genome sequence analysis of threespine stickleback showed that there are at least five spiggin genes and two pseudogenes, whereas a single spiggin homolog occurs in the genomes of other fishes. Comparative genome sequence analysis demonstrated that Muc19, a single-copy mucous gene in human and mouse, is an ortholog of spiggin. Phylogenetic and molecular evolutionary analyses of these sequences suggested that an ancestral spiggin gene originated from a member of the mucin gene family as a single gene in the common ancestor of teleosts, and gene duplications of spiggin have occurred in the stickleback lineage. There was inter-population variation in the copy number of spiggin genes and positive selection on some codons, indicating that additional gene duplication/deletion events and adaptive evolution at some amino acid sites may have occurred in each stickleback population. Conclusion: A number of spiggin genes exist in the threespine stickleback genome. Our results provide insight into the origin and dynamic evolutionary process of the spiggin multi-gene family in the threespine stickleback lineage. The dramatic evolution of genes for mucous substrates may have contributed to the generation of distinct characteristics such as "bio-glue" in vertebrates.

  3. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  4. DNA multigene characterization of Fasciola hepatica and Lymnaea neotropica and its fascioliasis transmission capacity in Uruguay, with historical correlation, human report review and infection risk analysis.

    Science.gov (United States)

    Bargues, María Dolores; Gayo, Valeria; Sanchis, Jaime; Artigas, Patricio; Khoubbane, Messaoud; Birriel, Soledad; Mas-Coma, Santiago

    2017-02-01

    Fascioliasis is a pathogenic disease transmitted by lymnaeid snails and recently emerging in humans, in part due to effects of climate changes, anthropogenic environment modifications, import/export and movements of livestock. South America is the continent presenting more human fascioliasis hyperendemic areas and the highest prevalences and intensities known. These scenarios appear mainly linked to altitude areas in Andean countries, whereas lowland areas of non-Andean countries, such as Uruguay, only show sporadic human cases or outbreaks. A study including DNA marker sequencing of fasciolids and lymnaeids, an experimental study of the life cycle in Uruguay, and a review of human fascioliasis in Uruguay, are performed. The characterization of Fasciola hepatica from cattle and horses of Uruguay included the complete sequences of the ribosomal DNA ITS-2 and ITS-1 and mitochondrial DNA cox1 and nad1. ITS-2, ITS-1, partial cox1 and rDNA 16S gene of mtDNA were used for lymnaeids. Results indicated that vectors belong to Lymnaea neotropica instead of to Lymnaea viator, as always reported from Uruguay. The life cycle and transmission features of F. hepatica by L. neotropica of Uruguay were studied under standardized experimental conditions to enable a comparison with the transmission capacity of F. hepatica by Galba truncatula at very high altitude in Bolivia. On this baseline, we reviewed the 95 human fascioliasis cases reported in Uruguay and analyzed the risk of human infection in front of future climate change estimations. The correlation of fasciolid and lymnaeid haplotypes with historical data on the introduction and spread of livestock into Uruguay allowed to understand the molecular diversity detected. Although Uruguayan L. neotropica is a highly efficient vector, its transmission capacity is markedly lower than that of Bolivian G. truncatula. This allows to understand the transmission and epidemiological differences between Andean highlands and non

  5. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights:
    • Uncertainties with chiller sequencing control were systematically quantified.
    • Robustness of chiller sequencing control was systematically analyzed.
    • Different sequencing control strategies were sensitive to different uncertainties.
    • A numerical method was developed for easy selection of chiller sequencing control.
    Abstract: A multiple-chiller plant is commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Two observations motivate this work: (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) few studies in the current literature have systematically addressed this issue. This paper therefore presents a robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies.

  6. DNA multigene characterization of Fasciola hepatica and Lymnaea neotropica and its fascioliasis transmission capacity in Uruguay, with historical correlation, human report review and infection risk analysis

    Science.gov (United States)

    Gayo, Valeria; Sanchis, Jaime; Artigas, Patricio; Khoubbane, Messaoud; Birriel, Soledad; Mas-Coma, Santiago

    2017-01-01

    Background Fascioliasis is a pathogenic disease transmitted by lymnaeid snails and recently emerging in humans, in part due to effects of climate changes, anthropogenic environment modifications, import/export and movements of livestock. South America is the continent presenting more human fascioliasis hyperendemic areas and the highest prevalences and intensities known. These scenarios appear mainly linked to altitude areas in Andean countries, whereas lowland areas of non-Andean countries, such as Uruguay, only show sporadic human cases or outbreaks. A study including DNA marker sequencing of fasciolids and lymnaeids, an experimental study of the life cycle in Uruguay, and a review of human fascioliasis in Uruguay, are performed. Methodology/Principal findings The characterization of Fasciola hepatica from cattle and horses of Uruguay included the complete sequences of the ribosomal DNA ITS-2 and ITS-1 and mitochondrial DNA cox1 and nad1. ITS-2, ITS-1, partial cox1 and rDNA 16S gene of mtDNA were used for lymnaeids. Results indicated that vectors belong to Lymnaea neotropica instead of to Lymnaea viator, as always reported from Uruguay. The life cycle and transmission features of F. hepatica by L. neotropica of Uruguay were studied under standardized experimental conditions to enable a comparison with the transmission capacity of F. hepatica by Galba truncatula at very high altitude in Bolivia. On this baseline, we reviewed the 95 human fascioliasis cases reported in Uruguay and analyzed the risk of human infection in front of future climate change estimations. Conclusions/Significance The correlation of fasciolid and lymnaeid haplotypes with historical data on the introduction and spread of livestock into Uruguay allowed to understand the molecular diversity detected. Although Uruguayan L. neotropica is a highly efficient vector, its transmission capacity is markedly lower than that of Bolivian G. truncatula. This allows to understand the transmission and

  7. Probabilistic accident sequence recovery analysis

    International Nuclear Information System (INIS)

    Stutzke, Martin A.; Cooper, Susan E.

    2004-01-01

    Recovery analysis is a method that considers alternative strategies for preventing accidents in nuclear power plants during probabilistic risk assessment (PRA). Consideration of possible recovery actions in PRAs has been controversial, and there seems to be a widely held belief among PRA practitioners, utility staff, plant operators, and regulators that the results of recovery analysis should be skeptically viewed. This paper provides a framework for discussing recovery strategies, thus lending credibility to the process and enhancing regulatory acceptance of PRA results and conclusions. (author)

  8. Integrated multigene expression panel to prognosticate patients with gastric cancer.

    Science.gov (United States)

    Kanda, Mitsuro; Murotani, Kenta; Tanaka, Haruyoshi; Miwa, Takashi; Umeda, Shinichi; Tanaka, Chie; Kobayashi, Daisuke; Hayashi, Masamichi; Hattori, Norifumi; Suenaga, Masaya; Yamada, Suguru; Nakayama, Goro; Fujiwara, Michitaka; Kodera, Yasuhiro

    2018-04-10

    Most of the proposed individual markers had limited clinical utility due to the inherent biological and genetic heterogeneity of gastric cancer. We aimed to build a new molecular-based model to predict prognosis in patients with gastric cancer. A total of 200 patients who underwent gastric resection for gastric cancer were divided into learning and validation cohorts using a table of random numbers in a 1:1 ratio. In the learning cohort, mRNA expression levels of 15 molecular markers in gastric tissues were analyzed and concordance index (C-index) values of all single and combinations of the 15 candidate markers for overall survival were calculated. The multigene expression panel was designed according to C-index values and the subpopulation index. Expression scores were determined with weighting according to the coefficient of each constituent. The reproducibility of the panel was evaluated in the validation cohort. C-index values of the 15 single candidate markers ranged from 0.506 to 0.653. Among 32,767 combinations, the optimal and balanced expression panel comprised four constituents (MAGED2, SYT8, BTG1, and FAM46) and the C-index value was 0.793. Using this panel, patients were provisionally categorized with scores of 1-3, and clearly stratified into favorable, intermediate, and poor overall survival groups. In the validation cohort, both overall and disease-free survival rates decreased incrementally with increasing expression scores. Multivariate analysis revealed that the expression score was an independent prognostic factor for overall survival after curative gastrectomy. We developed an integrated multigene expression panel that simply and accurately stratified risk of patients with gastric cancer.
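
    The figure of 32,767 combinations matches the number of non-empty subsets of 15 markers, 2^15 - 1. The short Python check below makes that arithmetic explicit; the function names are hypothetical, and the abstract does not say how the authors enumerated or scored the panels.

```python
from itertools import combinations
from math import comb


def count_nonempty_subsets(n):
    """Number of ways to pick at least one item out of n."""
    return sum(comb(n, k) for k in range(1, n + 1))


def enumerate_panels(markers):
    """Yield every non-empty combination of the given markers."""
    for k in range(1, len(markers) + 1):
        yield from combinations(markers, k)


n_panels = count_nonempty_subsets(15)          # single markers included
small_example = list(enumerate_panels(["m1", "m2", "m3"]))
```

    For 15 markers the count equals 2**15 - 1, and the three-marker example yields 7 panels, from ("m1",) up to ("m1", "m2", "m3").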

  9. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
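
    As a concrete picture of the iterated map behind Chaos Game Representation, the Python sketch below moves a point halfway toward the corner of the unit square assigned to each successive nucleotide. The corner assignment and the starting point at the centre of the square are assumptions for illustration; the review itself does not specify them here.

```python
# Assumed corner layout for the four nucleotides on the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the list of CGR coordinates visited while reading seq."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Iterated map: jump half the distance toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


trajectory = cgr_points("AG")
```

    Each coordinate encodes the whole preceding subsequence at progressively finer resolution, which is the scale-free property the abstract refers to.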

  10. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system-level PHA using the sequence tree method was developed to perform safety-related digital I&C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, the conventional PHA is not a systematic technique; the analysis results depend strongly on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. This research therefore developed a system-level sequence tree based PHA, which can clarify the relationship among the major digital I&C systems. The technique comprises two major phases. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety-related I&C system, such as RPS. The second phase uses a sequence tree to recognize which I&C systems are involved in the event, how the safety-related systems work, and how the backup systems can be activated to mitigate the consequences if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including the Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I&C systems, including the digital systems and the analog back-up systems, are allocated to their specific echelons. With this system-centric sequence tree based analysis, not only can preliminary hazards be identified systematically, but the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  11. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction - Todd R. Reed
    CONTENT-BASED IMAGE SEQUENCE REPRESENTATION - Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej
    THE COMPUTATION OF MOTION - Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang
    MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN - Luca Lucchese and Guido Maria Cortelazzo
    QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS - Gaetano Giunta
    ERROR CONCEALMENT IN DIGITAL VIDEO - Francesco G.B. De Natale
    IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE - Anil Kokaram
    VIDEO SUMMARIZATION - Cuneyt M. Taskiran and Edward

  12. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences are clustered with sequences reported from Japan. This is the first phylogenetic analysis of HCV core gene from Pakistani population. Our sequences and sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and ...

  13. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting the PCR reaction conditions, and combining these with clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems and strategies encountered during the finishing step of BCG Tice sequencing are described here, in the hope of offering some experience to those involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  14. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  15. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. Using these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measure the fitness and appropriateness between a process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach is unfit to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach utilizes the current control flow technique in the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  16. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  17. Characterization of the repertoire diversity of the Plasmodium falciparum stevor multigene family in laboratory and field isolates

    Directory of Open Access Journals (Sweden)

    Holder Anthony A

    2009-06-01

    Background: The evasion of host immune response by the human malaria parasite Plasmodium falciparum has been linked to expression of a range of variable antigens on the infected erythrocyte surface. Several genes are potentially involved in this process with the var, rif and stevor multigene families being the most likely candidates and coding for rapidly evolving proteins. The high sequence diversity of proteins encoded by these gene families may have evolved as an immune evasion strategy that enables the parasite to establish long lasting chronic infections. Previous findings have shown that the hypervariable region (HVR) of STEVOR has significant sequence diversity both within as well as across different P. falciparum lines. However, these studies did not address whether or not there are ancestral stevor that can be found in different parasites. Methods: DNA and RNA sequence analysis as well as phylogenetic approaches were used to analyse the stevor sequence repertoire and diversity in laboratory lines and Kilifi (Kenya) fresh isolates. Results: Conserved stevor genes were identified in different P. falciparum isolates from different global locations. Consistent with previous studies, the HVR of the stevor gene family was found to be highly divergent both within and between isolates. Importantly, phylogenetic analysis shows some clustering of stevor sequences both within a single parasite clone as well as across different parasite isolates. Conclusion: This indicates that the ancestral P. falciparum parasite genome already contained multiple stevor genes that have subsequently diversified further within the different P. falciparum populations. It also confirms that STEVOR is under strong selection pressure.

  18. Quantitative RT-PCR based platform for rapid quantification of the transcripts of highly homologous multigene families and their members during grain development

    DEFF Research Database (Denmark)

    Kaczmarczyk, Agnieszka Ewa; Bowra, Steve; Elek, Zoltan

    2012-01-01

    expression combined with genetic variation in large multigene families with high homology among the alleles is very challenging. Results: We designed a rapid qRT-PCR system with the aim of characterising the variation in the expression of hordein gene families. All the known D-, C-, B-, and gamma......-hordein sequences coding full length open reading frames were collected from commonly available databases. Phylogenetic analysis was performed and the members of the different hordein families were classified into subfamilies. Primer sets were designed to discriminate the gene expression level of whole families...... and its subgroups. Moreover, the results indicate genotype-specific gene expression. Conclusions: Quantitative RT-PCR with SYBR Green labelling can be a useful technique to follow gene expression levels of large gene families with highly homologous members. We showed variation in the temporal...

  19. FAST: FAST Analysis of Sequences Toolbox

    Directory of Open Access Journals (Sweden)

    Travis J. Lawrence

    2015-05-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU’s Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.

  20. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  1. A basic analysis toolkit for biological sequences

    Directory of Open Access Journals (Sweden)

    Siragusa Enrico

    2007-09-01

    Full Text Available Abstract This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
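
    One of the global alignment primitives named in the abstract above, the longest common subsequence, can be sketched with the textbook dynamic programme. This is a minimal illustration only: the function name `lcs` is hypothetical and it is not the BATS implementation or its API.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Trace back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

    The table costs O(mn) time and memory, which is acceptable for short strings; longer inputs would call for a space-saving variant.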

  2. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  3. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  4. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  5. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  6. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (e.g. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
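
    Of the clustering measures listed in this record, the Allan Factor is the simplest to sketch: events are counted in adjacent windows of length T, and AF(T) is the mean squared difference of successive counts divided by twice the mean count. The function name, arguments and toy event times below are hypothetical and are not taken from the study.

```python
def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor AF(T) for window length T = `window`.

    AF(T) = mean((N[k+1] - N[k])**2) / (2 * mean(N[k])),
    where N[k] is the number of events in the k-th window.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:  # ignore events outside the complete windows
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)


# Toy data: one event per window (regular) versus bursts in alternate windows.
regular = [k + 0.5 for k in range(10)]
bursty = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
```

    For a homogeneous Poisson process the expected value is close to 1 at every window length, so values well above 1 suggest clustering in time and values below 1 suggest regularity.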

  7. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.

  8. Statistical analysis of next generation sequencing data

    CERN Document Server

    Nettleton, Dan

    2014-01-01

    Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...

  9. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which makes it possible to map QTC patterns in a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  10. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  11. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  12. Image sequence analysis workstation for multipoint motion analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modern graphic-oriented workstation with the digital image acquisition, processing and display techniques. In addition to automation and increase in throughput of data reduction tasks, the objective of the system is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level image processing and spatial/temporal adaptation of the processing parameters is used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  13. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  14. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  15. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Primers specific for CSRP3 were designed using known cDNA sequences of Bos taurus published in database with different accession numbers. Polymerase chain reaction (PCR) was performed and products were purified and sequenced. Sequence analysis and alignment were carried out using CLUSTAL W (1.83).

  16. Incident sequence analysis; event trees, methods and graphical symbols

    International Nuclear Information System (INIS)

    1980-11-01

    When analyzing incident sequences, unwanted events resulting from a certain cause are looked for. Graphical symbols and explanations of graphical representations are presented. The method applies to the analysis of incident sequences in all types of facilities. By means of the incident sequence diagram, incident sequences, i.e. the logical and chronological course of repercussions initiated by the failure of a component or by an operating error, can be presented and analyzed simply and clearly

  17. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  18. Multigene analyses resolve early diverging lineages in the Rhodymeniophycidae (Florideophyceae, Rhodophyta).

    Science.gov (United States)

    Saunders, Gary W; Filloramo, Gina; Dixon, Kyatt; Le Gall, Line; Maggs, Christine A; Kraft, Gerald T

    2016-08-01

    Multigene phylogenetic analyses were directed at resolving the earliest divergences in the red algal subclass Rhodymeniophycidae. The inclusion of key taxa (new to science and/or previously lacking molecular data), additional sequence data (SSU, LSU, EF2, rbcL, COI-5P), and phylogenetic analyses removing the most variable sites (site stripping) have provided resolution for the first time at these deep nodes. The earliest diverging lineage within the subclass was the enigmatic Catenellopsis oligarthra from New Zealand (Catenellopsidaceae), which is here placed in the Catenellopsidales ord. nov. In our analyses, Atractophora hypnoides was not allied with the other included Bonnemaisoniales, but resolved as sister to the Peyssonneliales, and is here assigned to Atractophoraceae fam. nov. in the Atractophorales ord. nov. Inclusion of Acrothesaurum gemellifilum gen. et sp. nov. from Tasmania has greatly improved our understanding of the Acrosymphytales, to which we assign three families, the Acrosymphytaceae, Acrothesauraceae fam. nov. and Schimmelmanniaceae fam. nov. © 2016 Phycological Society of America.

  19. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  20. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    Science.gov (United States)

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  1. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    The recurrence plot technique for DNA sequences is established on a metric representation and employed to analyze the correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, whereas the correlation distance of a yeast chromosome increases constantly.

  2. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  3. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
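
    The construction described in this record can be sketched in a few lines. The sketch rests on assumptions that the abstract does not state: words are read as consecutive non-overlapping blocks of `m` letters, columns with no observed transitions are replaced by a uniform column, and the damping factor is the conventional 0.85. Function names are hypothetical.

```python
from itertools import product


def google_matrix(seq, m=1, alpha=0.85):
    """Column-stochastic Google matrix of word-to-word transitions in seq."""
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # counts[i][j] = number of observed transitions from word j to word i.
    counts = [[0.0] * n for _ in range(n)]
    chunks = [seq[k:k + m] for k in range(0, len(seq) - m + 1, m)]
    for src, dst in zip(chunks, chunks[1:]):
        if src in index and dst in index:  # skip words with ambiguous letters
            counts[index[dst]][index[src]] += 1.0
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            s_ij = counts[i][j] / col_sum if col_sum > 0 else 1.0 / n
            matrix[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return words, matrix


def pagerank(matrix, iterations=200):
    """Approximate the leading eigenvector (eigenvalue 1) by power iteration."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(p)
        p = [x / total for x in p]
    return p
```

    With `m` letters per word the matrix has 4^m rows and columns, so this dense pure-Python form is only practical for small `m`.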

  4. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  5. Evolution and variation of multigene families

    CERN Document Server

    Ohta, Tomoko

    1980-01-01

    During the last decade and a half, studies of evolution and variation have been revolutionized by the introduction of the methods and concepts of molecular genetics. We can now construct reliable phylogenetic trees, even when fossil records are missing, by comparative studies of protein or mRNA sequences. If, in addition, paleontological information is available, we can estimate the rate at which genes are substituted in the species in the course of evolution. Through the application of electrophoretic methods, it has become possible to study intraspecific variation in molecular terms. We now know that an immense genetic variability exists in a sexually reproducing species, and our human species is no exception. The mathematical theory of population genetics (particularly its stochastic aspects) in conjunction with these new developments led us to formulate the "neutral theory" of molecular evolution, pointing out that chance, in the form of random gene frequency drift, is playing a much more importa...

  6. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an $N \times 1$ frequency vector $\mathbf{n} = (n_i)$, where $n_i$ is the copy number of the $i$th sequence and $N$ is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on $\mathbf{n}$. Selection, amplification, or sequencing could be described as a product of an $N \times N$ matrix and a stochastic sampling operator ($Sa$). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of $Sa$ and use them to define the sequencing operator ($Seq$). Sequencing without any bias and errors is $Seq = Sa\, I_N$, where $I_N$ is an $N \times N$ unity matrix. Any bias in sequencing changes $I_N$ to a nonunity matrix. We identified a diagonal censorship matrix ($CEN$), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
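
    The idea of sampling as a random diagonal operator acting on the copy-number vector can be illustrated with a toy library. The sketch assumes that sampling means drawing a fixed number of clones without replacement, which the abstract does not specify; the function names and the four-sequence library are hypothetical.

```python
import random


def sampling_operator(n, reads, rng):
    """Diagonal entries of one random realisation of a sampling operator.

    n     : list of copy numbers, one entry per possible sequence
    reads : number of clones drawn without replacement (an assumption)
    rng   : a random.Random instance, passed in for reproducibility
    """
    # Expand the library into one pool entry per physical clone.
    pool = [i for i, copies in enumerate(n) for _ in range(copies)]
    drawn = rng.sample(pool, reads)
    sampled = [0] * len(n)
    for i in drawn:
        sampled[i] += 1
    # Entry i is the fraction of sequence i's copies that were drawn.
    return [sampled[i] / n[i] if n[i] > 0 else 0.0 for i in range(len(n))]


def apply_diagonal(diag, n):
    """Apply a diagonal matrix, stored as its diagonal, to the vector n."""
    return [d * x for d, x in zip(diag, n)]


library = [50, 30, 0, 20]  # toy copy numbers for N = 4 possible sequences
diag = sampling_operator(library, 40, random.Random(0))
observed = apply_diagonal(diag, library)
```

    Each call with a different random state gives a different diagonal, which is what makes the operator stochastic; the entries of `observed` always add up to the number of reads drawn.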

  7. Utility of Multi-Gene Loci for Forensic Species Diagnosis of Blowflies

    Science.gov (United States)

    Zaidi, Farrah; Wei, Shu-jun; Shi, Min; Chen, Xue-xin

    2011-01-01

    Contemporary studies in forensic entomology exhaustively evaluate gene sequences because these constitute the fastest and most accurate method of species identification. For this purpose single gene segments, cytochrome oxidase subunit I (COI) in particular, are commonly used. However, the limitation of such sequences in identification, especially of closely related species and populations, demands a multi-gene approach. But this raises the question of which group of genes can best fulfill the identification task. In this context the utility of five gene segments was explored among blowfly species from two distinct geographic regions, China and Pakistan. COI, cytochrome b (CYTB), NADH dehydrogenase 5 (ND5), and nuclear internal transcribed spacers (ITS1 and ITS2) were sequenced for eight blowfly species including Chrysomya megacephala F. (Diptera: Calliphoridae), Ch. pinguis Walker, Lucilia sericata Meigen, L. porphyrina Walker, L. illustris Meigen, Hemipyrellia ligurriens Wiedemann, Aldrichina grahami Aldrich, and the housefly, Musca domestica L. (Muscidae), from Hangzhou, China; while COI, CYTB, and ITS2 were sequenced for four species, i.e. Ch. megacephala, Ch. rufifacies, L. cuprina, and the flesh fly, Sarcophaga albiceps Meigen (Sarcophagidae), from Dera Ismail Khan, Pakistan. The results demonstrate a universal utility of these gene segments in the molecular identification of flies of forensic importance. PMID:21864153

  8. Cloning and sequence analysis of benzo-a-pyreneinducible ...

    African Journals Online (AJOL)

    The phylogenetic tree based on the amino acid sequences clearly shows tilapia CYP1A and killifish CYP1A to be more closely related to each other than to the other CYP1A subfamilies. Sequence analysis of 3727 bp of genomic DNA showed that the clone obtained was the structural gene of CYP1A which consists of ...

  9. Biological sequence analysis: probabilistic models of proteins and nucleic acids

    National Research Council Canada - National Science Library

    Durbin, Richard

    1998-01-01

    ... analysis methods are now based on principles of probabilistic modelling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference...

  10. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  11. Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

    Science.gov (United States)

    Szymanski, Maciej; Karlowski, Wojciech M

    2016-01-01

    In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.

  12. Parametric inference for biological sequence analysis.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  13. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Repeat-related structures are among the more important structures in a DNA sequence. Often they have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
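
    The idea of a recurrence time can be illustrated with a minimal sketch. It assumes a simplified definition (the gap, in bases, between successive occurrences of the same k-mer) and is not the authors' algorithm or codon index; the function name and the toy sequence are hypothetical.

```python
from collections import defaultdict


def recurrence_times(seq, k=3):
    """Map each k-mer to the gaps between its successive occurrences.

    Simplified, assumed definition used for illustration only.
    """
    last_seen = {}            # k-mer -> index of its most recent occurrence
    gaps = defaultdict(list)  # k-mer -> list of recurrence times
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(gaps)


# Toy sequence with a period-3 repeat (hypothetical data).
times = recurrence_times("ATGATGATGCCC", k=3)
```

    In this toy case every repeated 3-mer recurs at a gap of 3, which is the kind of periodicity signal the abstract refers to; k-mers that occur only once have no recurrence time and are absent from the result.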

  14. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 by analyzing differences in allele distribution.

  15. Editorial: Special Issue on Algorithms for Sequence Analysis and Storage

    Directory of Open Access Journals (Sweden)

    Veli Mäkinen

    2014-03-01

    Full Text Available This special issue of Algorithms is dedicated to approaches to biological sequence analysis that have algorithmic novelty and potential for fundamental impact in methods used for genome research.

  16. Tools for integrated sequence-structure analysis with UCSF Chimera

    Directory of Open Access Journals (Sweden)

    Huang Conrad C

    2006-07-01

    Full Text Available Abstract Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines.
UCSF Chimera is free for non-commercial use and is

  17. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  18. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab

    2014-01-01

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis

  19. DSAP: deep-sequencing small RNA analysis pipeline.

    Science.gov (United States)

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
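
    The first two steps described above (cleanup, then clustering of tags with their copy numbers) can be sketched as follows. This is a minimal sketch under stated assumptions, not DSAP's implementation: the adaptor sequence, the homopolymer threshold, and both function names are hypothetical, and the input is assumed to be lines of the form "sequence<TAB>count".

```python
import re
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor, for illustration only


def clean_tag(tag, adaptor=ADAPTOR, min_run=6):
    """Cut the tag at the adaptor, then strip a trailing homopolymer run."""
    pos = tag.find(adaptor)
    if pos != -1:
        tag = tag[:pos]
    # Remove a trailing run of one repeated base (A/T/C/G/N) of length >= min_run.
    return re.sub(r"([ACGTN])\1{%d,}$" % (min_run - 1), "", tag)


def cluster_tags(lines):
    """Group cleaned tags into unique sequences, summing their copy numbers."""
    clusters = Counter()
    for line in lines:
        tag, count = line.rstrip("\n").split("\t")
        cleaned = clean_tag(tag)
        if cleaned:  # tags that are empty after cleanup are discarded
            clusters[cleaned] += int(count)
    return clusters


# Hypothetical input: two reads of the same small RNA plus a poly-A artefact.
example = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t10",
    "TGAGGTAGTAGGTTGTATAGTTAAAAAAAA\t5",
    "AAAAAAAAAA\t3",
]
clusters = cluster_tags(example)
```

    Summing counts per cleaned sequence is what lets later matching steps work on one entry per unique sequence while still retaining expression levels.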

  20. Quantiprot - a Python package for quantitative analysis of protein sequences.

    Science.gov (United States)

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
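
    Two of the quantities mentioned above, the n-gram distribution and a Zipf's law coefficient, can be sketched in plain Python. This sketch deliberately does not use the Quantiprot API; the function names are hypothetical, and the coefficient is assumed here to be the least-squares slope of log(frequency) against log(rank), which may differ from the package's definition.

```python
import math
from collections import Counter


def ngram_counts(seq, n=2):
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct n-grams; a slope near -1 is the classic
    Zipf's law behaviour.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy peptide.
bigrams = ngram_counts("MKKLLKKA", n=2)
slope = zipf_slope(bigrams)
```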

  1. Nonlinear analysis of river flow time sequences

    Science.gov (United States)

    Porporato, Amilcare; Ridolfi, Luca

    1997-06-01

    Within the field of chaos theory several methods for the analysis of complex dynamical systems have recently been proposed. In light of these ideas we study the dynamics which control the behavior over time of river flow, investigating the existence of a low-dimension deterministic component. The present article follows the research undertaken in the work of Porporato and Ridolfi [1996a] in which some clues as to the existence of chaos were collected. Particular emphasis is given here to the problem of noise and to nonlinear prediction. With regard to the latter, the benefits obtainable by means of the interpolation of the available time series are reported and the remarkable predictive results attained with this nonlinear method are shown.

  2. Accident sequence analysis of human-computer interface design

    International Nuclear Information System (INIS)

    Fan, C.-F.; Chen, W.-H.

    2000-01-01

    It is important to predict potential accident sequences of human-computer interaction in a safety-critical computing system so that vulnerable points can be disclosed and removed. We address this issue by proposing a Multi-Context human-computer interaction Model along with its analysis techniques, an Augmented Fault Tree Analysis, and a Concurrent Event Tree Analysis. The proposed augmented fault tree can identify the potential weak points in software design that may induce unintended software functions or erroneous human procedures. The concurrent event tree can enumerate possible accident sequences due to these weak points

  3. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  4. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  5. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. 
Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering
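
    The divergence percentages quoted above are computed on concatenated gene sequences. A minimal sketch of that arithmetic follows, assuming the per-gene sequences are already aligned to equal length and that divergence means the percentage of differing sites (an uncorrected p-distance); the abstract does not state the exact distance measure, and the function names and toy sequences are hypothetical.

```python
def concatenate(genes, order):
    """Join per-gene aligned sequences in a fixed gene order."""
    return "".join(genes[name] for name in order)


def percent_divergence(a, b):
    """Percentage of differing sites, ignoring positions gapped in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    differences = sum(x != y for x, y in compared)
    return 100.0 * differences / len(compared)


# Toy aligned fragments for two strains (hypothetical data, two genes only).
order = ["argH", "cya"]
strain_1 = {"argH": "ACGTACGTAC", "cya": "GGCCTTAAGG"}
strain_2 = {"argH": "ACGTACGTAA", "cya": "GGCCTTAAGG"}

divergence = percent_divergence(
    concatenate(strain_1, order), concatenate(strain_2, order)
)
```

    Fixing the gene order before concatenation matters: every strain must contribute the same genes in the same positions, or site-by-site comparison is meaningless.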

  6. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  7. An optimum analysis sequence for environmental gamma-ray spectrometry

    Energy Technology Data Exchange (ETDEWEB)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L., E-mail: fta777@hotmail.co [Universidad Autonoma de Zacatecas, Centro Regional de Estudios Nucleares, Calle Cipres No. 10, Fracc. La Penuela, 98068 Zacatecas (Mexico)]

    2010-10-15

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library, based on evaluated nuclear data, of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of the second derivative peak locate algorithm with simple peak area integration provides the greatest accuracy. Lower accuracy comes from the combination of the library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  8. An optimum analysis sequence for environmental gamma-ray spectrometry

    International Nuclear Information System (INIS)

    De la Torre, F.; Rios M, C.; Ruvalcaba A, M. G.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J. L.

    2010-10-01

    This work aims to obtain an optimum analysis sequence for environmental gamma-ray spectroscopy by means of Genie 2000 (Canberra). Twenty different analysis sequences were customized using different peak area percentages and different algorithms for: 1) peak finding, and 2) peak area determination, and with or without the use of a library, based on evaluated nuclear data, of common gamma-ray emitters in environmental samples. The use of an optimum analysis sequence with certified nuclear information avoids the problems caused by the significant variations in out-of-date nuclear parameters of commercial software libraries. Interference-free gamma ray energies with absolute emission probabilities greater than 3.75% were included in the customized library. The gamma-ray spectroscopy system (based on a Ge Re-3522 Canberra detector) was calibrated both in energy and shape by means of the IAEA-2002 reference spectra for software intercomparison. To test the performance of the analysis sequences, the IAEA-2002 reference spectrum was used. The z-score and the reduced χ² criteria were used to determine the optimum analysis sequence. The results show an appreciable variation in the peak area determinations and their corresponding uncertainties. In particular, the combination of the second derivative peak locate algorithm with simple peak area integration provides the greatest accuracy. Lower accuracy comes from the combination of the library directed peak locate algorithm and Genie's Gamma-M peak area determination. (Author)

  9. A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction

    Science.gov (United States)

    Danandeh Mehr, Ali; Kahya, Ercan

    2017-06-01

    Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives still remains a question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing component uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene component of the model tends to identify the underlying nonlinear system with expressions simpler than classical monolithic GP and, finally, the simplification component exploits the Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Compared with the efficiency results of stand-alone GP, MGGP, and conventional multilinear regression prediction models as benchmarks, the proposed Pareto-optimal MA-MGGP model put forward a parsimonious solution, which is of noteworthy importance for application in practice. In addition, the approach allows the user to bring human insight into the problem to examine evolved models and pick out the best performing programs for further analysis.
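
    The pre-processing step mentioned above relies on a simple moving average filter. A minimal sketch follows, assuming a trailing window of three days; the abstract does not state the window length or whether the window is trailing or centred, and the function name and flow values are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing simple moving average.

    Each output is the mean of the current value and the window - 1 values
    before it, so the first window - 1 points have no output.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily streamflow record (m^3/s).
flows = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_average(flows, window=3)
```

    A trailing window uses only past and present observations, which keeps the filter usable in a forecasting setting; a centred window would smooth more symmetrically but would require future values.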

  10. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  11. Single Day Construction of Multigene Circuits with 3G Assembly.

    Science.gov (United States)

    Halleran, Andrew D; Swaminathan, Anandh; Murray, Richard M

    2018-05-18

    The ability to rapidly design, build, and test prototypes is of key importance to every engineering discipline. DNA assembly often serves as a rate limiting step of the prototyping cycle for synthetic biology. Recently developed DNA assembly methods such as isothermal assembly and type IIS restriction enzyme systems take different approaches to accelerate DNA construction. We introduce a hybrid method, Golden Gate-Gibson (3G), that takes advantage of modular part libraries introduced by type IIS restriction enzyme systems and isothermal assembly's ability to build large DNA constructs in single pot reactions. Our method is highly efficient and rapid, facilitating construction of entire multigene circuits in a single day. Additionally, 3G allows generation of variant libraries enabling efficient screening of different possible circuit constructions. We characterize the efficiency and accuracy of 3G assembly for various construct sizes, and demonstrate 3G by characterizing variants of an inducible cell-lysis circuit.

  12. Multigene Genetic Programming for Estimation of Elastic Modulus of Concrete

    Directory of Open Access Journals (Sweden)

    Alireza Mohammadi Bayazidi

    2014-01-01

    Full Text Available This paper presents a new multigene genetic programming (MGGP) approach for estimation of elastic modulus of concrete. The MGGP technique models the elastic modulus behavior by integrating the capabilities of standard genetic programming and classical regression. The main aim is to derive precise relationships between the tangent elastic moduli of normal and high strength concrete and the corresponding compressive strength values. Another important contribution of this study is to develop a generalized prediction model for the elastic moduli of both normal and high strength concrete. Numerous concrete compressive strength test results are obtained from the literature to develop the models. A comprehensive comparative study is conducted to verify the performance of the models. The proposed models perform better than the existing traditional models, as well as those derived using other powerful soft computing tools.

  13. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for expression determination and differentiation of paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  14. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
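
    The assembly statistics quoted above can be made concrete with a short sketch. It uses the common definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases) and checks the reported genome coverage from the two figures given in the abstract; the function name and the toy scaffold lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths in bp (hypothetical data).
toy_n50 = n50([2, 3, 4, 5, 6])

# Coverage check from the abstract's figures: assembled bases / estimated genome size.
coverage = 568_887_315 / 622_000_000
```

    For the toy lengths the two longest scaffolds (6 and 5) already hold 11 of 20 bases, so the N50 is 5; the coverage ratio rounds to 0.91, matching the 91% stated in the abstract.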

  15. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, RNA from eggs, larvae, pupae, male adults and female adults was pooled for cDNA preparation, sequenced using Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed for genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content of 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified a number of An. sinensis unigenes that showed homology with, or were putative 1:1 orthologues of, genes in the genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic and genomic studies, and further vector control of An. sinensis. PMID:25000941
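
    Detection of simple sequence repeats (SSRs) of the kind counted in this abstract can be sketched with a regular expression. The abstract does not say which tool or thresholds were used, so the minimum repeat counts below (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs) and the function name find_ssrs are assumptions made here for illustration only.

```python
import re

# Assumed minimum number of tandem copies per motif length (hypothetical).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, copies) for perfect SSRs.

    start is the 0-based position of the repeat in seq. Motifs that are
    themselves repeats of a shorter unit (for example "ATAT") are skipped
    so that each repeat is reported under its shortest motif only.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            redundant = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if redundant:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy unigene: an (AT)7 repeat at position 3 and a (CAG)5 repeat at 21.
    toy = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "T"
    print(find_ssrs(toy))
```

    Only perfect repeats are reported; compound or interrupted SSRs would need additional logic.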

  16. Contralateral prophylactic mastectomy rate and predictive factors among patients with breast cancer who underwent multigene panel testing for hereditary cancer.

    Science.gov (United States)

    Elsayegh, Nisreen; Webster, Rachel D; Gutierrez Barrera, Angelica M; Lin, Heather; Kuerer, Henry M; Litton, Jennifer K; Bedrosian, Isabelle; Arun, Banu K

    2018-05-07

    Although multigene panel testing is increasingly common in patients with cancer, the relationship between its use among breast cancer patients with non-BRCA mutations or variants of uncertain significance (VUS) and disease management decisions has not been well described. This study evaluated the rate and predictive factors of contralateral prophylactic mastectomy (CPM) among patients who underwent multigene panel testing. Three hundred and fourteen patients with breast cancer who underwent multigene panel testing between 2014 and 2017 were included in the analysis. Of the 314 patients, 70 elected CPM. Election of CPM by gene status was as follows: BRCA carriers (42.3%), non-BRCA carriers (30.1%), and VUS (10.6%). CPM election rates did not differ between non-BRCA carriers and BRCA carriers (P = 0.6205). Among non-BRCA carriers, negative hormone receptor status was associated with CPM (P = 0.0115). For those with a VUS, hormone receptor status was not associated with CPM (P = 0.1879). Although the rate of CPM between BRCA carriers and non-BRCA carriers was not significantly different, the predictors of CPM were different in each group. Our analyses shed light on the increasing use of CPM among patients who are non-BRCA carriers as well as those with a VUS. Our study elucidates the differing predictive factors of CPM election among BRCA carriers, non-BRCA carriers, and those with a VUS. Our findings reveal the need for providers to be cognizant that non-BRCA genes and VUS drive women to elect CPM despite the lack of data for contralateral breast cancer risk associated with these genes. © 2018 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

  17. Sequence analysis corresponding to the PPE and PE proteins in ...

    Indian Academy of Sciences (India)

    Unknown

    AB repeats; Mycobacterium tuberculosis genome; PE-PPE domain; PPE, PE proteins; sequence analysis; surface antigens. J. Biosci. | Vol. ... bacterium tuberculosis genomes resulted in the identification of a previously uncharacterized 225 amino acid- ...... Vega Lopez F, Brooks L A, Dockrell H M, De Smet K A,. Thompson ...

  18. Molecular cloning, expression analysis and sequence prediction of ...

    African Journals Online (AJOL)

    CCAAT/enhancer-binding protein beta, an essential transcriptional factor, regulates the differentiation of adipocytes and the deposition of fat. Herein, we cloned the whole open reading frame (ORF) of the bovine C/EBPβ gene and analyzed its putative protein structures via DNA cloning and sequence analysis. Then, the ...

  19. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Science.gov (United States)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  20. Sequence symmetry analysis in pharmacovigilance and pharmacoepidemiologic studies

    DEFF Research Database (Denmark)

    Lai, Edward Chia Cheng; Pratt, Nicole; Hsieh, Cheng Yang

    2017-01-01

    Sequence symmetry analysis (SSA) is a method for detecting adverse drug events by utilizing computerized claims data. The method has been increasingly used to investigate safety concerns of medications and as a pharmacovigilance tool to identify unsuspected side effects. Validation studies have i...

  1. DNAApp: a mobile application for sequencing data analysis.

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.
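
    Two of the in-built operations named in this abstract, reverse complementation and protein translation, can be sketched in a few lines. This is not DNAApp's code: the function names are assumptions made here, and translation uses the standard genetic code (NCBI translation table 1) in a single reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate seq from the given frame offset (0, 1 or 2).

    Stop codons are written as '*'; codons with ambiguous bases as 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # prints GCAT
    print(translate("ATGGCCTAA"))       # prints MA*
```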

  2. DNAApp: a mobile application for sequencing data analysis

    Science.gov (United States)

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  3. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  4. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  5. Identification and characterization of a new multigene family in the human MHC: A candidate autoimmune disease susceptibility element (3.8-1)

    Energy Technology Data Exchange (ETDEWEB)

    Harris, J.M.; Venditti, C.P.; Chorney, M.J. [Pennsylvania State Univ. College of Medicine, Hershey, PA (United States)

    1994-09-01

    An association between idiopathic hemochromatosis (HFE) and the HLA-A3 locus has been previously well-established. In an attempt to identify potential HFE candidate genes, a genomic DNA fragment distal to the HLA-A9 breakpoint was used to screen a B cell cDNA library; a member (3.8-1) of a new multigene family, composed of five distinct genomic cross-reactive fragments, was identified. Clone 3.8-1 represents the 3′ end of a 9.6 kb transcript which is expressed in multiple tissues including the spleen, thymus, lung and kidney. Sequencing and genome database analysis indicate that 3.8-1 is unique, with no homology to any known entries. The genomic residence of 3.8-1, defined by polymorphism analysis and physical mapping using YAC clones, appears to be absent from the genomes of higher primates, although four other cross-reactivities are maintained. The absence of this gene, as well as of other probes which map in the TNF to HLA-B interval, suggests that this portion of the human MHC, located between the Class I and Class III regions, arose in humans as the result of a post-speciation insertional event. The large size of the 3.8-1 gene and the possible categorization of 3.8-1 as a human-specific gene are significant given the genetic data that place an autoimmune susceptibility element for IDDM and myasthenia gravis in the precise region where this gene resides. In an attempt to isolate the 5′ end of this large transcript, we have constructed a cosmid contig which encompasses the genomic locus of this gene and are progressively isolating coding sequences by exon trapping.

  6. Analysis of Sequence Diagram Layout in Advanced UML Modelling Tools

    Directory of Open Access Journals (Sweden)

    Ņikiforova Oksana

    2016-05-01

    System modelling using the Unified Modelling Language (UML) is a task that should be solved for software development. The more complex software becomes, the higher the requirements for demonstrating the system to be developed, especially in its dynamic aspect, which in UML is represented by a sequence diagram. To solve this task, the main attention is devoted to the graphical presentation of the system, where diagram layout plays the central role in information perception. The UML sequence diagram, due to its specific structure, is selected for a deeper analysis of the layout of its elements. The authors' research presents the abilities of modern UML modelling tools to offer automatic layout of the UML sequence diagram and analyses them according to criteria required for diagram perception.

  7. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network. In this network, triplets in the DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pairs) periodicity of the DNA sequence. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in the 3-bp periodicity or in the variable GC content.
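
    The triplet network described in this abstract can be sketched as follows. The abstract does not specify how triplets are read, so this sketch assumes non-overlapping triplets in a single reading frame, an undirected graph without self-loops, and the average of the local clustering coefficients as the summary measure; the function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations


def triplet_network(seq):
    """Build an undirected graph whose vertices are DNA triplets.

    Two triplets are linked when they occur next to each other in the
    sequence (assumed here: non-overlapping triplets, frame 0).
    Returns a dict mapping each triplet to the set of its neighbours.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    graph = {}
    for a, b in zip(triplets, triplets[1:]):
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:                      # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph):
    """Return the mean local clustering coefficient of the graph.

    For a vertex with k >= 2 neighbours, the local coefficient is the
    fraction of the k*(k-1)/2 neighbour pairs that are themselves linked;
    vertices with fewer than two neighbours contribute 0.
    """
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


if __name__ == "__main__":
    # AAA-CCC, CCC-GGG and GGG-AAA form a triangle, so this prints 1.0.
    print(clustering_coefficient(triplet_network("AAACCCGGGAAA")))
    # AAA-CCC-GGG-TTT is a simple path with no triangles, so this prints 0.0.
    print(clustering_coefficient(triplet_network("AAACCCGGGTTT")))
```

    Comparing the value obtained for a genome with the values obtained for shuffled or randomly generated sequences of matching GC content would mirror the kind of test the abstract describes.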

  8. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  9. Human mast cell tryptase: Multiple cDNAs and genes reveal a multigene serine protease family

    International Nuclear Information System (INIS)

    Vanderslice, P.; Ballinger, S.M.; Tam, E.K.; Goldstein, S.M.; Craik, C.S.; Caughey, G.H.

    1990-01-01

    Three different cDNAs and a gene encoding human skin mast cell tryptase have been cloned and sequenced in their entirety. The deduced amino acid sequences reveal a 30-amino acid prepropeptide followed by a 245-amino acid catalytic domain. The C-terminal undecapeptide of the human preprosequence is identical in dog tryptase and appears to be part of a prosequence unique among serine proteases. The differences among the three human tryptase catalytic domains include the loss of a consensus N-glycosylation site in one cDNA, which may explain some of the heterogeneity in size and susceptibility to deglycosylation seen in tryptase preparations. All three tryptase cDNAs are distinct from a recently reported cDNA obtained from a human lung mast cell library. A skin tryptase cDNA was used to isolate a human tryptase gene, the exons of which match one of the skin-derived cDNAs. The organization of the ∼1.8-kilobase-pair tryptase gene is unique and is not closely related to that of any other mast cell or leukocyte serine protease. The 5' regulatory regions of the gene share features with those of other serine proteases, including mast cell chymase, but are unusual in being separated from the protein-coding sequence by an intron. High-stringency hybridization of a human genomic DNA blot with a fragment of the tryptase gene confirms the presence of multiple tryptase genes. These findings provide genetic evidence that human mast cell tryptases are the products of a multigene family

  10. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
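
    The idea of loading similarity search results into a relational database and summarizing them by taxon can be sketched with Python's built-in sqlite3 module. The schemas of seqdb_demo and search_demo are not given in this abstract, so the tables protein and search_hit, their columns, the toy accessions and the E-value cutoff below are all hypothetical and chosen here only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up proteins and search hits."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE protein (
            acc   TEXT PRIMARY KEY,   -- accession (toy values)
            taxon TEXT NOT NULL       -- source organism
        );
        CREATE TABLE search_hit (
            query_acc   TEXT NOT NULL REFERENCES protein(acc),
            subject_acc TEXT NOT NULL REFERENCES protein(acc),
            evalue      REAL NOT NULL -- expectation value of the alignment
        );
        """
    )
    con.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [
            ("ec1", "E. coli"), ("ec2", "E. coli"),
            ("hs1", "H. sapiens"),
            ("sc1", "S. cerevisiae"), ("sc2", "S. cerevisiae"),
        ],
    )
    con.executemany(
        "INSERT INTO search_hit VALUES (?, ?, ?)",
        [
            ("ec1", "hs1", 1e-30), ("ec1", "sc1", 1e-12),
            ("ec2", "sc2", 1e-8), ("ec2", "hs1", 0.5),
            ("ec1", "ec2", 1e-5),
        ],
    )
    return con


def homolog_counts(con, query_taxon, max_evalue=1e-6):
    """Count query-taxon proteins with a significant hit in each other taxon."""
    sql = """
        SELECT s.taxon, COUNT(DISTINCT h.query_acc)
        FROM search_hit AS h
        JOIN protein AS q ON q.acc = h.query_acc
        JOIN protein AS s ON s.acc = h.subject_acc
        WHERE q.taxon = ? AND s.taxon <> ? AND h.evalue <= ?
        GROUP BY s.taxon
        ORDER BY s.taxon
    """
    return con.execute(sql, (query_taxon, query_taxon, max_evalue)).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    # Prints [('H. sapiens', 1), ('S. cerevisiae', 2)] for the toy data.
    print(homolog_counts(con, "E. coli"))
```

    Keeping taxonomy in its own column is what makes this kind of per-organism summary a single GROUP BY query rather than a pass over flat search output.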

  11. Now And Next Generation Sequencing Techniques: Future of Sequence Analysis using Cloud Computing

    Directory of Open Access Journals (Sweden)

    Radhe Shyam Thakur

    2012-12-01

    Advancements in sequencing techniques have resulted in huge amounts of sequence data being produced at an ever faster rate. It is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any efficient conclusion. To cope with this overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. The datacenter's hardware and software are collectively known as a cloud, which when available publicly is termed a public cloud. The datacenter's resources are provided in a virtual mode to clients via a service provider such as Amazon, Google or Joyent, which charges in a pay-as-you-go manner. The workload is shifted to the provider, which maintains it through the required hardware and software upgrades. The service provider manages this by upgrading the requirements in the virtual mode. Basically, a virtual environment is created according to the needs of the user by requesting permission from the datacenter via the internet; the task is performed and the environment is deleted after the task is over. In this discussion, we focus on the basics of cloud computing and on the prerequisites and overall working of clouds. Furthermore, the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, are briefly discussed with reference to traditional workflows.

  12. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    Science.gov (United States)

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  13. SEQUENCING AND SEQUENCE ANALYSIS OF MYOSTATIN GENE IN THE EXON 1 OF THE CAMEL (CAMELUS DROMEDARIUS)

    Directory of Open Access Journals (Sweden)

    M. G. SHAH, A. S. QURESHI, M. REISSMANN AND H. J. SCHWARTZ

    2006-10-01

    Myostatin, also called growth differentiation factor-8 (GDF-8), is a member of the mammalian transforming growth factor-beta (TGF-beta) superfamily, which is expressed specifically in developing and adult skeletal muscle. The muscular hypertrophy (mh) allele in double-muscled breeds involves a mutation within the myostatin gene. Genomic DNA was isolated from camel hair using the NucleoSpin Tissue kit. Two animals from each of six breeds, namely Marecha, Dhatti, Larri, Kohi, Sakrai and Cambelpuri, were used for sequencing. For PCR amplification of the gene, a primer pair was designed from homologous regions of already published sequences of farm animals from GenBank. Results showed that camel myostatin possessed more than 90% homology with that of cattle, sheep and pig. Camel formed a separate cluster from the pig in spite of having high homology (98%) and showed 94% homology with cattle and sheep, as reported in the literature. The sequence of the PCR-amplified part of exon 1 (256 bp) of camel myostatin was identical among the six camel breeds.

  14. An Imaging And Graphics Workstation For Image Sequence Analysis

    Science.gov (United States)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  15. Multilocus sequence analysis of Treponema denticola strains of diverse origin

    Directory of Open Access Journals (Sweden)

    Mo Sisu

    2013-02-01

    Abstract Background The oral spirochete bacterium Treponema denticola is associated with both the incidence and severity of periodontal disease. Although the biological or phenotypic properties of a significant number of T. denticola isolates have been reported in the literature, their genetic diversity or phylogeny has never been systematically investigated. Here, we describe a multilocus sequence analysis (MLSA) of 20 of the most highly studied reference strains and clinical isolates of T. denticola, which were originally isolated from subgingival plaque samples taken from subjects from China, Japan, the Netherlands, Canada and the USA. Results The sequences of the 16S ribosomal RNA gene, and 7 conserved protein-encoding genes (flaA, recA, pyrH, ppnK, dnaN, era and radC) were successfully determined for each strain. Sequence data was analyzed using a variety of bioinformatic and phylogenetic software tools. We found no evidence of positive selection or DNA recombination within the protein-encoding genes, where levels of intraspecific sequence polymorphism varied from 18.8% (flaA) to 8.9% (dnaN). Phylogenetic analysis of the concatenated protein-encoding gene sequence data (ca. 6,513 nucleotides for each strain) using Bayesian and maximum likelihood approaches indicated that the T. denticola strains were monophyletic, and formed 6 well-defined clades. All analyzed T. denticola strains appeared to have a genetic origin distinct from that of ‘Treponema vincentii’ or Treponema pallidum. No specific geographical relationships could be established, but several strains isolated from different continents appear to be closely related at the genetic level. Conclusions Our analyses indicate that previous biological and biophysical investigations have predominantly focused on a subset of T. denticola strains with a relatively narrow range of genetic diversity. Our methodology and results establish a genetic framework for the discrimination and phylogenetic

  16. Sirius PSB: a generic system for analysis of biological sequences.

    Science.gov (United States)

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Searching for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, a genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  17. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    Science.gov (United States)

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    Directory of Open Access Journals (Sweden)

    Charalambos Chrysostomou

    2015-01-01

    Full Text Available Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. As recent studies show, using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. Therefore, CISAPS is developed to consider and provide results in three forms: absolute, real, and imaginary spectra. Biologically relevant features, such as those in the analysis of influenza A subtypes presented here as a case study, can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are likely related to the protein feature that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.

  19. Multigene analysis of lophophorate and chaetognath phylogenetic relationships.

    Science.gov (United States)

    Helmkampf, Martin; Bruchhaus, Iris; Hausdorf, Bernhard

    2008-01-01

    Maximum likelihood and Bayesian inference analyses of seven concatenated fragments of nuclear-encoded housekeeping genes indicate that Lophotrochozoa is monophyletic, i.e., the lophophorate groups Bryozoa, Brachiopoda and Phoronida are more closely related to molluscs and annelids than to Deuterostomia or Ecdysozoa. Lophophorates themselves, however, form a polyphyletic assemblage. The hypotheses that they are monophyletic and more closely allied to Deuterostomia than to Protostomia can be ruled out with both the approximately unbiased test and the expected likelihood weights test. The existence of Phoronozoa, a putative clade including Brachiopoda and Phoronida, has also been rejected. According to our analyses, phoronids instead share a more recent common ancestor with bryozoans than with brachiopods. Platyhelminthes is the sister group of Lophotrochozoa. Together these two constitute Spiralia. Although Chaetognatha appears as the sister group of Priapulida within Ecdysozoa in our analyses, alternative hypotheses concerning chaetognath relationships could not be rejected.

  20. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
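
    As a rough illustration of what a measure "based solely on the k-mer frequencies" looks like, the following sketch computes a Euclidean dissimilarity between the k-mer frequency vectors of two DNA sequences. It is not CAFE's code and does not implement CVTree, $d_2^*$ or $d_2^S$; the function names and the default k = 2 are assumptions made for this example only.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k):
    """Relative frequency of every A/C/G/T k-mer among the overlapping k-mers of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {"".join(p): counts.get("".join(p), 0) / total
            for p in product("ACGT", repeat=k)}


def euclidean_dissimilarity(seq_a, seq_b, k=2):
    """Euclidean distance between the k-mer frequency vectors of two sequences.

    Hypothetical helper for illustration; assumes both sequences are at
    least k letters long.
    """
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((fa[w] - fb[w]) ** 2 for w in fa))
```

    Applying such a function to every pair of genomes yields a symmetric distance matrix, which is the kind of object that a PHYLIP-format output file stores.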

  1. Environmental impact analysis for the main accidental sequences of ignitor

    International Nuclear Information System (INIS)

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-01-01

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs

  2. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  3. Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

    Science.gov (United States)

    Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

    2018-03-01

    The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.

  4. Detection of high frequency of mutations in a breast and/or ovarian cancer cohort: implications of embracing a multi-gene panel in molecular diagnosis in India.

    Science.gov (United States)

    Mannan, Ashraf U; Singh, Jaya; Lakshmikeshava, Ravikiran; Thota, Nishita; Singh, Suhasini; Sowmya, T S; Mishra, Avshesh; Sinha, Aditi; Deshwal, Shivani; Soni, Megha R; Chandrasekar, Anbukayalvizhi; Ramesh, Bhargavi; Ramamurthy, Bharat; Padhi, Shila; Manek, Payal; Ramalingam, Ravi; Kapoor, Suman; Ghosh, Mithua; Sankaran, Satish; Ghosh, Arunabha; Veeramachaneni, Vamsi; Ramamoorthy, Preveen; Hariharan, Ramesh; Subramanian, Kalyanasundaram

    2016-06-01

    Breast and/or ovarian cancer (BOC) are among the most frequently diagnosed forms of hereditary cancers and a leading cause of death in India. This emphasizes the need for a cost-effective method for early detection of these cancers. We sequenced 141 unrelated patients and families with BOC using the TruSight Cancer panel, which includes 13 genes strongly associated with risk of inherited BOC. Multi-gene sequencing was done on the Illumina MiSeq platform. Genetic variations were identified using the Strand NGS software and interpreted using the StrandOmics platform. We were able to detect pathogenic mutations in 51 (36.2%) cases, out of which 19 were novel mutations. When we considered familial breast cancer cases only, the detection rate increased to 52%. When cases were stratified based on age of diagnosis into three categories, ⩽40 years, 40-50 years and >50 years, the detection rates were higher in the first two categories (44.4% and 53.4%, respectively) as compared with the third category, in which it was 26.9%. Our study suggests that next-generation sequencing-based multi-gene panels increase the sensitivity of mutation detection and help in identifying patients with a high risk of developing cancer as compared with sequential tests of individual genes.

  5. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  6. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    STORAGESEVER

    2010-07-19

    Jul 19, 2010 ... and antisense primers, a single band of 573 base pairs .... Amino acid sequence alignment of Cluster I and Cluster II of phylogenetic tree. First ten sequences ... sequence weighting, position-specific gap penalties and weight.

  7. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit; Chaudhuri, Probal; Ghosh, Anil

    2014-01-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
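
    The claim that Markov-model Bayes classifiers have boundaries that are linear in word occurrences can be made concrete for first-order chains. The sketch below assumes two classes with known transition probabilities and identical initial-letter distributions (so the initial term cancels); the log-likelihood ratio then reduces to a weighted sum of two-letter word counts. All names are hypothetical, and the sketch does not cover the authors' cross-validated order selection or their SVM-type constructions.

```python
import math
from collections import Counter


def word_counts(seq, k):
    """Count the overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_log_odds_weights(trans_1, trans_2):
    """Weight of the word 'ab' is log(P1(b | a) / P2(b | a)).

    trans_1 and trans_2 are nested dicts of first-order transition
    probabilities for class 1 and class 2 (assumed known and nonzero).
    """
    return {a + b: math.log(trans_1[a][b] / trans_2[a][b])
            for a in trans_1 for b in trans_1[a]}


def linear_score(seq, weights):
    """Log-likelihood ratio of class 1 versus class 2, written as a linear
    function of the two-letter word counts of seq."""
    return sum(weights[w] * n for w, n in word_counts(seq, 2).items())
```

    With equal class priors, a sequence would be assigned to class 1 when the score is positive and to class 2 when it is negative.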

  8. Planarian homeobox genes: cloning, sequence analysis, and expression.

    Science.gov (United States)

    Garcia-Fernàndez, J; Baguñà, J; Saló, E

    1991-01-01

    Freshwater planarians (Platyhelminthes, Turbellaria, and Tricladida) are acoelomate, triploblastic, unsegmented, and bilaterally symmetrical organisms that are mainly known for their ample power to regenerate a complete organism from a small piece of their body. To identify potential pattern-control genes in planarian regeneration, we have isolated two homeobox-containing genes, Dth-1 and Dth-2 [Dugesia (Girardia) tigrina homeobox], by using degenerate oligonucleotides corresponding to the most conserved amino acid sequence from helix-3 of the homeodomain. Dth-1 and Dth-2 homeodomains are closely related (68% at the nucleotide level and 78% at the protein level) and show the conserved residues characteristic of the homeodomains identified to date. Similarity with most homeobox sequences is low (30-50%), except with Drosophila NK homeodomains (80-82% with NK-2) and the rodent TTF-1 homeodomain (77-87%). Some unusual amino acid residues specific to NK-2, TTF-1, Dth-1, and Dth-2 can be observed in the recognition helix (helix-3) and may define a family of homeodomains. The deduced amino acid sequences from the cDNAs contain, in addition to the homeodomain, other domains also present in various homeobox-containing genes. The expression of both genes, detected by Northern blot analysis, appears slightly higher in cephalic regions than in the rest of the intact organism, while a slight increase is detected in the central period (5 days) of regeneration. PMID:1714599

  9. Analysis of correlations between sites in models of protein sequences

    International Nuclear Information System (INIS)

    Giraud, B.G.; Lapedes, A.; Liu, L.C.

    1998-01-01

    A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor. copyright 1998 The American Physical Society
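
    To show the kind of quantity that a criterion "based on conditional probabilities" starts from, the sketch below estimates, from a set of equal-length sequences, the probability of each state at one site given the state at another. This is a generic illustration only: it is not the authors' criterion and does not compute an algorithmic distance, and the function name is an assumption.

```python
from collections import Counter


def conditional_probabilities(sequences, i, j):
    """Estimate P(state at site j | state at site i) from aligned sequences.

    Returns a dict keyed by (state_i, state_j). Sites are 0-based indices,
    and all sequences are assumed to have the same length.
    """
    pair_counts = Counter((s[i], s[j]) for s in sequences)
    site_i_counts = Counter(s[i] for s in sequences)
    return {(a, b): n / site_i_counts[a] for (a, b), n in pair_counts.items()}
```

    For two perfectly coupled sites every estimated conditional probability equals 1, whereas for independent sites the estimates approach the marginal frequencies at site j, up to the finite-sampling fluctuations that the abstract mentions.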

  10. Linear discriminant analysis of character sequences using occurrences of words

    KAUST Repository

    Dutta, Subhajit

    2014-02-01

    Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.

  11. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    Science.gov (United States)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates the half-life of a protein to the identity of its N-terminal residue. A destabilizing N-terminal residue is created by enzymatic reactions or chemical modifications. The destabilized substrate is recognized by the PROTEOLYSIS 6 (PRT6) protein, an E3 ligase enzyme, which results in substrate degradation by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but not yet in fleshy fruit plants. Hence, this study was carried out in tomato, which is known as the model for fleshy fruit plants. BLASTX analysis identified Solyc09g010830 as the gene encoding PRT6 in tomato, based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that the PRT6 gene is highly expressed in tomato fruits at the breaker +5 stage. Co-expression analysis shows that PRT6 may be involved not only in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize the PRT6 gene in tomato.

  12. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  13. Advances in the diagnosis of hereditary kidney cancer: Initial results of a multigene panel test.

    Science.gov (United States)

    Nguyen, Kevin A; Syed, Jamil S; Espenschied, Carin R; LaDuca, Holly; Bhagat, Ansh M; Suarez-Sarmiento, Alfredo; O'Rourke, Timothy K; Brierley, Karina L; Hofstatter, Erin W; Shuch, Brian

    2017-11-15

    Panel testing has been recently introduced to evaluate hereditary cancer; however, limited information is available regarding its use in kidney cancer. The authors retrospectively reviewed test results and clinical data from patients who underwent targeted multigene panel testing of up to 19 genes associated with hereditary kidney cancer from 2013 to 2016. The frequency of positive (mutation/variant likely pathogenic), inconclusive (variant of unknown significance), and negative results was evaluated. A logistic regression analysis evaluated predictive factors for a positive test. Patients (n = 1235) had a median age at diagnosis of 46 years, which was significantly younger than the US population of individuals with kidney cancer (P […] kidney cancer. Panel tests may be particularly useful for patients who lack distinguishing clinical characteristics of known hereditary kidney cancer syndromes. The current results support the use of early age of onset for genetic counseling and/or testing. Cancer 2017;123:4363-71. © 2017 American Cancer Society.

  14. C-State: an interactive web app for simultaneous multi-gene visualization and comparative epigenetic pattern search.

    Science.gov (United States)

    Sowpati, Divya Tej; Srivastava, Surabhi; Dhawan, Jyotsna; Mishra, Rakesh K

    2017-09-13

    Comparative epigenomic analysis across multiple genes presents a bottleneck for bench biologists working with NGS data. Despite the development of standardized peak analysis algorithms, the identification of novel epigenetic patterns and their visualization across gene subsets remains a challenge. We developed a fast and interactive web app, C-State (Chromatin-State), to query and plot chromatin landscapes across multiple loci and cell types. C-State has an interactive, JavaScript-based graphical user interface and runs locally in modern web browsers that are pre-installed on all computers, thus eliminating the need for cumbersome data transfer, pre-processing and prior programming knowledge. C-State is unique in its ability to extract and analyze multi-gene epigenetic information. It allows for powerful GUI-based pattern searching and visualization. We include a case study to demonstrate its potential for identifying user-defined epigenetic trends in context of gene expression profiles.

  15. Streaming support for data intensive cloud-based sequence analysis.

    Science.gov (United States)

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
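
    The idea of processing NGS data while it is still being transferred relies on records being independent of one another. A minimal sketch of that pattern, assuming four-line FASTQ records arriving on a text stream, is shown below. It does not use or reproduce the elastream package; the function names and the choice of GC content as the per-read task are assumptions for illustration.

```python
import io


def fastq_records(stream):
    """Yield (name, sequence, quality) as soon as each 4-line record is complete."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        seq = stream.readline()
        stream.readline()  # the '+' separator line is not needed
        qual = stream.readline()
        if not qual:
            return  # incomplete trailing record; stop rather than guess
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def gc_fraction(seq):
    """Fraction of G and C letters in a read (0.0 for an empty read)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def process_stream(stream):
    """Process each read independently while later reads are still arriving."""
    for name, seq, _qual in fastq_records(stream):
        yield name, gc_fraction(seq)
```

    Because both functions are generators, results for early reads are available before the end of the stream has been received; in practice `stream` could wrap a network socket or a pipe instead of the in-memory `io.StringIO` object that is convenient for testing.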

  16. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Shadi A. Issa

    2013-01-01

    Full Text Available Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

  17. Next-generation sequence analysis of cancer xenograft models.

    Directory of Open Access Journals (Sweden)

    Fernando J Rossello

    Full Text Available Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

  18. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  19. Extended -Regular Sequence for Automated Analysis of Microarray Images

    Directory of Open Access Journals (Sweden)

    Jin Hee-Jeong

    2006-01-01

    Full Text Available Microarray study enables us to obtain hundreds of thousands of expressions of genes or genotypes at once, and it is an indispensable technology for genome research. The first step is the analysis of scanned microarray images. This is the most important procedure for obtaining biologically reliable data. Currently most microarray image processing systems require burdensome manual block/spot indexing work. Since the amount of experimental data is increasing very quickly, automated microarray image analysis software becomes important. In this paper, we propose two automated methods for analyzing microarray images. First, we propose the extended -regular sequence to index blocks and spots, which enables a novel automatic gridding procedure. Second, we provide a methodology, hierarchical metagrid alignment, to allow reliable and efficient batch processing for a set of microarray images. Experimental results show that the proposed methods are more reliable and convenient than the commercial tools.

  20. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    OpenAIRE

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.; Kantor, Rami

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802...

  1. Genomic organization and molecular phylogenies of the beta (β) keratin multigene family in the chicken (Gallus gallus) and zebra finch (Taeniopygia guttata): implications for feather evolution

    Directory of Open Access Journals (Sweden)

    Sawyer Roger H

    2010-05-01

    Full Text Available Abstract Background The epidermal appendages of reptiles and birds are constructed of beta (β) keratins. The molecular phylogeny of these keratins is important to understanding the evolutionary origin of these appendages, especially feathers. Knowing that the crocodilian β-keratin genes are closely related to those of birds, the published genomes of the chicken and zebra finch provide an opportunity not only to compare the genomic organization of their β-keratins, but to study their molecular evolution in archosaurians. Results The subfamilies (claw, feather, feather-like, and scale) of β-keratin genes are clustered in the same 5' to 3' order on microchromosome 25 in chicken and zebra finch, although the number of claw and feather genes differs between the species. Molecular phylogenies show that the monophyletic scale genes are the basal group within birds and that the monophyletic avian claw genes form the basal group to all feather and feather-like genes. Both species have a number of feather clades on microchromosome 27 that form monophyletic groups. An additional monophyletic cluster of feather genes exists on macrochromosome 2 for each species. Expression sequence tag analysis for the chicken demonstrates that all feather β-keratin clades are expressed. Conclusions Similarity in the overall genomic organization of β-keratins in Galliformes and Passeriformes suggests similar organization in all Neognathae birds, and perhaps in the ancestral lineages leading to modern birds, such as the paravian Anchiornis huxleyi. Phylogenetic analyses demonstrate that evolution of archosaurian epidermal appendages in the lineage leading to birds was accompanied by duplication and divergence of an ancestral β-keratin gene cluster. As morphological diversification of epidermal appendages occurred and the β-keratin multigene family expanded, novel β-keratin genes were selected for novel functions within appendages such as feathers.

  2. Plant X-tender: An extension of the AssemblX system for the assembly and expression of multigene constructs in plants

    Science.gov (United States)

    Machens, Fabian; Coll, Anna; Baebler, Špela; Messerschmidt, Katrin; Gruden, Kristina

    2018-01-01

    Cloning multiple DNA fragments for delivery of several genes of interest into the plant genome is one of the main technological challenges in plant synthetic biology. Despite several modular assembly methods developed in recent years, the plant biotechnology community has not widely adopted them yet, probably due to the lack of appropriate vectors and software tools. Here we present Plant X-tender, an extension of the highly efficient, scar-free and sequence-independent multigene assembly strategy AssemblX, based on overlap-dependent cloning methods and rare-cutting restriction enzymes. Plant X-tender consists of a set of plant expression vectors and the protocols for the most efficient cloning into the novel vector set needed for plant expression and thus introduces the advantages of AssemblX into plant synthetic biology. The novel vector set covers different backbones and selection markers to allow full design flexibility. We have included ccdB counterselection, thereby allowing the transfer of multigene constructs into the novel vector set in a straightforward and highly efficient way. Vectors are available as empty backbones and are fully flexible regarding the orientation of expression cassettes and addition of linkers between them, if required. We optimised the assembly and subcloning protocol by testing different scar-less assembly approaches: the noncommercial SLiCE and TAR methods and the commercial Gibson assembly and NEBuilder HiFi DNA assembly kits. Plant X-tender was applicable even in combination with low-efficiency homemade chemically competent or electrocompetent Escherichia coli. We have further validated the developed procedure for plant protein expression by cloning two cassettes into the newly developed vectors and subsequently transferring them to Nicotiana benthamiana in a transient expression setup. Thereby we show that multigene constructs can be delivered into plant cells in a streamlined and highly efficient way. Our results will support faster

  3. Plant X-tender: An extension of the AssemblX system for the assembly and expression of multigene constructs in plants.

    Science.gov (United States)

    Lukan, Tjaša; Machens, Fabian; Coll, Anna; Baebler, Špela; Messerschmidt, Katrin; Gruden, Kristina

    2018-01-01

    Cloning multiple DNA fragments for delivery of several genes of interest into the plant genome is one of the main technological challenges in plant synthetic biology. Despite several modular assembly methods developed in recent years, the plant biotechnology community has not widely adopted them yet, probably due to the lack of appropriate vectors and software tools. Here we present Plant X-tender, an extension of the highly efficient, scar-free and sequence-independent multigene assembly strategy AssemblX, based on overlap-dependent cloning methods and rare-cutting restriction enzymes. Plant X-tender consists of a set of plant expression vectors and the protocols for the most efficient cloning into the novel vector set needed for plant expression and thus introduces the advantages of AssemblX into plant synthetic biology. The novel vector set covers different backbones and selection markers to allow full design flexibility. We have included ccdB counterselection, thereby allowing the transfer of multigene constructs into the novel vector set in a straightforward and highly efficient way. Vectors are available as empty backbones and are fully flexible regarding the orientation of expression cassettes and addition of linkers between them, if required. We optimised the assembly and subcloning protocol by testing different scar-less assembly approaches: the noncommercial SLiCE and TAR methods and the commercial Gibson assembly and NEBuilder HiFi DNA assembly kits. Plant X-tender was applicable even in combination with low-efficiency homemade chemically competent or electrocompetent Escherichia coli. We have further validated the developed procedure for plant protein expression by cloning two cassettes into the newly developed vectors and subsequently transferring them to Nicotiana benthamiana in a transient expression setup. Thereby we show that multigene constructs can be delivered into plant cells in a streamlined and highly efficient way. Our results will support faster

  4. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    Full Text Available The adaptation tipping point and adaptation pathway approach developed to make decisions under deep uncertainty do not shed light on which among the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely, the “Build to Target” and “Build Up” approach, to aid in sub-selecting a set of preferred pathways. Both approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein the Net Present Valuation and the Real Options Analysis are employed as selection criterions. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.

  5. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in the structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 Copia and 101 Gypsy reverse transcriptase (rt) sequences were amplified from the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity and were rich in AT, and Copia rt showed higher sequence divergence than Gypsy rt. Two reasons are likely to account for this phenomenon: (a) these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; (b) strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  6. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences have hidden repetitions in fact. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.
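    The idea of reading hidden repeats off a recurrence plot can be illustrated with a minimal sketch. It uses a plain identity-based recurrence matrix, not the authors' modified recurrence plot, whose details are not given in this record; the function names and the toy sequence are hypothetical.

```python
def recurrence_matrix(seq):
    """Plain recurrence matrix: entry [i][j] is 1 when residues i and j match."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def diagonal_score(seq, lag):
    """Fraction of positions i where seq[i] == seq[i + lag].

    A repeat of period `lag` shows up as a dense diagonal at that offset.
    """
    if not 0 < lag < len(seq):
        raise ValueError("lag must be between 1 and len(seq) - 1")
    matches = sum(1 for i in range(len(seq) - lag) if seq[i] == seq[i + lag])
    return matches / (len(seq) - lag)


# Toy sequence (hypothetical) with an exact period of 3.
toy = "ABCABCABC"
scores = {lag: diagonal_score(toy, lag) for lag in range(1, 5)}
print(scores)
```

    Real protein repeats are degenerate, so a practical variant would score residue similarity rather than identity; that substitution is left out here.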

  7. Human factors review for Severe Accident Sequence Analysis (SASA)

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitates a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure

  8. Sequence analysis of cereal sucrose synthase genes and isolation ...

    African Journals Online (AJOL)

    SERVER

    2007-10-18

    Oct 18, 2007 ... sequencing of sucrose synthase gene fragment from sorghum using primers designed at their conserved exons. MATERIALS AND METHODS. Multiple sequence alignment. Sucrose synthase gene sequences of various cereals like rice, maize, and barley were accessed from NCBI Genbank database.

  9. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output
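    The concatenation step described above can be sketched as follows. This is not the Chimera program itself: the parser assumes a relaxed, sequential PHYLIP layout (name and sequence separated by whitespace), and all function names and toy alignments are hypothetical.

```python
def parse_relaxed_phylip(text):
    """Parse a sequential, whitespace-delimited PHYLIP alignment into {name: sequence}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split())
    records = {}
    for line in lines[1:1 + n_taxa]:
        name, seq = line.split(None, 1)
        records[name] = seq.replace(" ", "")
    for name, seq in records.items():
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
    return records


def concatenate(alignments, species):
    """Join each selected species' sequences across alignments, in the given order."""
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


def to_phylip(records):
    """Write {name: sequence} back out in the same relaxed PHYLIP layout."""
    n_sites = len(next(iter(records.values())))
    lines = [f"{len(records)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in records.items()]
    return "\n".join(lines)


# Two toy single-gene alignments (hypothetical data).
gene_a = parse_relaxed_phylip("2 4\ntaxon1 ACGT\ntaxon2 ACGA\n")
gene_b = parse_relaxed_phylip("2 3\ntaxon1 TTG\ntaxon2 TTC\n")
chimera = concatenate([gene_a, gene_b], ["taxon1", "taxon2"])
print(to_phylip(chimera))
```

    A species missing from any input alignment raises a KeyError here; a fuller tool would probably pad such gaps instead.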

  10. Accident Sequence Evaluation Program: Human reliability analysis procedure

    Energy Technology Data Exchange (ETDEWEB)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  11. Accident Sequence Evaluation Program: Human reliability analysis procedure

    International Nuclear Information System (INIS)

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  12. A Quantitative Accident Sequence Analysis for a VHTR

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jintae; Lee, Joeun; Jae, Moosung [Hanyang University, Seoul (Korea, Republic of)

    2016-05-15

    In Korea, the basic design features of the VHTR are currently being discussed across various design concepts. Probabilistic risk assessment (PRA) offers a logical and structured method to assess risks of a large and complex engineered system, such as a nuclear power plant. It will be introduced at an early stage in the design, and will be upgraded at various design and licensing stages as the design matures and the design details are defined. Risk insights to be developed from the PRA are viewed as essential to developing a design that is optimized in meeting safety objectives and in interpreting the applicability of the existing demands to the safety design approach of the VHTR. In this study, initiating events which may occur in VHTRs were selected through the MLD method. The initiating events were then grouped into four categories for the accident sequence analysis. Initiating event frequencies and safety system failure rates were calculated by using reliability data obtained from the available sources and fault tree analysis. After quantification, uncertainty analysis was conducted. The SR and LR frequencies were calculated as 7.52E-10/RY and 7.91E-16/RY, respectively, which are lower than the core damage frequency of LWRs.
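    How an accident sequence frequency is typically quantified from an event tree can be shown with a small worked sketch: the initiating event frequency is multiplied by the probability of each branch taken along the sequence. The function name and every number below are hypothetical placeholders, not values from the study, and branch independence is assumed.

```python
def sequence_frequency(initiating_event_frequency, branches):
    """Frequency of one event-tree sequence (per reactor-year).

    `branches` is a list of (failure_probability, failed) pairs, one per
    safety function along the sequence. Branches are assumed independent.
    """
    frequency = initiating_event_frequency
    for failure_probability, failed in branches:
        # Take the failure branch or its complement (the success branch).
        frequency *= failure_probability if failed else 1.0 - failure_probability
    return frequency


# Hypothetical sequence: initiating event at 1e-2 per reactor-year,
# first safety function fails (p = 1e-3), second one succeeds (p_fail = 0.5).
example = sequence_frequency(1e-2, [(1e-3, True), (0.5, False)])
print(f"{example:.2e} per reactor-year")
```

    In a full PRA the branch probabilities would come from fault trees and would carry uncertainty distributions rather than point values.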

  13. PCR-based isolation of multigene families: lessons from the avian MHC class IIB

    Czech Academy of Sciences Publication Activity Database

    Burri, R.; Promerová, Marta; Goebel, J.; Fumagalli, L.

    2014-01-01

    Vol. 14, No. 4 (2014), pp. 778-788. ISSN 1755-098X. R&D Projects: GA ČR GAP505/10/1871. Institutional support: RVO:68081766. Keywords: Birds * Major histocompatibility complex * Multigene families * PCR bias. Subject RIV: EG - Zoology. Impact factor: 3.712, year: 2014

  14. Comparing methods of classifying life courses: Sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Elzinga, C.H.; Liefbroer, Aart C.; Han, Sapphire

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  15. Comparing methods of classifying life courses: sequence analysis and latent class analysis

    NARCIS (Netherlands)

    Han, Y.; Liefbroer, A.C.; Elzinga, C.

    2017-01-01

    We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both methodologies and present methods to compare the empirical quality of alternative typologies. We apply this

  16. Large-scale chromatin remodeling at the immunoglobulin heavy chain locus: a paradigm for multigene regulation.

    Science.gov (United States)

    Bolland, Daniel J; Wood, Andrew L; Corcoran, Anne E

    2009-01-01

    V(D)J recombination in lymphocytes is the cutting and pasting together of antigen receptor genes in cis to generate the enormous variety of coding sequences required to produce diverse antigen receptor proteins. It is the key role of the adaptive immune response, which must potentially combat millions of different foreign antigens. Most antigen receptor loci have evolved to be extremely large and contain multiple individual V, D and J genes. The immunoglobulin heavy chain (Igh) and immunoglobulin kappa light chain (Igk) loci are the largest multigene loci in the mammalian genome and V(D)J recombination is one of the most complicated genetic processes in the nucleus. The challenge for the appropriate lymphocyte is one of macro-management-to make all of the antigen receptor genes in a particular locus available for recombination at the appropriate developmental time-point. Conversely, these large loci must be kept closed in lymphocytes in which they do not normally recombine, to guard against genomic instability generated by the DNA double strand breaks inherent to the V(D)J recombination process. To manage all of these demanding criteria, V(D)J recombination is regulated at numerous levels. It is restricted to lymphocytes since the Rag genes which control the DNA double-strand break step of recombination are only expressed in these cells. Within the lymphocyte lineage, immunoglobulin recombination is restricted to B-lymphocytes and TCR recombination to T-lymphocytes by regulation of locus accessibility, which occurs at multiple levels. Accessibility of recombination signal sequences (RSSs) flanking individual V, D and J genes at the nucleosomal level is the key micro-management mechanism, which is discussed in greater detail in other chapters. This chapter will explore how the antigen receptor loci are regulated as a whole, focussing on the Igh locus as a paradigm for the mechanisms involved. Numerous recent studies have begun to unravel the complex and

  17. Monthly streamflow forecasting using continuous wavelet and multi-gene genetic programming combination

    Science.gov (United States)

    Hadi, Sinan Jasim; Tombul, Mustafa

    2018-06-01

    Streamflow is an essential component of the hydrologic cycle at the regional and global scales and the main source of fresh water supply. It is highly associated with natural disasters, such as droughts and floods. Therefore, accurate streamflow forecasting is essential. Forecasting streamflow in general and monthly streamflow in particular is a complex process that cannot be handled by data-driven models (DDMs) only and requires pre-processing. Wavelet transformation is a pre-processing technique; however, application of continuous wavelet transformation (CWT) produces many scales that cause deterioration in the performance of any DDM because of the high number of redundant variables. This study proposes multigene genetic programming (MGGP) as a selection tool. After the CWT analysis, it selects important scales to be fed into the artificial neural network (ANN). A basin located in the southeast of Turkey is selected as a case study to prove the forecasting ability of the proposed model. One-month-ahead downstream flow is used as output, and downstream flow, upstream flow, rainfall, temperature, and potential evapotranspiration with associated lags are used as inputs. Before modeling, wavelet coherence transformation (WCT) analysis was conducted to analyze the relationship between variables in the time-frequency domain. Several combinations were developed to investigate the effect of the variables on streamflow forecasting. The results indicated a high localized correlation between the streamflow and other variables, especially the upstream flow. In the models of the standalone layout, where the data were entered into the ANN and MGGP without CWT, the performance was found to be poor. In the best-scale layout, where the best scale of the CWT (identified as the most highly correlated scale) is chosen and fed into the ANN and MGGP, the performance increased slightly. Using the proposed model, the performance improved dramatically, particularly in forecasting the peak values, because of the inclusion

  18. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
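    For orientation, the textbook Nussinov recursion that the tool is said to extend can be sketched as below. This is the basic base-pair-maximisation form for DNA (A-T and G-C pairs only), not RDNAnalyzer's extended version; the `min_loop` parameter, the function names, and the test sequence are assumptions for illustration.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least `min_loop`
    unpaired bases enclosed by every pair (basic Nussinov recursion)."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Toy hairpin (hypothetical): GGG stem, AAA loop, CCC stem.
print(nussinov_max_pairs("GGGAAACCC"))
print(reverse_complement("ATGC"))
```

    Recovering the actual pairings needs a traceback over `dp`, which is omitted to keep the sketch short.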

  19. Frame sequences analysis technique of linear objects movement

    Science.gov (United States)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Obtaining data by noninvasive methods is often needed in many fields of science and engineering. This is achieved through video recording at various frame rates and in various light spectra. In doing so, quantitative analysis of the movement of the objects being studied becomes an important component of the research. This work discusses analysis of the motion of linear objects on a two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence containing 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. It was required to determine the average velocity of the objects' motion. This velocity was found as an average velocity for 8-12 objects with an error of 15%. After processing, dependencies of the average velocity on the control parameters were found. The processing was performed in the software environment GMimPro, with subsequent approximation of the data obtained using the Hill equation.

  20. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. These data and analyses provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
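    The N50 statistic quoted for the assembly can be computed as in the following minimal sketch, using the common definition: the contig length at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name and the toy contig lengths are hypothetical and unrelated to the silkworm data.

```python
def n50(lengths):
    """Contig length L such that contigs of length >= L cover at least
    half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical): total 20, half 10.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs), sum(toy_contigs) / len(toy_contigs))
```

    Comparing N50 with the mean length, as the abstract does, gives a rough sense of how skewed the contig length distribution is.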

  1. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequences of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70 °C, have been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  2. Multi-gene detection and identification of mosquito-borne RNA viruses using an oligonucleotide microarray.

    Directory of Open Access Journals (Sweden)

    Nathan D Grubaugh

    Full Text Available BACKGROUND: Arthropod-borne viruses are important emerging pathogens world-wide. Viruses transmitted by mosquitoes, such as dengue, yellow fever, and Japanese encephalitis viruses, infect hundreds of millions of people and animals each year. Global surveillance of these viruses in mosquito vectors using molecular based assays is critical for prevention and control of the associated diseases. Here, we report an oligonucleotide DNA microarray design, termed ArboChip5.1, for multi-gene detection and identification of mosquito-borne RNA viruses from the genera Flavivirus (family Flaviviridae), Alphavirus (Togaviridae), Orthobunyavirus (Bunyaviridae), and Phlebovirus (Bunyaviridae). METHODOLOGY/PRINCIPAL FINDINGS: The assay utilizes targeted PCR amplification of three genes from each virus genus for electrochemical detection on a portable, field-tested microarray platform. Fifty-two viruses propagated in cell-culture were used to evaluate the specificity of the PCR primer sets and the ArboChip5.1 microarray capture probes. The microarray detected all of the tested viruses and differentiated between many closely related viruses such as members of the dengue, Japanese encephalitis, and Semliki Forest virus clades. Laboratory infected mosquitoes were used to simulate field samples and to determine the limits of detection. Additionally, we identified dengue virus type 3, Japanese encephalitis virus, Tembusu virus, Culex flavivirus, and a Quang Binh-like virus from mosquitoes collected in Thailand in 2011 and 2012. CONCLUSIONS/SIGNIFICANCE: We demonstrated that the described assay can be utilized in a comprehensive field surveillance program by the broad-range amplification and specific identification of arboviruses from infected mosquitoes. Furthermore, the microarray platform can be deployed in the field and viral RNA extraction to data analysis can occur in as little as 12 h. The information derived from the ArboChip5.1 microarray can help to establish

  3. Automatic analysis of the 2015 Gorkha earthquake aftershock sequence.

    Science.gov (United States)

    Baillard, C.; Lyon-Caen, H.; Bollinger, L.; Rietbrock, A.; Letort, J.; Adhikari, L. B.

    2016-12-01

    The Mw 7.8 Gorkha earthquake, which partially ruptured the Main Himalayan Thrust north of Kathmandu on 25 April 2015, was the largest and most catastrophic earthquake to strike Nepal since the great M8.4 1934 earthquake. This mainshock was followed by multiple aftershocks, among them two notable events that occurred on 12 May with magnitudes of Mw 7.3 and Mw 6.3. Due to these recent events it became essential for the authorities and for the scientific community to better evaluate the seismic risk in the region through a detailed analysis of the earthquake catalog, including, among other things, the spatio-temporal distribution of the Gorkha aftershock sequence. Here we complement this first study with a microseismic study using seismic data from the eastern part of the Nepalese Seismological Center network, together with one broadband station at Everest. Our primary goal is to deliver an accurate catalog of the aftershock sequence. Due to the exceptional number of events detected, we performed an automatic picking/locating procedure which can be split into four steps: 1) coarse picking of the onsets using a classical STA/LTA picker, 2) phase association of picked onsets to detect and declare seismic events, 3) kurtosis pick refinement around theoretical arrival times to increase picking and location accuracy, and 4) local magnitude calculation based on the amplitude of waveforms. This procedure is time efficient (~1 sec/event), considerably reduces the location uncertainties (~2 to 5 km errors) and increases the number of events detected compared to manual processing. Indeed, the automatic detection rate is 10 times higher than the manual detection rate. By comparing to the USGS catalog we were able to give a new attenuation law to compute local magnitudes in the region. A detailed analysis of the seismicity shows a clear migration toward the east of the region and a sudden decrease of seismicity 100 km east of Kathmandu which may reveal the presence of a tectonic
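    Step 1 of the procedure above, the classical STA/LTA picker, can be sketched as follows. This is a generic trailing-window version on squared amplitudes, not the authors' implementation; the window lengths, the threshold, the function names, and the synthetic trace are all assumptions.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average energy, per sample.

    Both windows end at the current sample. Samples before the first
    full long-term window are left at 0.0.
    """
    if not 0 < n_sta <= n_lta:
        raise ValueError("need 0 < n_sta <= n_lta")
    cumulative = [0.0]
    for sample in signal:
        cumulative.append(cumulative[-1] + sample * sample)
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (cumulative[i + 1] - cumulative[i + 1 - n_sta]) / n_sta
        lta = (cumulative[i + 1] - cumulative[i + 1 - n_lta]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def first_trigger(ratios, threshold):
    """Index of the first sample whose ratio reaches the threshold, else None."""
    for index, ratio in enumerate(ratios):
        if ratio >= threshold:
            return index
    return None


# Synthetic trace (hypothetical): low-amplitude noise, then an onset at sample 50.
trace = [0.1] * 50 + [1.0] * 10
onset = first_trigger(sta_lta(trace, n_sta=5, n_lta=30), threshold=3.0)
print(onset)
```

    A coarse pick of this kind would then be handed to the association and kurtosis refinement steps, which are not sketched here.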

  4. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
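    The separation of a symbolic DNA sequence into four indicator channels can be sketched as below. The weighted recombination uses fixed, hypothetical weights only to show where a symbol-to-number mapping would enter; the paper's adaptive mapping and the wavelet transform itself are not reproduced, and the function names are assumptions.

```python
BASES = "ACGT"


def indicator_sequences(dna):
    """Split a DNA string into four binary channels, one per base."""
    dna = dna.upper()
    return {base: [1 if symbol == base else 0 for symbol in dna] for base in BASES}


def weighted_signal(dna, weights):
    """Numeric signal formed as the weighted sum of the four channels.

    `weights` maps each base to a number; the values used below are
    placeholders, not the adaptive weights described in the abstract.
    """
    channels = indicator_sequences(dna)
    return [
        sum(weights[base] * channels[base][i] for base in BASES)
        for i in range(len(dna))
    ]


channels = indicator_sequences("ACGTA")
print(channels["A"])
print(weighted_signal("ACGTA", {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}))
```

    Each position has exactly one non-zero channel, so the weighted signal simply carries the weight of the base found there; a transform would then be applied to that numeric signal.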

  5. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not been used in viral integration site analysis of gene therapy applications. Here, we combined a targeted sequence capture array and next-generation sequencing to address the whole-genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1 derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were totally distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with an HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  7. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  8. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  9. Database-driven primary analysis of raw sequencing data

    DEFF Research Database (Denmark)

    2014-01-01

    The present invention relates to methods for identifying the source of a biological sequence containing sample from raw sequencing reads. The method may be used to identify the source of unknown DNA and can be used for diagnostic, biodefense, food safety and quality, and hygiene applications...

  10. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' mode of CPU frequency by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  11. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  12. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    Science.gov (United States)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence showed that U. prolifera has the material and energy basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated with Pseudendoclonium akinetum (Ulvophyceae).

  13. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Linh Nguyen

    2016-12-01

    Full Text Available Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than K-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: We now know that this type of model can predict in vitro tumour response to these drugs. These models can thus be further investigated on in vivo tumour models.
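
    The precision/recall trade-off described in the results can be made concrete with a small sketch. The labels below are invented toy data, not GDSC results, and `precision_recall` is a hypothetical helper: it only shows how a predictor that calls few cell lines sensitive can reach high precision with poor recall.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = sensitive, 0 = resistant)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly called sensitive
    fp = sum(1 for p, a in pairs if p and not a)    # wrongly called sensitive
    fn = sum(1 for p, a in pairs if not p and a)    # sensitive lines that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 10 cell lines, 4 of them truly sensitive to the drug.
actual      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
single_gene = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # conservative marker
multi_gene  = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # broader classifier

marker_scores = precision_recall(single_gene, actual)  # (1.0, 0.25)
model_scores = precision_recall(multi_gene, actual)    # (0.6, 0.75)
```

    In this toy case the single-gene marker is never wrong when it fires but finds only one of four sensitive lines, while the multi-gene predictor finds three at the cost of two false positives.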

  14. Event Sequence Analysis of the Air Intelligence Agency Information Operations Center Flight Operations

    National Research Council Canada - National Science Library

    Larsen, Glen

    1998-01-01

    This report applies Event Sequence Analysis, methodology adapted from aircraft mishap investigation, to an investigation of the performance of the Air Intelligence Agency's Information Operations Center (IOC...

  15. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  16. Sequence analysis of mitochondrial 16S ribosomal RNA gene ...

    Indian Academy of Sciences (India)

    Unknown

    For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. ... been widely used for phylogenetic studies and sequence differences in ... In order to fill up the internal gap, a new set.

  17. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  18. Sequence and expression analysis of gaps in human chromosome 20

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Seemann, Stefan; Mang, Yuan

    2012-01-01

    The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum...

  19. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    Science.gov (United States)

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
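
    The general-purpose baselines named in the abstract (gzip, bzip2 and lzma) all have Python standard-library bindings, so a baseline comparison of the kind described can be sketched as below. This does not implement or invoke DELIMINATE; the random test sequence and the helper name `compressed_sizes` are assumptions for illustration, and real genomes are less random than this input.

```python
import bz2
import gzip
import lzma
import random


def compressed_sizes(data):
    """Return the size in bytes of the raw input and of each compressed form."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


# Hypothetical input: 100,000 random bases stored as one byte each.
random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(100_000)).encode("ascii")
sizes = compressed_sizes(sequence)

# All three codecs are loss-less: decompression restores the exact input.
restored = lzma.decompress(lzma.compress(sequence))
```

    Because a four-letter alphabet needs only two bits per base, even random sequence shrinks well below its one-byte-per-base size with any of these codecs; a specialised genomic compressor aims to improve on that baseline.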

  20. Analysis of 16S rRNA amplicon sequencing options on the Roche/454 next-generation titanium sequencing platform.

    Directory of Open Access Journals (Sweden)

    Hideyuki Tamaki

    Full Text Available BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1), after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

  1. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequences of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...
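
    Deriving a consensus from aligned promoter regions amounts to taking the most frequent base in each column. The sketch below shows that idea on five invented hexamers; they are toy inputs rather than sequences from the compilation, and the function name `consensus` is an assumption.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent base at each column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        # most_common(1) returns the top (base, count) pair for this column.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Toy hexamers (hypothetical), each differing from the others at one position.
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAAT"]
site_consensus = consensus(toy_sites)  # "TATAAT"
```

    A per-column count like this also exposes how strongly each position is conserved, which is the information a compilation of well-defined promoters is meant to provide.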

  2. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
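
    The threshold-based flagging described above can be sketched as a simple rule check. SQUAT itself runs in R; the Python below is only an illustration of the idea, using three of the thresholds quoted in the abstract. The dictionary layout, the count names, and the function `flag_sequence` are hypothetical and are not SQUAT's interface.

```python
# Upper limits taken from the abstract: a sequence is flagged when a count
# exceeds its limit (for example, more than six ambiguous bases in PR).
DEFAULT_THRESHOLDS = {
    "PR": {"ambiguous_bases": 6, "deletions": 1, "stop_codons": 0},
    "RT": {"ambiguous_bases": 18, "deletions": 1, "stop_codons": 0},
}


def flag_sequence(region, counts, thresholds=None):
    """Return the sorted names of all quality checks that a sequence fails.

    `counts` maps a check name to the value observed for one sequence;
    checks missing from `counts` are treated as zero.
    """
    limits = (thresholds or DEFAULT_THRESHOLDS)[region]
    return sorted(name for name, limit in limits.items()
                  if counts.get(name, 0) > limit)


# The same counts pass for RT but fail for PR, whose limit is stricter.
observed = {"ambiguous_bases": 7, "deletions": 0, "stop_codons": 0}
pr_flags = flag_sequence("PR", observed)  # ["ambiguous_bases"]
rt_flags = flag_sequence("RT", observed)  # []
```

    Passing a different `thresholds` mapping mirrors the abstract's statement that thresholds are user modifiable.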

  3. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the 'tsunami' of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is fewer than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumonisin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  4. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Science.gov (United States)

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
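
    The renaming facility described above, where each FASTA label is replaced by a short combination of species name and accession number while a mapping is kept for later restoration, can be sketched as follows. This is not the REFGEN code: the header layout ("ACCESSION Genus species ..."), the accession numbers, and the function names are assumptions made for illustration.

```python
def species_accession(header):
    """Build a short label from a header laid out as 'ACCESSION Genus species ...'."""
    fields = header.split()
    accession, genus, species = fields[0], fields[1], fields[2]
    return f"{genus[:3]}{species[:3]}_{accession}"


def rename_fasta(lines, name_for=species_accession):
    """Rename every FASTA header; return the new lines and a short-to-original map."""
    renamed, mapping = [], {}
    for line in lines:
        if line.startswith(">"):
            original = line[1:].strip()
            short = name_for(original)
            mapping[short] = original  # kept so tree labels can be restored later
            renamed.append(">" + short)
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping


# Hypothetical records; the accession numbers are placeholders.
records = [
    ">AB123456 Homo sapiens cytochrome b",
    "ATGC",
    ">XY000001 Mus musculus cytochrome b",
    "ATGA",
]
renamed, mapping = rename_fasta(records)
```

    After tree building, the same `mapping` can be applied in reverse to relabel the tips of a tree file, which corresponds to the second tool described in the abstract.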

  5. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Full Text Available The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.

  6. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest in the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  7. Reiterative Recombination for the in vivo assembly of libraries of multigene pathways

    OpenAIRE

    Wingler, Laura M.; Cornish, Virginia W.

    2011-01-01

    The increasing sophistication of synthetic biology is creating a demand for robust, broadly accessible methodology for constructing multigene pathways inside of the cell. Due to the difficulty of rationally designing pathways that function as desired in vivo, there is a further need to assemble libraries of pathways in parallel, in order to facilitate the combinatorial optimization of performance. While some in vitro DNA assembly methods can theoretically make libraries of pathways, these tec...

  8. The conserved clag multigene family of malaria parasites: essential roles in host-pathogen interaction.

    Science.gov (United States)

    Gupta, Ankit; Thiruvengadam, Girija; Desai, Sanjay A

    2015-01-01

    The clag multigene family is strictly conserved in malaria parasites, but absent from neighboring genera of protozoan parasites. Early research pointed to roles in merozoite invasion and infected cell cytoadherence, but more recent studies have implicated channel-mediated uptake of ions and nutrients from host plasma. Here, we review the current understanding of this gene family, which appears to be central to host-parasite interactions and an important therapeutic target. Published by Elsevier Ltd.

  9. Reiterative Recombination for the in vivo assembly of libraries of multigene pathways.

    Science.gov (United States)

    Wingler, Laura M; Cornish, Virginia W

    2011-09-13

    The increasing sophistication of synthetic biology is creating a demand for robust, broadly accessible methodology for constructing multigene pathways inside of the cell. Due to the difficulty of rationally designing pathways that function as desired in vivo, there is a further need to assemble libraries of pathways in parallel, in order to facilitate the combinatorial optimization of performance. While some in vitro DNA assembly methods can theoretically make libraries of pathways, these techniques are resource intensive and inherently require additional techniques to move the DNA back into cells. All previously reported in vivo assembly techniques have been low yielding, generating only tens to hundreds of constructs at a time. Here, we develop "Reiterative Recombination," a robust method for building multigene pathways directly in the yeast chromosome. Due to its use of endonuclease-induced homologous recombination in conjunction with recyclable markers, Reiterative Recombination provides a highly efficient, technically simple strategy for sequentially assembling an indefinite number of DNA constructs at a defined locus. In this work, we describe the design and construction of the first Reiterative Recombination system in Saccharomyces cerevisiae, and we show that it can be used to assemble multigene constructs. We further demonstrate that Reiterative Recombination can construct large mock libraries of at least 10^4 biosynthetic pathways. We anticipate that our system's simplicity and high efficiency will make it a broadly accessible technology for pathway construction and render it a valuable tool for optimizing pathways in vivo.

  10. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts

    KAUST Repository

    Otto, Thomas D.

    2014-09-09

    Plasmodium falciparum causes most human malaria deaths, having prehistorically evolved from parasites of African Great Apes. Here we explore the genomic basis of P. falciparum adaptation to human hosts by fully sequencing the genome of the closely related chimpanzee parasite species P. reichenowi, and obtaining partial sequence data from a more distantly related chimpanzee parasite (P. gaboni). The close relationship between P. reichenowi and P. falciparum is emphasized by almost complete conservation of genomic synteny, but against this strikingly conserved background we observe major differences at loci involved in erythrocyte invasion. The organization of most virulence-associated multigene families, including the hypervariable var genes, is broadly conserved, but P. falciparum has a smaller subset of rif and stevor genes whose products are expressed on the infected erythrocyte surface. Genome-wide analysis identifies other loci under recent positive selection, but a limited number of changes at the host–parasite interface may have mediated host switching.

  11. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Full Text Available Abstract Background Expressed Sequence Tags (ESTs) have been a cost-effective tool in molecular biology and represent an abundant, valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tag (EST) sequences. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a low level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.
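
    The assembly figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes the GenBank accession range is contiguous and that each accession corresponds to one unigene, which the abstract implies but does not state. The helper name accession_span is hypothetical.

```python
# Figures quoted in the abstract above.
contigs = 1_492
singletons = 2_981
reported_unigenes = 4_473

# Accession range as quoted. Treating the part after the two-letter
# prefix as a plain integer counter is an assumption made for this check.
first_acc, last_acc = "GW868575", "GW873047"


def accession_span(first: str, last: str, prefix_len: int = 2) -> int:
    """Count the IDs in an inclusive accession range with a shared prefix."""
    if first[:prefix_len] != last[:prefix_len]:
        raise ValueError("accessions do not share a prefix")
    return int(last[prefix_len:]) - int(first[prefix_len:]) + 1


unigenes = contigs + singletons
span = accession_span(first_acc, last_acc)

print(f"contigs + singletons = {unigenes}")
print(f"accessions in range  = {span}")
print(f"matches reported     = {unigenes == reported_unigenes == span}")
```

    Under these assumptions, the contig and singleton counts and the accession range both agree with the reported number of unigenes.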

  12. Accident Sequence Precursor Analysis for SGTR by Using Dynamic PSA Approach

    International Nuclear Information System (INIS)

    Lee, Han Sul; Heo, Gyun Young; Kim, Tae Wan

    2016-01-01

    In order to address this issue, this study suggests the sequence tree model to analyze accident sequence systematically. Using the sequence tree model, all possible scenarios which need a specific safety action to prevent the core damage can be identified and success conditions of safety action under complicated situation such as combined accident will be also identified. Sequence tree is branch model to divide plant condition considering the plant dynamics. Since sequence tree model can reflect the plant dynamics, arising from interaction of different accident timing and plant condition and from the interaction between the operator action, mitigation system, and the indicators for operation, sequence tree model can be used to develop the dynamic event tree model easily. Target safety action for this study is a feed-and-bleed (F and B) operation. A F and B operation directly cools down the reactor cooling system (RCS) using the primary cooling system when residual heat removal by the secondary cooling system is not available. In this study, a TLOFW accident and a TLOFW accident with LOCA were the target accidents. Based on the conventional PSA model and indicators, the sequence tree model for a TLOFW accident was developed. Based on the results of a sampling analysis and data from the conventional PSA model, the CDF caused by Sequence no. 26 can be realistically estimated. For a TLOFW accident with LOCA, second accident timings were categorized according to plant condition. Indicators were selected as branch point using the flow chart and tables, and a corresponding sequence tree model was developed. If sampling analysis is performed, practical accident sequences can be identified based on the sequence analysis. If a realistic distribution for the variables can be obtained for sampling analysis, much more realistic accident sequences can be described. Moreover, if the initiating event frequency under a combined accident can be quantified, the sequence tree model

  13. Sequencing and phylogenetic analysis of Herpes simplex virus type ...

    African Journals Online (AJOL)

    To determine the genetic relationship of the HSV-2 glycoprotein G gene (gG) in Iran with those in other countries, a DNA fragment of 1100 bp corresponding to gG was isolated from six HSV-2 strains obtained from infected human sera samples in Iran, amplified by PCR and sequenced for determining ...

  14. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economic value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  15. Characterization and sequence analysis of cysteine and glycine-rich ...

    African Journals Online (AJOL)

    Tarek

    2011-04-18

    Apr 18, 2011 ... nucleotide alignment of both native buffalo and cattle CSRP3 cDNAs sequences ..... Exon III, Identities = 71/75 (94%), Gaps = 1/75 (1%) Strand=Plus/Plus ... Band MR, Larson JH, Rebeiz M, Green CA, Heyen DW, Donovan J,.

  16. Functional analysis of bipartite begomovirus coat protein promoter sequences

    International Nuclear Information System (INIS)

    Lacatus, Gabriela; Sunter, Garry

    2008-01-01

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters

  17. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; et al. (as part of the Chinese Human Genome Sequencing Consortium)

    2006-01-01

    as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  18. Sequence analysis of mitochondrial 16S ribosomal RNA gene

    Indian Academy of Sciences (India)

    Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...

  19. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  20. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    EVELYN SILVA

    2006-01-01

    Full Text Available Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lower percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  1. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  2. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  3. Molecular cloning, sequence analysis and structure prediction of the ...

    African Journals Online (AJOL)

    AJL

    2012-04-19

    Apr 19, 2012 ... The primers were based on the rBAT sequences of other animals deposited in GenBank. .... fragment; M1, 2000 bp DNA ladder; M2, 1000 bp DNA ladder. spliced to obtain the ..... A traffic signal for heterodimeric amino acid.

  4. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease- and protein-related research was the leading research focus, and that comparative genomics and evolution-related research has strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  5. Cloning and sequence analysis of the defective in anther ...

    African Journals Online (AJOL)

    To clone the defective in anther dehiscence1 (DAD1) gene fragment of Chinese kale, about 700 bp product was obtained by PCR amplification using Chinese kale genomic DNA as the template and a pair of specific primers designed according to the conserved sequence of DAD1 genes of Arabidopsis thaliana and ...

  6. Sequence and comparative analysis of Leuconostoc dairy bacteriophages

    DEFF Research Database (Denmark)

    Kot, Witold; Hansen, Lars Henrik; Neve, Horst

    2014-01-01

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc...

  7. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  8. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  9. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Directory of Open Access Journals (Sweden)

    Suldbold Jargalmaa

    2017-07-01

    Full Text Available Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

  10. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Science.gov (United States)

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  11. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  12. The Matrix Method of Representation, Analysis and Classification of Long Genetic Sequences

    Directory of Open Access Journals (Sweden)

    Ivan V. Stepanyan

    2017-01-01

    Full Text Available The article is devoted to a matrix method of comparative analysis of long nucleotide sequences by means of presenting each sequence in the form of three digital binary sequences. This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.
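
    The idea of turning one nucleotide sequence into three binary sequences can be sketched as follows. The abstract does not say which biochemical attributes are used, so the three partitions below (purine vs. pyrimidine, amino vs. keto, strong vs. weak hydrogen bonding) are an assumption, as is the choice of which class is coded as 1. The names PARTITIONS and to_binary_tracks are hypothetical, and the sketch does not reproduce the article's matrix or mosaic construction.

```python
# Three two-class partitions of the DNA alphabet. Both the choice of
# attributes and the 0/1 labelling are assumptions for illustration.
PARTITIONS = {
    "purine": set("AG"),   # purines (A, G) vs pyrimidines (C, T)
    "amino": set("AC"),    # amino bases (A, C) vs keto bases (G, T)
    "strong": set("CG"),   # three hydrogen bonds (C, G) vs two (A, T)
}


def to_binary_tracks(seq: str) -> dict[str, list[int]]:
    """Map a DNA string to three 0/1 sequences, one per partition."""
    seq = seq.upper()
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    return {
        name: [1 if base in members else 0 for base in seq]
        for name, members in PARTITIONS.items()
    }


tracks = to_binary_tracks("GATTACA")
for name, bits in tracks.items():
    print(f"{name:>6}: {''.join(map(str, bits))}")
```

    Each base is recoverable from its three bits, so the three binary tracks together carry the same information as the original sequence.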

  13. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern, as the amount of data generated at many locations is growing exponentially. There is therefore a need for techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. The OPTSDNA algorithm is used to store DNA sequences of various sizes, from very small to very large, in a database. The algorithm calculates the storage size, and response time is also calculated in this work. The efficiency and performance of the algorithm are high (in size calculation with percentage) when compared with other known sequential approaches.
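
    The abstract does not describe how OPTSDNA works internally, so the following is not that algorithm. It is a generic two-bits-per-base packing, shown only to illustrate what a "storage size with percentage" comparison against plain one-byte-per-base text might look like. All names here (pack, unpack, CODES) are hypothetical.

```python
# Generic 2-bit packing of DNA. This is a common baseline, NOT the
# OPTSDNA algorithm, whose details are not given in the abstract.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack four bases per byte, lowest two bits first."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


seq = "ACGTACGTTTGA"
packed = pack(seq)
saving = 100 * (1 - len(packed) / len(seq))

print(f"plain text: {len(seq)} bytes, packed: {len(packed)} bytes")
print(f"storage saving: {saving:.1f}%")
print(f"round trip ok: {unpack(packed, len(seq)) == seq}")
```

    The sequence length has to be stored alongside the packed bytes, because the final byte may be only partly filled.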

  14. Analysis of xylem formation in pine by cDNA sequencing

    Science.gov (United States)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  15. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
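
    The output figures in this abstract are consistent with a simple yield estimate: clusters times reads per cluster times read length. The sketch below assumes that the quoted 25 M single reads correspond to 25 M clusters, each giving one read in a single-end run and two in a paired-end run, and it ignores quality filtering and index reads. The function name run_yield_gb is hypothetical.

```python
def run_yield_gb(clusters: int, read_length_bp: int, paired_end: bool) -> float:
    """Estimate raw run output in gigabases (1 Gb = 1e9 bases)."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * read_length_bp / 1e9


# Assumption: 25 M clusters, taken from the quoted 25 M single reads.
clusters = 25_000_000

max_yield = run_yield_gb(clusters, 300, paired_end=True)   # 2 x 300 bp
min_yield = run_yield_gb(clusters, 36, paired_end=False)   # 1 x 36 bp

print(f"2 x 300 bp: {max_yield:.1f} Gb")
print(f"1 x 36 bp:  {min_yield:.1f} Gb")
```

    Under these assumptions the 2 × 300 bp configuration gives the 15 Gb quoted in the abstract, while the shortest single-end configuration gives under 1 Gb.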

  16. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the youth (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children with MODY diagnosed by clinical criteria. Studies of index cases were done with MISEQ-ILLUMINA, and family screenings and confirmation studies of mutations were done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.

  17. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× sequencing depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroups of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related to obesity in Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.

  18. Accident sequence precursor analysis level 2/3 model development

    International Nuclear Information System (INIS)

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-01-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models

  19. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  20. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    Serratia marcescens produces biosurfactant serrawettin, essential for its population migration behavior. Serrawettin W1 was revealed to be an antibiotic serratamolide that makes it significant for deoxyribonucleic acid (DNA) and protein sequence analysis. Four nucleotide and amino-acid sequences from local strains ...

  1. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3

  2. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of
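
    The coverage figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the function names are illustrative and not taken from the paper, and the genome size it prints is merely what the 2.5% figure implies, not a value reported by the authors.

```python
# Figures quoted in the abstract above.
CLEAN_BES = 65_720        # clean BAC-end sequences
TOTAL_BP = 42_522_168     # total bases across those sequences
GENOME_FRACTION = 0.025   # "2.5%" of the genome, as stated


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def implied_genome_size(total_bases: int, fraction: float) -> float:
    """Genome size (bp) implied if total_bases is `fraction` of the genome."""
    return total_bases / fraction


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BP, CLEAN_BES):.1f} bp")
    size_gb = implied_genome_size(TOTAL_BP, GENOME_FRACTION) / 1e9
    print(f"implied genome size: {size_gb:.2f} Gb")
```

    The mean comes out at about 647 bp, matching the stated average read length, and the 2.5% figure corresponds to a genome of roughly 1.7 Gb.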

  3. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By considering a chaotic pseudo-random sequence as a symbolic sequence, the authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the cases of the Logistic map and a one-way coupled map lattice to demonstrate how it works, and a comparison is made between it and the approximate entropy method. The results show that this method can distinguish the complexities of different chaotic pseudo-random sequences, and that it is superior to the approximate entropy method.
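
    To make the idea of treating a chaotic sequence as a symbolic sequence concrete, here is a minimal sketch. It is not the authors' method: the abstract does not give their complexity measure, so the partition at 0.5 and the block-entropy statistic below are assumptions chosen purely for illustration, as are the function names.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def logistic_symbols(r: float, x0: float, n: int, threshold: float = 0.5) -> List[int]:
    """Iterate x -> r*x*(1-x) n times and emit 1 if the new state >= threshold, else 0."""
    symbols = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        symbols.append(1 if x >= threshold else 0)
    return symbols


def block_entropy(symbols: Sequence[int], block_len: int) -> float:
    """Shannon entropy in bits of overlapping blocks of length block_len.

    Illustrative stand-in for a complexity measure, not the paper's statistic.
    """
    blocks = [tuple(symbols[i:i + block_len])
              for i in range(len(symbols) - block_len + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    seq = logistic_symbols(r=4.0, x0=0.3, n=1000)
    print(block_entropy(seq, 3))
```

    A strictly periodic symbol stream gives a low block entropy, while a sequence from the fully chaotic regime (r = 4) approaches the maximum of block_len bits.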

  4. The sequence and analysis of duplication rich human chromosome 16

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. Analysis of decision procedures for a sequence of inventory periods

    International Nuclear Information System (INIS)

    Avenhaus, R.

    1982-07-01

    Optimal test procedures for a sequence of inventory periods will be discussed. Starting with a game theoretical description of the conflict situation between the plant operator and the inspector, the objectives of the inspector as well as the general decision theoretical problem will be formulated. In the first part the objective of 'secure' detection will be emphasized which means that only at the end of the reference time a decision is taken by the inspector. In the second part the objective of 'timely' detection will be emphasized which will lead to sequential test procedures. At the end of the paper all procedures will be summarized, and in view of the multitude of procedures available at the moment some comments about future work will be given. (orig./HP)

  6. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    Science.gov (United States)

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  8. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version does only allow a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as a multithread software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  9. Clinical application of multigene panels: challenges of next generation counseling and cancer risk management

    Directory of Open Access Journals (Sweden)

    Thomas Paul Slavin

    2015-09-01

    Full Text Available Background: Multigene panels can be a cost- and time-effective alternative to sequentially testing multiple genes, especially with a mixed family cancer phenotype. However, moving beyond our single-gene testing paradigm has unveiled many new challenges to the clinician. The purpose of this article is to familiarize the reader with some of the challenges, as well as potential opportunities, of expanded hereditary cancer panel testing. Methods: We include results from 348 commercial multigene panel tests ordered from January 1, 2014, through October 1, 2014, by clinicians associated with the City of Hope’s Clinical Cancer Genetics Community of Practice. We also discuss specific challenging cases that arose during this period involving abnormalities in the genes: CDH1, TP53, PMS2, PALB2, CHEK2, NBN, and RAD51C. Results: If historically high risk genes only were included in the panels (BRCA1, BRCA2, MSH6, PMS2, TP53, APC, CDH1), the results would have been positive only 6.2% of the time, instead of 17%. Results returned with variants of uncertain significance (VUS) 42% of the time. Conclusion: These figures and cases stress the importance of adequate pretest counseling in anticipation of higher percentages of positive, VUS, unexpected, and ambiguous test results. Test result ambiguity can be limited by the use of phenotype specific panels; if found, multiple resources (the literature, reference laboratory, colleagues, national experts, and research efforts) can be accessed to better clarify counseling and management for the patient and family. For pathogenic variants in low and moderate risk genes, empiric risk modeling based on the patient’s personal and family history of cancer may supersede gene-specific risk. Commercial laboratory and patient contributions to public databases and research efforts will be needed to better classify variants and reduce clinical ambiguity of multigene panels.

  10. Information-Theoretical Analysis of EEG Microstate Sequences in Python

    Directory of Open Access Journals (Sweden)

    Frederic von Wegner

    2018-06-01

    Full Text Available We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG) measures the electrical potential generated by the cerebral cortex and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A–D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting state EEG recordings, we found new characteristics of microstate sequences such as periodicities related to EEG frequency bands. The algorithms used are here provided as an open-source package and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting state EEG recordings.
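
    The time-lagged mutual information mentioned in this abstract can be illustrated with a short, self-contained sketch. This is not the package's code or API: the function name is hypothetical, the estimate is a plain plug-in estimate from empirical frequencies, and the Markov-surrogate confidence intervals described in the abstract are not reproduced here.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def lagged_mutual_information(seq: Sequence[Hashable], lag: int) -> float:
    """Plug-in estimate (bits) of the mutual information between seq[t] and seq[t+lag]."""
    if lag < 0 or lag >= len(seq):
        raise ValueError("lag must satisfy 0 <= lag < len(seq)")
    pairs = list(zip(seq[:len(seq) - lag], seq[lag:]))
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (x_t, x_{t+lag})
    left = Counter(x for x, _ in pairs)     # marginal counts of x_t
    right = Counter(y for _, y in pairs)    # marginal counts of x_{t+lag}
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (left[x] * right[y]).
    return sum((c / n) * log2(c * n / (left[x] * right[y]))
               for (x, y), c in joint.items())


if __name__ == "__main__":
    labels = "ABCD" * 50  # toy microstate label sequence with period 4
    for k in range(1, 6):
        print(k, round(lagged_mutual_information(labels, k), 3))
```

    For a strictly periodic label sequence the value stays near its maximum at every lag, whereas for real microstate sequences the abstract reports that periodicities show up as structure in this function across lags.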

  11. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    Full Text Available The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses. A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins) or amino acid metabolism (e.g., asparagine tRNA synthetase). This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  12. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    Directory of Open Access Journals (Sweden)

    Hironobu Yanagisawa

    2016-03-01

    Full Text Available The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.

  13. Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

    Science.gov (United States)

    A training opportunity is open to a highly microbial-research-motivated student to conduct sequence analysis, explore novel genes and metabolic pathways, validate resultant findings using qPCR/RT-qPCR and summarize the findings

  14. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers, Libyan. Journal of Medicine .... For molecular modeling of NAT2 protein, visualized ..... cal clustering. .... cular dynamics simulation.

  15. Analysis of common SHOX gene sequence variants and ∼4.9-kb ...

    Indian Academy of Sciences (India)

    [Solc R., Hirschfeldova K., Kebrdlova V. and Baxova A. 2014 Analysis of common SHOX gene sequence variants ... based on a Gibbs sampling strategy were done using .... SHOX (short stature homeobox) are an important cause of growth.

  16. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  17. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra-short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
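
    The k-mer representation that this abstract relies on can be sketched in a few lines. The code below only covers the first step, turning a DNA sequence into a bag of overlapping fixed-length k-mer "words"; the function names and the choice of k are illustrative assumptions, and the topic-model fitting (LDA) and classification stages described in the abstract are not implemented here.

```python
from collections import Counter
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping substrings of length k, upper-cased, in order."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int) -> Counter:
    """Bag-of-words view of a sequence: k-mer -> number of occurrences."""
    return Counter(kmers(sequence, k))


def kmer_document(sequence: str, k: int) -> str:
    """Space-separated k-mers, i.e. the sequence written as a text 'document'."""
    return " ".join(kmers(sequence, k))


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy input, not real barcode data
    print(kmer_counts(toy, 4))
    print(kmer_document(toy, 4))
```

    Each sequence then plays the role of a document and each k-mer the role of a word, which is the form that a text-oriented topic-modeling tool expects as input.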

  18. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR...TITLE AND SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. 5a. CONTRACT NUMBER 5b. GRANT NUMBER GRANT11489...institutional, NIH-funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer, with an

  19. Seismically induced accident sequence analysis of the advanced test reactor

    International Nuclear Information System (INIS)

    Khericha, S.T.; Henry, D.M.; Ravindra, M.K.; Hashimoto, P.S.; Griffin, M.J.; Tong, W.H.; Nafday, A.M.

    1991-01-01

    A seismic probabilistic risk assessment (PRA) was performed for the Department of Energy (DOE) Advanced Test Reactor (ATR) as part of the external events analysis. The risk from seismic events to the fuel in the core and in the fuel storage canal was evaluated. The key elements of this paper are the integration of seismically induced internal flood and internal fire, and the modeling of human error rates as a function of earthquake magnitude. The systems analysis was performed by EG&G Idaho, Inc. and the fragility analysis and quantification were performed by EQE International, Inc. (EQE).

  20. Recent advances in nanopore-based nucleic acid analysis and sequencing

    International Nuclear Information System (INIS)

    Shi, Jidong; Fang, Ying; Hou, Junfeng

    2016-01-01

    Nanopore-based sequencing platforms are transforming the field of genomic science. This review (containing 116 references) highlights some recent progress on nanopore-based nucleic acid analysis and sequencing. These studies are classified into three categories, biological, solid-state, and hybrid nanopores, according to their nanoporous materials. We begin with a brief description of the translocation-based detection mechanism of nanopores. Next, specific examples are given in nanopore-based nucleic acid analysis and sequencing, with an emphasis on identifying strategies that can improve the resolution of nanopores. This review concludes with a discussion of future research directions that will advance the practical applications of nanopore technology. (author)

  1. Microscopic Analysis and Modeling of Airport Surface Sequencing, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The complexity and interdependence of operations on the airport surface motivate the need for a comprehensive and detailed, yet flexible and validated analysis and...

  2. BioMatriX: Sequence analysis, structure visualization, phylogenetics ...

    African Journals Online (AJOL)

    bmx-biomatrix.blogspot.com) developed for biological science community to augment scientific research regarding genomics, proteomics, phylogenetics and linkage analysis in one platform. BioMatriX offers multi-functional services to perform ...

  3. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark contains four putative Hox clusters, indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  4. Direct Detection and Differentiation of Pathogenic Leptospira Species Using a Multi-Gene Targeted Real Time PCR Approach

    Science.gov (United States)

    Ferreira, Ana Sofia; Costa, Pedro; Rocha, Teresa; Amaro, Ana; Vieira, Maria Luísa; Ahmed, Ahmed; Thompson, Gertrude; Hartskeerl, Rudy A.; Inácio, João

    2014-01-01

    Leptospirosis is a growing public and veterinary health concern caused by pathogenic species of Leptospira. Rapid and reliable laboratory tests for the direct detection of leptospiral infections in animals are in high demand not only to improve diagnosis but also for understanding the epidemiology of the disease. In this work we describe a novel and simple TaqMan-based multi-gene targeted real-time PCR approach able to detect and differentiate Leptospira interrogans, L. kirschneri, L. borgpetersenii and L. noguchii, which constitute the pathogenic species of Leptospira most relevant to veterinary practice. The method uses sets of species-specific probes, and respective flanking primers, designed from ompL1 and secY gene sequences. To monitor the presence of inhibitors, a duplex amplification assay targeting both the mammal β-actin and the leptospiral lipL32 genes was implemented. The analytical sensitivity of all primer and probe sets was estimated to be <10 genome equivalents (GE) in the reaction mixture. Application of the amplification reactions on genomic DNA from a variety of pathogenic and non-pathogenic Leptospira strains and other non-related bacteria revealed a 100% analytical specificity. Additionally, pathogenic leptospires were successfully detected in five out of 29 tissue samples from animals (Mus spp., Rattus spp., Dolichotis patagonum and Sus domesticus). Two samples were infected with L. borgpetersenii, two with L. interrogans and one with L. kirschneri. The possibility to detect and identify these pathogenic agents to the species level in domestic and wildlife animals reinforces the diagnostic information and will enhance our understanding of the epidemiology of leptospirosis. PMID:25398140

  5. Sequence analysis of L RNA of Lassa virus

    International Nuclear Information System (INIS)

    Vieth, Simon; Torda, Andrew E.; Asper, Marcel; Schmitz, Herbert; Guenther, Stephan

    2004-01-01

    The L RNA of three Lassa virus strains originating from Nigeria, Ghana/Ivory Coast, and Sierra Leone was sequenced and the data subjected to structure predictions and phylogenetic analyses. The L gene products had 2218-2221 residues, diverged by 18% at the amino acid level, and contained several conserved regions. Only one region of 504 residues (positions 1043-1546) could be assigned a function, namely that of an RNA polymerase. Secondary structure predictions suggest that this domain is very similar to RNA-dependent RNA polymerases of known structure encoded by plus-strand RNA viruses, permitting a model to be built. Outside the polymerase region, there is little structural data, except for regions of strong alpha-helical content and probably a coiled-coil domain at the N terminus. No evidence for reassortment or recombination during Lassa virus evolution was found. The secondary structure-assisted alignment of the RNA polymerase region permitted a reliable reconstruction of the phylogeny of all negative-strand RNA viruses, indicating that Arenaviridae are most closely related to Nairoviruses. In conclusion, the data provide a basis for structural and functional characterization of the Lassa virus L protein and reveal new insights into the phylogeny of negative-strand RNA viruses

  6. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  7. Stratigraphical analysis of the neoproterozoic sedimentary sequences of the Sao Francisco Basin

    International Nuclear Information System (INIS)

    Martins, Mariela; Lemos, Valesca Brasil

    2007-01-01

    A stratigraphic analysis was performed under the principles of Sequence Stratigraphy on the Neoproterozoic sedimentary sequences of the Sao Francisco Basin (Central Brazil). Three periods of deposition separated by unconformities were recognized in the Sao Francisco Megasequence: (1) Sequences 1 and 2, a Cryogenian glaciogenic sequence, followed by a distal scarp carbonate ramp, developed during stable conditions, (2) Sequence 3, an Upper Cryogenian stack of homoclinal ramps with mixed carbonate-siliciclastic sedimentation, deposited under a progressive influence of compressional stresses of the Brasiliano Cycle, (3) Sequence 4, a Lower Ediacaran shallow platform dominated by siliciclastic sedimentation of molassic nature, the erosion product of the nearby uplifted thrust sheets. Each of the carbonate-bearing sequences presents a distinct δ13C isotopic signature. The superposition to the global curve for carbon isotopic variation allowed the recognition of a major depositional hiatus between the Paranoa and Sao Francisco Megasequences, and suggested that the glacial diamictite deposition (Jequitai Formation) took place most probably around 800 Ma. This constrains the Sao Francisco Megasequence deposition to the interval between 800 and 600 Ma (the known ages of the Brasiliano Orogeny define the upper limit). A minor depositional hiatus (700-680 Ma) was also identified separating sequences 2 and 3. Isotopic analyses suggest that from then on, more restricted environmental conditions were established in the basin, probably associated with a first order global event, which prevailed throughout deposition of Sequence 3. (author)

  8. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    DNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence......, subcloned into M13 mp8, and sequenced at random by the dideoxy technique, thereby generating a contiguous sequence of 1703 base pairs. This clone contained coding sequence for the C-terminal 262 amino acid residues of the beta-chain, the entire C5a fragment, and the N-terminal 98 residues of the alpha......'-chain. The 3' end of the clone had a polyadenylated tail preceded by a polyadenylation recognition site, a 3'-untranslated region, and base pairs homologous to the human Alu consensus sequence. Comparison of the derived partial human C5 protein sequence with that previously determined for murine C3 and human...

  9. Oasis: online analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan

    2015-07-01

    Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' exclusive selling points are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables the rapid (re-)analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de.

  10. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective method for analyzing the molecular changes in base sequence. In this year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10, and 50 Gy, and the transgene was recovered from spleen genomic DNA by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. Sequence analysis of the mutants was made using an ALFred DNA sequencer and the SequiTherm TM Long-Red Cycle sequencing kit. Sequence analysis was carried out for 41 of 50 independent Spi mutants obtained. The deletions were classified into 4 groups: Group 1 included 15 mutants that were characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants with a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of the gam gene and was possibly caused by oxidative breakage of DNA or recombination of DNA fragments produced by the breakage. Group 4 included 4 mutants having deletions of 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between codons 13 and 14 of the redB gene, leading to inactivation of the gam gene and the redBA gene. These results indicated that most Spi mutants had a deletion in the red/gam region and that the deletions in more than half of the mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  11. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    Directory of Open Access Journals (Sweden)

    Roberts Richard J

    2008-05-01

    Full Text Available Abstract Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
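
    The recognition sequence GRCGYC is written in IUPAC nucleotide codes, where R stands for A or G and Y stands for C or T. The following minimal sketch, which is not taken from the paper, shows how such a degenerate site can be expanded into a regular expression and located in a DNA string. The function names and the example sequence are hypothetical.

```python
import re

# Subset of the IUPAC nucleotide codes needed for common restriction sites.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # purine
    "Y": "[CT]",   # pyrimidine
    "N": "[ACGT]",
}


def site_to_regex(site: str) -> str:
    """Expand a degenerate recognition site into a regex fragment."""
    return "".join(IUPAC_TO_REGEX[base] for base in site.upper())


def find_sites(sequence: str, site: str = "GRCGYC") -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead makes each match zero-width, so overlapping sites are kept.
    pattern = re.compile(f"(?={site_to_regex(site)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing GACGTC and GGCGCC.
    print(find_sites("TTGACGTCAAGGCGCCTT"))  # [2, 10]
```

    Because GRCGYC is its own reverse complement, scanning one strand is enough to find sites on both.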

  12. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.
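
    To make the idea of measuring the proportional content of a motif set concrete, here is a small sketch. It assumes, purely for illustration, that "coverage" means the fraction of bases in a read or contig that fall inside at least one occurrence of any motif in the set; the abstract does not define the measure precisely, and the authors' tool may compute it differently. The motif set and sequences are hypothetical examples (GAATTC and GGATCC are the EcoRI and BamHI sites).

```python
def motif_coverage(sequence: str, motifs: set[str]) -> float:
    """Fraction of bases lying inside at least one motif occurrence.

    Illustrative definition only, not the published method.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            for i in range(start, start + len(motif)):
                covered[i] = True
            # Step by one so overlapping occurrences are also marked.
            start = seq.find(motif, start + 1)
    return sum(covered) / len(seq)


if __name__ == "__main__":
    spectrum_set = {"GAATTC", "GGATCC"}  # hypothetical spectrum set
    read_a = "AAGAATTCAAGGATCCAA"
    read_b = "ACGTACGTACGTACGTAC"
    print(round(motif_coverage(read_a, spectrum_set), 3))
    print(round(motif_coverage(read_b, spectrum_set), 3))
```

    Two fragments with very different coverage values for the same motif set would, under this reading, be candidates for originating from different organisms.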

  13. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution.

    Directory of Open Access Journals (Sweden)

    Morgan Kullberg

    Full Text Available BACKGROUND: We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 mega bases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.

  14. An Unusual Accumulation of Ribosomal Multigene Families and Microsatellite DNAs in the XX/XY Sex Chromosome System in the Trans-Andean Catfish Pimelodella cf. chagresi (Siluriformes:Heptapteridae).

    Science.gov (United States)

    Conde-Saldaña, Cristhian Camilo; Barreto, Cynthia Aparecida Valiati; Villa-Navarro, Francisco Antonio; Dergam, Jorge Abdala

    2018-02-01

    This work constitutes the first cytogenetic characterization of a trans-Andean species of Heptapteridae. The catfish Pimelodella cf. chagresi from the Upper Rio Magdalena was studied, applying standard cytogenetic techniques (Giemsa, C-banding, and argyrophilic nucleolar organizer region [Ag-NOR]) and fluorescence in situ hybridization techniques using repetitive DNA probes: microsatellites ((CA)15 and (GA)15) and ribosomal RNA (rRNA) multigene families (18S and 5S ribosomal DNA [rDNA] probes). The species showed a unique diploid chromosome number 2n = 50 (32m [metacentrics] + 14sm [submetacentrics] + 4st [subtelocentrics]) and an XX/XY sex chromosomal system, where the heteromorphic Y-chromosome revealed a conspicuous accumulation of all the assayed domains of repetitive DNA. The P. cf. chagresi karyotype shares common features with other Heptapteridae, such as the predominance of metacentric and submetacentric chromosomes, and one pair of subtelomeric nucleolar organizer regions (NORs). These results reflect an independent karyological identity of a trans-Andean species and the relevance of repetitive DNA sequences in the process of sex chromosome differentiation in fish; it is the first case of syntenic accumulation of rRNA multigene families (18S and 5S rDNA) and microsatellite sequences ((CA)15 and (GA)15) in a differentiated sex chromosome in Neotropical fish.

  15. Cloning and sequence analysis of hyaluronoglucosaminidase (nagH) gene of Clostridium chauvoei

    Directory of Open Access Journals (Sweden)

    Saroj K. Dangi

    2017-09-01

    Full Text Available Aim: Blackleg disease is caused by Clostridium chauvoei in ruminants. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, hyaluronidases of C. chauvoei are not characterized. The present study was aimed at cloning and sequence analysis of the hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 media and confirmed by polymerase chain reaction (PCR) using primers specific for the 16-23S rDNA spacer region. The nagH gene of C. chauvoei was amplified and cloned into the pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies, followed by sequencing of the nagH gene in the construct. Results: PCR amplification yielded a nagH gene product of 1143 bp, which was cloned in a prokaryotic expression system. Colony PCR, as well as sequencing of the nagH gene, confirmed the presence of the insert. The sequence was then subjected to NCBI BLAST analysis, which confirmed that it was indeed the nagH gene of C. chauvoei. Phylogenetic analysis of the sequence showed that it is closely related to Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.

  16. The experimental study of genetic engineering human neural stem cells mediated by lentivirus to express multigene.

    Science.gov (United States)

    Cai, Pei-qiang; Tang, Xun; Lin, Yue-qiu; Martin, Oudega; Sun, Guang-yun; Xu, Lin; Yang, Yun-kang; Zhou, Tian-hua

    2006-02-01

    To explore the feasibility of constructing genetically engineered human neural stem cells (hNSCs), mediated by lentivirus, that express multiple genes, in order to provide a graft source for further studies of spinal cord injury (SCI). Human neural stem cells from the brain cortex of human abortus were isolated and cultured, then genetically modified by lentivirus to express both green fluorescence protein (GFP) and rat neurotrophin-3 (NT-3); the transgenic expression was detected by fluorescence microscopy, a fetal rat dorsal root ganglion assay, and slot blot. Genetically engineered hNSCs were successfully constructed. All of the genetically engineered hNSCs expressed bright green fluorescence under the fluorescence microscope. The conditioned medium of transgenic hNSCs could induce flourishing neurite outgrowth from dorsal root ganglion (DRG). The genetically engineered hNSCs expressed a high level of NT-3, which could be detected by slot blot. Genetically engineered hNSCs mediated by lentivirus can be successfully constructed to express multiple genes.

  17. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  18. WebMGA: a customizable web server for fast metagenomic sequence analysis.

    Science.gov (United States)

    Wu, Sitao; Zhu, Zhengwei; Fu, Liming; Niu, Beifang; Li, Weizhong

    2011-09-07

    The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; also metagenomic annotation involves a wide range of computational tools, which are difficult to be installed and maintained by common users. The tools provided by the few available web servers are also limited and have various constraints such as login requirement, long waiting time, inability to configure pipelines etc. We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, functional annotation etc. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis. WebMGA offers to researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

  19. WebMGA: a customizable web server for fast metagenomic sequence analysis

    Directory of Open Access Journals (Sweden)

    Niu Beifang

    2011-09-01

    Full Text Available Abstract Background The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; also metagenomic annotation involves a wide range of computational tools, which are difficult to be installed and maintained by common users. The tools provided by the few available web servers are also limited and have various constraints such as login requirement, long waiting time, inability to configure pipelines etc. Results We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, functional annotation etc. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis. Conclusions WebMGA offers to researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

  20. The scale analysis sequence for LWR fuel depletion

    International Nuclear Information System (INIS)

    Hermann, O.W.; Parks, C.V.

    1991-01-01

    The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system is used extensively to perform away-from-reactor safety analysis (particularly criticality safety, shielding, heat transfer analyses) for spent light water reactor (LWR) fuel. Spent fuel characteristics such as radiation sources, heat generation sources, and isotopic concentrations can be computed within SCALE using the SAS2 control module. A significantly enhanced version of the SAS2 control module, which is denoted as SAS2H, has been made available with the release of SCALE-4. For each time-dependent fuel composition, SAS2H performs one-dimensional (1-D) neutron transport analyses (via XSDRNPM-S) of the reactor fuel assembly using a two-part procedure with two separate unit-cell-lattice models. The cross sections derived from a transport analysis at each time step are used in a point-depletion computation (via ORIGEN-S) that produces the burnup-dependent fuel composition to be used in the next spectral calculation. A final ORIGEN-S case is used to perform the complete depletion/decay analysis using the burnup-dependent cross sections. The techniques used by SAS2H and two recent applications of the code are reviewed in this paper. 17 refs., 5 figs., 5 tabs

  1. Sequence determination and analysis of the NSs genes of two tospoviruses.

    Science.gov (United States)

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
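
    Pairwise identity figures such as those quoted above are computed from an alignment of the two sequences. The sketch below shows one common convention, in which identical residues are counted over the alignment columns where neither sequence has a gap. The record does not state which convention or software the authors used, so this is an assumption for illustration; the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns without gaps.

    Both inputs must be rows of the same alignment (equal length).
    Other conventions divide by the full alignment length or by the
    shorter sequence, and give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy aligned protein fragments, not real NSs sequences.
    print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))
```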

  2. Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

    Science.gov (United States)

    Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

    2017-07-01

    The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.

  3. Deconstructing the genetic basis of spent sulphite liquor tolerance using deep sequencing of genome-shuffled yeast.

    Science.gov (United States)

    Pinel, Dominic; Colatriano, David; Jiang, Heng; Lee, Hung; Martin, Vincent Jj

    2015-01-01

    Identifying the genetic basis of complex microbial phenotypes is currently a major barrier to our understanding of multigenic traits and our ability to rationally design biocatalysts with highly specific attributes for the biotechnology industry. Here, we demonstrate that strain evolution by meiotic recombination-based genome shuffling coupled with deep sequencing can be used to deconstruct complex phenotypes and explore the nature of multigenic traits, while providing concrete targets for strain development. We determined genomic variations found within Saccharomyces cerevisiae previously evolved in our laboratory by genome shuffling for tolerance to spent sulphite liquor. The representation of these variations was backtracked through parental mutant pools and cross-referenced with RNA-seq gene expression analysis to elucidate the importance of single mutations and key biological processes that play a role in our trait of interest. Our findings pinpoint novel genes and biological determinants of lignocellulosic hydrolysate inhibitor tolerance in yeast. These include the following: protein homeostasis constituents, including Ubp7p and Art5p, related to ubiquitin-mediated proteolysis; stress response transcriptional repressor, Nrg1p; and NADPH-dependent glutamate dehydrogenase, Gdh1p. Reverse engineering a prominent mutation in ubiquitin-specific protease gene UBP7 in a laboratory S. cerevisiae strain effectively increased spent sulphite liquor tolerance. This study advances understanding of yeast tolerance mechanisms to inhibitory substrates and biocatalyst design for a biomass-to-biofuel/biochemical industry, while providing insights into the process of mutation accumulation that occurs during genome shuffling.

  4. An analysis of LOCA sequences in the development of severe accident analysis DB

    International Nuclear Information System (INIS)

    Choi, Young; Park, Soo Yong; Ahn, Kwang-Il; Kim, D.H.

    2006-01-01

    A Level 2 PSA was performed for the Korean Standard Power Plants (KSNPs), and it considered the sequences necessary for an assessment of the containment integrity and for source term analysis. In terms of accident management, however, more cases causing severe core damage need to be analyzed and arranged systematically for easy access to the results. At present, KAERI is calculating the severe accident sequences intensively for various initiating events and generating a database for the accident progression, including thermal hydraulic and source term behaviours. The developed database (DB) system includes a graphical display of the plant and equipment status, previous research results stored by a knowledge-base technique, and the expected plant behaviour. The plant model used in this paper is oriented to LOCA-related severe accident phenomena and thus can simulate the plant behaviour during a severe accident. Therefore the developed system may play a central role as an information source for decision-making in severe accident management, and will be used as a training simulator for severe accident management. (author)

  5. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    Science.gov (United States)

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations, mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up of three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar to higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  6. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Linh Nguyen

    2017-03-01

    Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those by the Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than standard k-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: Thanks to this unbiased validation, we now know that this type of model can predict in vitro tumour response to some of these
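
    The precision/recall trade-off described in this abstract can be illustrated with a minimal sketch. The labels below are invented toy values, not GDSC data, and the function name `precision_recall` is hypothetical; the point is only to show how a marker that flags few cell lines can have high precision but poor recall.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = sensitive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (assumed for illustration): 4 of 8 cell lines are truly sensitive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
single_gene = [1, 0, 0, 0, 0, 0, 0, 0]  # flags very few lines as sensitive
multi_gene = [1, 1, 1, 0, 1, 0, 0, 0]   # flags more lines, with one false positive

single_pr = precision_recall(y_true, single_gene)
multi_pr = precision_recall(y_true, multi_gene)
```

    In this toy case the single-gene call is never wrong when it fires but misses three of the four sensitive lines, whereas the multi-gene call trades some precision for much higher recall.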

  7. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    Directory of Open Access Journals (Sweden)

    Arias Covadonga

    2007-06-01

    Abstract Background The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease', leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of the genome of Plasmodium falciparum, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant hits (E-value cutoff of 10^-5), and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence.
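
    The counts reported in this abstract can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch using the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
clones_sequenced = 10_368
sequences_generated = 9_769
contigs = 976
singletons = 3_730

# Sequencing success rate as a percentage of clones attempted.
success_rate = 100 * sequences_generated / clones_sequenced

# Unique sequences are the contigs plus the unclustered singletons.
unique_sequences = contigs + singletons

print(f"success rate: {success_rate:.1f}%")
print(f"unique sequences: {unique_sequences}")
```

    Both results agree with the reported 94.2% success rate and 4,706 unique sequences.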

  8. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    Directory of Open Access Journals (Sweden)

    Zhi Xu

    Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous, and its incidence and cause vary widely with geographical region, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in genes that are targeted by drugs in clinical use or under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7% in our sample set). Thus, this study indicates the utility of a cost- and time-efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  9. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    Science.gov (United States)

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  10. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    Science.gov (United States)

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  11. QTL analysis by sequencing of Water Use Efficiency (WUE) in potato

    DEFF Research Database (Denmark)

    Kaminski, Kacper Piotr; Sønderkær, Mads; Sørensen, Kirsten Kørup

    2013-01-01

    The traditional approach to potato breeding, the classical “mate and phenotype” approach, is relatively costly, and because phenotyping and growth capacity are limited, it is slowly being replaced by Marker Assisted Selection (MAS) breeding schemes. MAS is based on the presence of DNA polymorphic.......sparsipilum), phenotyped for water use efficiency. This population has also previously been phenotyped for the total glycoalkaloid (TGA) content....... and time consuming process. Here, a novel method for Quantitative Trait Locus (QTL) analysis has been developed that allows for the development of specific markers by use of genomic sequence reads and the recently published reference genome sequence for potato. Prior to sequencing the mapping population...

  12. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...

  13. Accident sequences and causes analysis in a hydrogen production process

    Energy Technology Data Exchange (ETDEWEB)

    Jae, Moo Sung; Hwang, Seok Won; Kang, Kyong Min; Ryu, Jung Hyun; Kim, Min Soo; Cho, Nam Chul; Jeon, Ho Jun; Jung, Gun Hyo; Han, Kyu Min; Lee, Seng Woo [Hanyang Univ., Seoul (Korea, Republic of)]

    2006-03-15

    Because a hydrogen production facility using the IS process requires the high temperature supplied by a nuclear power plant, a safety assessment should be performed to guarantee the safety of the facility. First, accident cases in hydrogen production and utilization were surveyed. Based on the results, risk factors that can arise in a hydrogen production facility were identified, and the correlations between risk factors were schematized using an influence diagram. Initiating events of the hydrogen production facility were also identified, and accident scenario development and quantification were performed. PSA methodology was used to identify initiating events, with a master logic diagram used to select them. Event tree analysis was used to quantify the accident scenarios. The sum of all the leakage frequencies is 1.22 × 10^-4, which is similar to the value (1.0 × 10^-4) for core damage frequency that the International Nuclear Safety Advisory Group of the IAEA suggested as a criterion.
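
    Event tree quantification of the kind mentioned in this abstract is commonly done by multiplying an initiating event frequency by the branch probabilities along each accident sequence and summing over sequences. The sketch below assumes that standard scheme; the sequence names, frequencies and probabilities are hypothetical placeholders and do not come from the study.

```python
from math import prod


def sequence_frequency(initiating_event_freq, branch_probs):
    """Frequency of one accident sequence: initiating event frequency
    multiplied by the probability of each branch taken along the path."""
    return initiating_event_freq * prod(branch_probs)


# Hypothetical sequences: (initiating event frequency per year, branch probabilities).
sequences = {
    "leak, detection fails, isolation fails": (1.0e-2, [0.1, 0.05]),
    "leak, detection works, isolation fails": (1.0e-2, [0.9, 0.01]),
}

# Total leakage frequency is the sum over all quantified sequences.
total_frequency = sum(
    sequence_frequency(freq, probs) for freq, probs in sequences.values()
)

# Reference value quoted in the abstract for comparison.
reference_frequency = 1.0e-4
exceeds_reference = total_frequency > reference_frequency
```

    With these placeholder inputs the total is 1.4 × 10^-4 per year; a real analysis would take its branch probabilities from fault trees or component reliability data.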

  14. Image registration based on virtual frame sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H.; Ng, W.S. [Nanyang Technological University, Computer Integrated Medical Intervention Laboratory, School of Mechanical and Aerospace Engineering, Singapore (Singapore)]; Shi, D. [Nanyang Technological University, School of Computer Engineering, Singapore (Singapore)]; Wee, S.B. [Tan Tock Seng Hospital, Department of General Surgery, Singapore (Singapore)]

    2007-08-15

    This paper proposes a new framework for medical image registration with large nonrigid deformations, which remains one of the biggest challenges for image fusion and further analysis in many medical applications. The registration problem is formulated as recovering a deformation process with known initial and final states. To deal with large nonlinear deformations, virtual frames are inserted to model the deformation process. A time parameter is introduced, and the deformation between consecutive frames is described with a linear affine transformation. Experiments are conducted with simple geometric deformations as well as complex deformations present in MRI and ultrasound images. All the deformations are characterized by nonlinearity. The positive results demonstrate the effectiveness of this algorithm. The framework proposed in this paper is able to register medical images with large nonlinear deformations and is especially useful for sequential images. (orig.)

  15. Next-generation sequencing of multiple individuals per barcoded library by deconvolution of sequenced amplicons using endonuclease fragment analysis

    DEFF Research Database (Denmark)

    Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta

    2014-01-01

    The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease...... digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72...... individuals using only 24 barcoded libraries....

  16. VisRseq: R-based visual framework for analysis of sequencing data

    OpenAIRE

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven JM

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for ...

  17. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  18. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

    OpenAIRE

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S.; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M.; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V.; Mahurkar, Anup; Fricke, W. Florian

    2017-01-01

    Background The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. ...

  19. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and third-generation PacBio gene sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse-complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for

  20. Quantitative trait loci affecting phenotypic variation in the vacuolated lens mouse mutant, a multigenic mouse model of neural tube defects

    NARCIS (Netherlands)

    Korstanje, Ron; Desai, Jigar; Lazar, Gloria; King, Benjamin; Rollins, Jarod; Spurr, Melissa; Joseph, Jamie; Kadambi, Sindhuja; Li, Yang; Cherry, Allison; Matteson, Paul G.; Paigen, Beverly; Millonig, James H.

    Korstanje R, Desai J, Lazar G, King B, Rollins J, Spurr M, Joseph J, Kadambi S, Li Y, Cherry A, Matteson PG, Paigen B, Millonig JH. Quantitative trait loci affecting phenotypic variation in the vacuolated lens mouse mutant, a multigenic mouse model of neural tube defects. Physiol Genomics 35:

  1. Genome sequencing and analysis of BCG vaccine strains.

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    BACKGROUND: Although the Bacillus Calmette-Guérin (BCG) vaccine against tuberculosis (TB) has been available for more than 75 years, one third of the world's population is still infected with Mycobacterium tuberculosis and approximately 2 million people die of TB every year. To reduce this immense TB burden, a clearer understanding of the functional genes underlying the action of BCG and the development of new vaccines are urgently needed. METHODS AND FINDINGS: Comparative genomic analysis of 19 M. tuberculosis complex strains showed that BCG strains underwent repeated human manipulation, had higher region of deletion rates than those of natural M. tuberculosis strains, and lost several essential components such as T-cell epitopes. A total of 188 BCG strain T-cell epitopes were lost to various degrees. The non-virulent BCG Tokyo strain, which has the largest number of T-cell epitopes (359), lost 124. Here we propose that BCG strain protection variability results from different epitopes. This study is the first to present BCG as a model organism for genetics research. BCG strains have a very well-documented history and now detailed genome information. Genome comparison revealed the selection process of BCG strains under human manipulation (1908-1966). CONCLUSIONS: Our results revealed the cause of BCG vaccine strain protection variability at the genome level and supported the hypothesis that the restoration of lost BCG Tokyo epitopes is a useful future vaccine development strategy. Furthermore, these detailed BCG vaccine genome investigation results will be useful in microbial genetics, microbial engineering and other research fields.

  2. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-01-01

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
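
    Average sequencing coverage is conventionally the total number of sequenced bases divided by the genome size. The abstract does not state the genome size used, so the sketch below only back-calculates the size implied by the two reported figures; treat the result as an illustration, not a value taken from the study.

```python
# Figures quoted in the abstract.
total_bases_gb = 59.6      # gigabases of DNA sequence generated
reported_coverage = 24.7   # average fold coverage

# coverage = total bases / genome size, so genome size = total bases / coverage.
implied_genome_size_gb = total_bases_gb / reported_coverage

print(f"implied genome size: {implied_genome_size_gb:.2f} Gb")
```

    The implied size of roughly 2.4 Gb is of the same order as published estimates for the horse reference genome.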

  3. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  4. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  5. Novel primer specific false terminations during DNA sequencing reactions: danger of inaccuracy of mutation analysis in molecular diagnostics

    Science.gov (United States)

    Anwar, R; Booth, A; Churchill, A J; Markham, A F

    1996-01-01

    The determination of nucleotide sequence is fundamental to the identification and molecular analysis of genes. Direct sequencing of PCR products is now becoming a commonplace procedure for haplotype analysis, and for defining mutations and polymorphism within genes, particularly for diagnostic purposes. A previously unrecognised phenomenon, primer related variability, observed in sequence data generated using Taq cycle sequencing and T7 Sequenase sequencing, is reported. This suggests that caution is necessary when interpreting DNA sequence data. This is particularly important in situations where treatment may be dependent on the accuracy of the molecular diagnosis. PMID:16696096

  6. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  7. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    Science.gov (United States)

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
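
    The abstract does not give the exact objective used, but a common formulation of the sparse group LASSO penalty combines an L1 term over all coefficients with a group-wise L2 term weighted by the square root of each group's size. The sketch below implements that common form; the function name, the mixing parameter `alpha`, and the toy coefficients are assumptions for illustration, with each group standing in for the exons or variants of one gene.

```python
from math import sqrt


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Penalty = lam * (alpha * ||beta||_1
                        + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2),
    where each group g is a list of coefficient indices of size p_g."""
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups:
        group_norm = sqrt(sum(beta[i] ** 2 for i in idx))
        group_term += sqrt(len(idx)) * group_norm
    return lam * (alpha * l1_term + (1 - alpha) * group_term)


# Toy coefficients (hypothetical): two genes with two exon-level effects each.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]

pure_lasso = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=1.0)
mixed = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
```

    Setting `alpha` to 1 recovers the ordinary LASSO penalty and setting it to 0 gives the group LASSO, so the second gene, whose coefficients are all zero, contributes nothing under either term.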

  8. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    Full Text Available The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  9. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    Science.gov (United States)

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  10. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions is high, while a high level of chromosomal shuffling and rearrangement exists between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.
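
    As an illustration of the E-value cutoff mentioned above, the following sketch keeps the best hit per query from BLAST tabular output. It assumes the standard 12-column tabular layout, in which the E-value is the eleventh field; the abstract does not state which output format or post-processing the authors used, and the function name is hypothetical.

```python
def significant_best_hits(lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue)} for hits at or below the cutoff.

    lines: iterable of tab-separated BLAST tabular records
           (qseqid, sseqid, pident, length, mismatch, gapopen,
            qstart, qend, sstart, send, evalue, bitscore).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        # Keep only the lowest E-value seen for each query.
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

    The number of keys in the result is the number of query sequences with at least one significant hit, and the set of distinct subject identifiers gives the unique targets.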

  11. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    Science.gov (United States)

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method also has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape detection by most DNA-based mutation analysis protocols.

  12. An integrative variant analysis suite for whole exome next-generation sequencing data

    Directory of Open Access Journals (Sweden)

    Challis Danny

    2012-01-01

    Full Text Available Abstract Background Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
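
    The general idea of scoring a candidate variant with a logistic regression model and then applying a user-adjustable cutoff can be sketched as follows. This is not the Atlas2 implementation: the feature names, weights and default cutoff are hypothetical placeholders, and a real model would be trained on validated calls.

```python
import math


def variant_probability(features, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def passes_cutoff(probability, cutoff=0.5):
    """Keep a candidate call when its probability reaches the chosen cutoff."""
    return probability >= cutoff


# Hypothetical example features for one candidate site.
example_features = {"allele_fraction": 0.45, "mean_base_quality": 32.0}
example_weights = {"allele_fraction": 4.0, "mean_base_quality": 0.1}
example_p = variant_probability(example_features, example_weights, intercept=-4.0)
```

    Raising the cutoff trades sensitivity for specificity, which is why exposing it as a parameter is useful.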

  13. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A+) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonucleotides, a 48-mer and a 27-mer, were synthesized on the basis of the amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from CHO cell protein. The 48-mer was used as a primer for synthesis of cDNA, which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with the 32P-labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases in a northern blot analysis. The sequence of a 267-bp insert shows high homology with the CHO cDNA, with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved.

  14. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    Directory of Open Access Journals (Sweden)

    Sally Ali Tawfik

    2018-01-01

    Full Text Available The use of high-throughput next-generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections in Egyptian patients using the Illumina MiSeq sequencing platform. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using a sterile #15 K-file and paper points. DNA was extracted using the MO BIO PowerSoil DNA Isolation Kit, followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable regions of the 16S rRNA gene by paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used for sequence filtering and analysis of the sequence data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At the genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate the oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials.

  15. Comparative analysis of the prion protein gene sequences in African lion.

    Science.gov (United States)

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of African lion (Panthera leo) was first cloned and polymorphisms screened. The results suggest that the prion protein gene of eight African lions is highly homogeneous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) in the prion protein gene (Prnp) of African lion were found, but no amino acid substitutions. Sequence analysis showed that higher homology was observed with Felis catus (GenBank accession AF003087; 96.7%) and with sheep (GenBank accession M31313.1; 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross species transmission of prion disease.

  16. Capillary electrophoresis fragment analysis and clone sequencing in detection of dynamic mutations of spinocerebellar ataxia

    Directory of Open Access Journals (Sweden)

    Yuan-yuan CHEN

    2018-04-01

    Full Text Available Objective To estimate the accuracy and stability of capillary electrophoresis fragment analysis and clone sequencing in detecting dynamic mutations of spinocerebellar ataxia (SCA). Methods Capillary electrophoresis fragment analysis and clone sequencing were used to detect trinucleotide repeat sequences in 14 SCA patients (3 cases of SCA2, 2 cases of SCA7, 7 cases of SCA8 and 2 cases of SCA17). Results Capillary electrophoresis fragment analysis of 3 SCA2 cases showed the expanded cytosine-adenine-guanine (CAG) repeats were 31, 30 and 32, and the copy numbers from clone sequencing of 3 colonies in each case were 37/40/40, 37/38/39 and 38/39/40, respectively. Capillary electrophoresis fragment analysis of 2 SCA7 cases showed the expanded CAG repeats were 57 and 34, and the copy numbers of repeats were 69, 74, 75 in 3 colonies of one case, and 45 in the other case. For the 7 SCA8 cases with expanded cytosine-thymine-adenine (CTA)/cytosine-thymine-guanine (CTG) repeats of 99, 111, 104, 92, 89, 104 and 75, the results of clone sequencing were 97, 116, 104, 90, 90, 102 and 76, respectively. For 2 SCA17 cases with short/expanded CAG repeats of 37/50 and 36/45, the results of clone sequencing were 51/50/52 and 45/44 for 3 and 2 colonies. Conclusions Although the higher mobility of polymerase chain reaction (PCR) products containing dynamic mutations in capillary electrophoresis fragment analysis might cause deviation in the analysis of copy numbers, the deviation was predictable and the results were repeatable. The clone sequencing results showed obvious instability, especially for the SCA2 and SCA7 genes, which might be owing to their simple CAG repeats. Consequently, clone sequencing is not suited for detection of dynamic mutations, not to mention the quantitative criteria of dynamic mutation sequencing. DOI: 10.3969/j.issn.1672-6731.2018.03.008
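
    Copy numbers from clone sequencing are obtained by counting repeat units in the sequenced insert. A minimal sketch of that counting step is shown below; it reports the longest uninterrupted run of a repeat unit and is an illustrative assumption, since the abstract does not describe how interrupted or mixed repeats (such as the CTA/CTG tract in SCA8) were scored.

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the number of units in the longest uninterrupted run of `unit`."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)
```

    For example, `longest_repeat_run("TT" + "CAG" * 5 + "AA")` counts five CAG units in the tract.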

  17. Automatic knowledge extraction in sequencing analysis with multiagent system and grid computing.

    Science.gov (United States)

    González, Roberto; Zato, Carolina; Benito, Rocío; Bajo, Javier; Hernández, Jesús M; De Paz, Juan F; Vera, Vicente; Corchado, Juan M

    2012-12-01

    Advances in bioinformatics have contributed towards a significant increase in available information. Information analysis requires the use of distributed computing systems to best engage the process of data analysis. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated to each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

  18. Automatic knowledge extraction in sequencing analysis with multiagent system and grid computing

    Directory of Open Access Journals (Sweden)

    González Roberto

    2012-12-01

    Full Text Available Advances in bioinformatics have contributed towards a significant increase in available information. Information analysis requires the use of distributed computing systems to best engage the process of data analysis. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated to each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

  19. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deeply sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  20. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  1. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    Science.gov (United States)

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable in a high-performance, web-based, user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  2. Building a multigenic model of breast cancer susceptibility: CYP17 and HSD17B1 are two important candidates.

    Science.gov (United States)

    Feigelson, H S; McKean-Cowdin, R; Coetzee, G A; Stram, D O; Kolonel, L N; Henderson, B E

    2001-01-15

    We conducted a nested case-control study to evaluate whether polymorphisms in two genes involved in estrogen metabolism, CYP17 and HSD17B1, were useful in developing a breast cancer risk model that could help discriminate women who are at higher risk of breast cancer. If polymorphisms in these genes affect the level of circulating estrogens, they may directly influence breast cancer risk. The base population for this study is a multiethnic cohort study that includes African-American, Non-Latina White, Japanese, Latina, and Native Hawaiian women. For this analysis, 1508 randomly selected controls and 850 incident breast cancer cases of the first four ethnic groups who agreed to provide a blood specimen were included (76 and 80% response rates, respectively). The CYP17 A2 allele and the HSD17B1 A allele were considered "high-risk" alleles. Subjects were then classified according to number of high-risk alleles. After adjusting for age, weight, and ethnicity, we found that carrying one or more high-risk alleles increases the risk of advanced breast cancer in a dose-response fashion. The risk among women carrying four high-risk alleles was 2.21 [95% confidence interval (CI), 0.98-5.00; P for trend = 0.03] compared with those who carried none. This risk was largely limited to women who were not taking hormone replacement therapy (relative risk, 2.60; 95% CI, 0.95-7.14) and was most pronounced among those weighing 170 pounds or less (RR, 3.05; 95% CI, 1.29-7.25). These findings suggest that breast cancer risk has a strong genetic component and support the theory that the underlying mechanism of "complex traits" can be understood using a multigenic model of candidate genes.

  3. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
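
    The motif space described above can be enumerated with a short sketch. It assumes the usual numbering in which the first base of the start codon is +1 and there is no position 0, so the -4 to +4 window holds four upstream bases, the codon itself and one downstream base. The function names are hypothetical, and the abstract does not state how the authors built their library.

```python
from itertools import product

BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Codons that differ from the start codon by exactly one base."""
    codons = []
    for i, original in enumerate(start):
        for alt in BASES:
            if alt != original:
                codons.append(start[:i] + alt + start[i + 1:])
    return codons


def initiation_site_variants(codons):
    """Yield every 8-base motif: positions -4..-1, the codon (+1..+3), then +4."""
    for codon in codons:
        for upstream in product(BASES, repeat=4):
            for downstream in BASES:
                yield "".join(upstream) + codon + downstream
```

    Under these assumptions there are 9 near-cognate codons and 4^5 = 1024 flanking contexts per codon, for 9,216 motifs in total.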

  4. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  5. Antibody-based screening for hereditary nonpolyposis colorectal carcinoma compared with microsatellite analysis and sequencing

    DEFF Research Database (Denmark)

    Christensen, Mariann; Katballe, Niels; Wikman, Friedrik

    2002-01-01

    BACKGROUND: Germline mutations in the DNA mismatch repair genes, MSH2, MLH1, and others are associated with hereditary nonpolyposis colorectal cancer (HNPCC). Due to the high costs of sequencing, cheaper screening methods are needed to identify HNPCC cases. Ideally, these methods should have a high...... carcinoma of whom 11 met the Amsterdam criteria and 31 were suspected to belong to HNPCC families. Thirty-five patients were examined by microsatellite analysis, 40 by immunohistochemical staining, and in 31 patients both the MLH1 and MSH2 genes were sequenced. RESULTS: Ninety-two percent of patients...... the three methods was found in 74 % of the tumors. CONCLUSIONS: The authors suggest that immunohistochemistry should be used in combination with microsatellite analysis to prescreen suspected HNPCC patients for the selection of cases where sequencing of the MLH1 and MSH2 mismatch repair genes is indicated....

  6. Analysis of loss of decay heat removal sequences at Browns Ferry Unit One: Chapter 17

    International Nuclear Information System (INIS)

    Harrington, R.M.

    1983-01-01

    This paper summarizes the Oak Ridge National Laboratory (ORNL) report "Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis" (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA), using Unit One of the Browns Ferry Nuclear Plant as the model design. Each unit of this three-unit plant has a maximum authorized power of 3293 MW(t) or 1067 net MW(e). The primary containments are of the Mark I pressure suppression pool type and the three units share a secondary containment of the controlled leakage, elevated release design. Each unit occupies a separate reactor building located in one structure underneath the common refueling floor.

  7. Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

    Science.gov (United States)

    Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur

    2011-04-18

    Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.
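
    The relationship between a quality filter and the coverage that remains afterwards can be illustrated with a small sketch. It is not the Quality Assessment Software itself: the mean-quality criterion, the Phred+33 encoding and the function names are assumptions chosen for illustration.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads, min_mean_quality=20):
    """Keep (sequence, quality) pairs whose mean quality meets the threshold."""
    return [
        (seq, qual)
        for seq, qual in reads
        if qual and mean_phred(qual) >= min_mean_quality
    ]


def estimated_coverage(reads, genome_size):
    """Average depth: total retained bases divided by the genome length."""
    return sum(len(seq) for seq, _ in reads) / genome_size
```

    Comparing `estimated_coverage` before and after `filter_reads` for several thresholds shows how stringent a cutoff can be while still leaving enough depth for assembly.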

  8. Hunting down frame shifts: Ecological analysis of diverse functional gene sequences

    Directory of Open Access Journals (Sweden)

    Michal Strejcek

    2015-11-01

    Full Text Available Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of the FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing procedure. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
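
    Maximum Expected Error filtering rests on a simple calculation: each Phred score Q corresponds to an error probability of 10^(-Q/10), and the expected number of errors in a read is the sum of those probabilities. The sketch below shows that calculation under a Phred+33 encoding assumption; the threshold value and function names are illustrative and are not taken from the study.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, 10 ** (-Q / 10), over a read."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def mee_filter(reads, max_expected_errors=1.0):
    """Keep (sequence, quality) pairs whose expected error count is within the limit."""
    return [
        (seq, qual)
        for seq, qual in reads
        if expected_errors(qual) <= max_expected_errors
    ]
```

    Unlike a mean-quality threshold, this criterion penalizes a read for a few very poor bases even when most of its bases are of high quality.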

  9. Estimation of a Killer Whale (Orcinus orca) Population's Diet Using Sequencing Analysis of DNA from Feces.

    Directory of Open Access Journals (Sweden)

    Michael J Ford

    Full Text Available Estimating diet composition is important for understanding interactions between predators and prey and thus illuminating ecosystem function. The diet of many species, however, is difficult to observe directly. Genetic analysis of fecal material collected in the field is therefore a useful tool for gaining insight into wild animal diets. In this study, we used high-throughput DNA sequencing to quantitatively estimate the diet composition of an endangered population of wild killer whales (Orcinus orca) in their summer range in the Salish Sea. We combined 175 fecal samples collected between May and September from five years between 2006 and 2011 into 13 sample groups. Two known DNA composition control groups were also created. Each group was sequenced at a ~330 bp segment of the 16S gene in the mitochondrial genome using an Illumina MiSeq sequencing system. After several quality control steps, 4,987,107 individual sequences were aligned to a custom sequence database containing 19 potential fish prey species and the most likely species of each fecal-derived sequence was determined. Based on these alignments, salmonids made up >98.6% of the total sequences and thus of the inferred diet. Of the six salmonid species, Chinook salmon made up 79.5% of the sequences, followed by coho salmon (15%). Over all years, a clear pattern emerged with Chinook salmon dominating the estimated diet early in the summer, and coho salmon contributing an average of >40% of the diet in late summer. Sockeye salmon appeared to be occasionally important, at >18% in some sample groups. Non-salmonids were rarely observed. Our results are consistent with earlier results based on surface prey remains, and confirm the importance of Chinook salmon in this population's summer diet.

  10. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    Science.gov (United States)

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
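
    The kinds of quantities ANCAC reports (codon frequencies and GC-content) can be sketched for a single coding sequence as follows. This is only an illustration of the calculations, not ANCAC's implementation; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    return {codon: n / total for codon, n in Counter(codons).items()}


cds = "ATGGCCGCCTAA"  # toy sequence: ATG GCC GCC TAA
freqs = codon_frequencies(cds)
gc = gc_content(cds)
```

    Aggregating such per-gene values over all members of a COG, or over a chosen set of genomes, gives the species- or function-specific view described in the abstract.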

  11. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Directory of Open Access Journals (Sweden)

    Meiler Arno

    2012-09-01

    Full Text Available Abstract Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  12. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    Science.gov (United States)

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  13. Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India

    Science.gov (United States)

    Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

    2017-12-01

    The 2001 Mw 7.7 Bhuj mainshock seismic sequence in the Kachchh area, occurring during 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis technique. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade since the occurrence of 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long-sequence has been investigated for the first time for completeness magnitude Mc 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using effective window-length of 200 earthquakes with a moving window of 20 events overlapped by 180 events. The robustness of the analysis has been tested by considering the magnitude completeness correction term of 0.2 to Mc 3.0 as Mc 3.2 and we have tested the error in the calculus of Dq for each magnitude threshold. On the other hand, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows the multi-fractal dimension spectrum Dq decreases with increasing clustering of events with time before a moderate magnitude earthquake in the sequence, which alternatively accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe, and warns for proximity of a damaging seismic event in an area.
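
    The windowing scheme described above (200 events per window, advanced by 20 events so that consecutive windows share 180 events) can be sketched as index arithmetic. The catalogue size of 1000 events and the function name are hypothetical; the Dq computation for each window is not shown.

```python
def sliding_windows(n_events, window=200, step=20):
    """Yield (start, end) index pairs, end exclusive, for overlapping windows.

    Only full windows are produced; a trailing partial window is dropped.
    """
    for start in range(0, n_events - window + 1, step):
        yield start, start + window


# Hypothetical catalogue of 1000 time-ordered events
windows = list(sliding_windows(1000))
overlap = windows[0][1] - windows[1][0]  # events shared by consecutive windows
```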

  14. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  15. Estimation of physiological parameters using knowledge-based factor analysis of dynamic nuclear medicine image sequences

    International Nuclear Information System (INIS)

    Yap, J.T.; Chen, C.T.; Cooper, M.

    1995-01-01

    The authors have previously developed a knowledge-based method of factor analysis to analyze dynamic nuclear medicine image sequences. In this paper, the authors analyze dynamic PET cerebral glucose metabolism and neuroreceptor binding studies. These methods have shown the ability to reduce the dimensionality of the data, enhance the image quality of the sequence, and generate meaningful functional images and their corresponding physiological time functions. The new information produced by the factor analysis has now been used to improve the estimation of various physiological parameters. A principal component analysis (PCA) is first performed to identify statistically significant temporal variations and remove the uncorrelated variations (noise) due to Poisson counting statistics. The statistically significant principal components are then used to reconstruct a noise-reduced image sequence as well as provide an initial solution for the factor analysis. Prior knowledge such as the compartmental models or the requirement of positivity and simple structure can be used to constrain the analysis. These constraints are used to rotate the factors to the most physically and physiologically realistic solution. The final result is a small number of time functions (factors) representing the underlying physiological processes and their associated weighting images representing the spatial localization of these functions. Estimation of physiological parameters can then be performed using the noise-reduced image sequence generated from the statistically significant PCs and/or the final factor images and time functions. These results are compared to the parameter estimation using standard methods and the original raw image sequences. Graphical analysis was performed at the pixel level to generate comparable parametric images of the slope and intercept (influx constant and distribution volume)

  16. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  17. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers. Yazun Bashir Jarrar, Ayat Ahmed Balasmeh and Wassan Jarrar. Department of Pharmacy, College of Pharmacy, AlZaytoonah University of Jordan, Amman, Jordan. ABSTRACT. The present study aimed to identify ...

  18. Molecular cloning and sequence analysis of VP6 gene of giant ...

    African Journals Online (AJOL)

    Jane

    2011-10-24

    Oct 24, 2011 ... G), and the major structural protein of inner capsid particles (ICP), and also specific antigen of mucosa immunization that mediate specific immunological reaction. In this report, sequence analysis of VP6 gene of giant panda rotavirus was carried out. Full-length VP6 gene encoding for ICP of giant panda.

  19. Decreasing Sports Activity with Increasing Age? Findings from a 20-Year Longitudinal and Cohort Sequence Analysis

    Science.gov (United States)

    Breuer, Christoph; Wicker, Pamela

    2009-01-01

    According to cross-sectional studies in sport science literature, decreasing sports activity with increasing age is generally assumed. In this paper, the validity of this assumption is checked by applying more effective methods of analysis, such as longitudinal and cohort sequence analyses. With the help of 20 years' worth of data records from the…

  20. Multilocus Sequence Analysis for Typing Leptospira interrogans and Leptospira kirschneri

    Science.gov (United States)

    Leon, Albertine; Pronost, Stéphane; Fortier, Guillaume; Andre-Fontaine, Geneviève; Leclercq, Roland

    2010-01-01

    Fifty-three strains belonging to the pathogenic species Leptospira interrogans and Leptospira kirschneri were analyzed by multilocus sequence analysis. The species formed two distinct branches. In the L. interrogans branch, the phylogenetic tree clustered the strains into three subgroups. Genogroups and serogroups were superimposed but not strictly. PMID:19955271

  1. Multilocus Sequence Analysis for Typing Leptospira interrogans and Leptospira kirschneri

    OpenAIRE

    Leon, Albertine; Pronost, Stéphane; Fortier, Guillaume; Andre-Fontaine, Geneviève; Leclercq, Roland

    2009-01-01

    Fifty-three strains belonging to the pathogenic species Leptospira interrogans and Leptospira kirschneri were analyzed by multilocus sequence analysis. The species formed two distinct branches. In the L. interrogans branch, the phylogenetic tree clustered the strains into three subgroups. Genogroups and serogroups were superimposed but not strictly.

  2. Formative Research on the Simplifying Conditions Method (SCM) for Task Analysis and Sequencing.

    Science.gov (United States)

    Kim, YoungHwan; Reigeluth, Charles M.

    The Simplifying Conditions Method (SCM) is a set of guidelines for task analysis and sequencing of instructional content under the Elaboration Theory (ET). This article introduces the fundamentals of SCM and presents the findings from a formative research study on SCM. It was conducted in two distinct phases: design and instruction. In the first…

  3. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis.

    Science.gov (United States)

    Guo, Yan; Dai, Yulin; Yu, Hui; Zhao, Shilin; Samuels, David C; Shyr, Yu

    2017-03-01

    Analyses of high throughput sequencing data start with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has been augmented with improved accuracy and completeness. It is presumed that the latest release of the human reference genome, GRCh38, will contribute more to high throughput sequencing data analysis by providing more accuracy. But the amount of improvement has not yet been quantified. We conducted a study to compare the genomic analysis results between the GRCh38 reference and its predecessor GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertion/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only from the genome assembly aspect, but also yields more reliable genomic analysis results. Copyright © 2017. Published by Elsevier Inc.

  4. Cloning and sequence analysis of putative type II fatty acid synthase ...

    Indian Academy of Sciences (India)

    Prakash

    Cloning and sequence analysis of putative type II fatty acid synthase genes from Arachis hypogaea L. ... acyl carrier protein (ACP), malonyl-CoA:ACP transacylase, β-ketoacyl-ACP .... Helix II plays a dominant role in the interaction ... main distinguishing features of plant ACPs in plastids and ..... synthase component; J. Biol.

  5. Sequence analysis of putative swrW gene required for surfactant ...

    African Journals Online (AJOL)

    owner

    2012-07-17

    Jul 17, 2012 ... These nucleotide and protein sequence analysis of the putative swrW gene provides vital information on the versatility .... chain reaction (PCR) products were stored at 4°C. Presence of ... identical to the same gene with an E-value of 0.0. .... The Prokaryotes-A Handbook on the Biol. of Bacteria:Ecophysiol.

  6. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Science.gov (United States)

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline-sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. sRNAnalyzer—a flexible and customizable small RNA sequencing data analysis pipeline

    Science.gov (United States)

    Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J.

    2017-01-01

    Abstract Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline—sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. PMID:29069500

  8. Masking as an effective quality control method for next-generation sequencing data analysis.

    Science.gov (United States)

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, resulting in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
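
    The difference between the two preprocessing methods can be shown on a single read. The sketch below is an illustration only, not the authors' Perl script: the Phred threshold of 20, the function names, and the choice of simple 3'-end trimming are assumptions.

```python
def mask_read(seq, quals, min_q=20):
    """Replace every base whose quality is below min_q with 'N'; length is kept."""
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


def trim_read(seq, quals, min_q=20):
    """Remove low-quality bases from the 3' end only (one simple trimming variant)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


seq = "ACGTACGT"
quals = [35, 34, 12, 33, 30, 31, 15, 10]
masked = mask_read(seq, quals)    # low-quality positions become N
trimmed = trim_read(seq, quals)   # read is shortened; the internal low-quality base stays
```

    Masking keeps read length and alignment coordinates intact while removing unreliable base calls anywhere in the read, whereas this trimming variant leaves the low-quality base at the third position untouched.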

  9. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  10. EFL LEARNERS REPAIR SEQUENCE TYPES ANALYSIS AS PEER-ASSESSMENT IN ORAL PERFORMANCE

    Directory of Open Access Journals (Sweden)

    Novia Trisanti

    2017-04-01

    Full Text Available There are certain concerns that an EFL teacher needs to observe in assessing students' oral performance, such as the number of words the learners utter, the grammatical errors they make, and the hesitations and certain expressions they produce. This paper gives an overview of research results, obtained using a qualitative method, which show the impact of repair sequence type analysis on those elements when it is used as students' peer- and self-assessment to enhance their speaking ability. The subjects were tertiary-level learners of the English Department, State University of Semarang, Indonesia, in 2012. Concerning the repair types, there are four repair sequences as reviewed by Buckwalter (2001): Self-Initiated Self Repair (SISR), Self-Initiated Other Repair (SIOR), Other-Initiated Self Repair (OISR), and Other-Initiated Other Repair (OIOR). Using the repair sequence type analysis, the students investigated the repair sequences of their peers while they performed in class conversation. The modified peer-assessment guideline proposed by Brown (2004) was used in identifying, categorizing and classifying the types of repair sequences in their peers' oral performance. Peer-assessment can be a valuable additional means to improve students' speaking since it is one of the motives that drive peer-evaluation, along with peer-verification and peer- and self-enhancement. The analysis results were then interpreted to see whether there was a significant finding related to the students’ oral performance enhancement.

  11. RNA2 of grapevine fanleaf virus: sequence analysis and coat protein cistron location.

    Science.gov (United States)

    Serghini, M A; Fuchs, M; Pinck, M; Reinbolt, J; Walter, B; Pinck, L

    1990-07-01

    The nucleotide sequence of the genomic RNA2 (3774 nucleotides) of grapevine fanleaf virus strain F13 was determined from overlapping cDNA clones and its genetic organization was deduced. Two rapid and efficient methods were used for cDNA cloning of the 5' region of RNA2. The complete sequence contained only one long open reading frame of 3555 nucleotides (1184 codons, 131K product). The analysis of the N-terminal sequence of purified coat protein (CP) and identification of its C-terminal residue have allowed the CP cistron to be precisely positioned within the polyprotein. The CP produced by proteolytic cleavage at the Arg/Gly site between residues 680 and 681 contains 504 amino acids (Mr 56019) and has hydrophobic properties. The Arg/Gly cleavage site deduced by N-terminal amino acid sequence analysis is the first for a nepovirus coat protein and for plant viruses expressing their genomic RNAs by polyprotein synthesis. Comparison of GFLV RNA2 with M RNA of cowpea mosaic comovirus and with RNA2 of two closely related nepoviruses, tomato black ring virus and Hungarian grapevine chrome mosaic virus, showed strong similarities among the 3' non-coding regions but less similarity among the 5' end non-coding sequences than reported among other nepovirus RNAs.

  12. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25°C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  13. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20-year period from eight countries. MLST resolved 17 sequence types (Simpson's index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.
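
    The Simpson's index of diversity (SID) values quoted above summarise how evenly isolates are spread over types. As a minimal sketch, assuming the commonly used Hunter-Gaston form of the index (the abstract does not state which form was applied), it could be computed as follows. The function name and the example type labels are hypothetical and are not the study's data.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity in the Hunter-Gaston form (an assumption).

    SID = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates of type j and N the total.
    """
    counts = list(Counter(type_labels).values())
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (total * (total - 1))


# Hypothetical assignment of six isolates to three sequence types.
example_types = ["ST1", "ST1", "ST2", "ST3", "ST3", "ST3"]
print(round(simpson_diversity(example_types), 3))
```

    A value near 1 means that two randomly drawn isolates almost always belong to different types; a value near 0 means that a single type dominates the collection.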

  14. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    Directory of Open Access Journals (Sweden)

    Colbourne John K

    2009-05-01

    Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. 
Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and

  15. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive for rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced on an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil and French Guiana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting a common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. System-level hazard analysis using the sequence-tree method

    International Nuclear Information System (INIS)

    Huang, H.-W.; Shih Chunkuan; Yih Swu; Chen, M.-H.

    2008-01-01

    A system-level PHA using the sequence-tree method is presented to perform safety-related digital I&C system SSA. The conventional PHA involves brainstorming among experts on various portions of the system to identify hazards through discussions. However, since the conventional PHA is not a systematic technique, the analysis results depend strongly on the experts' subjective opinions. The quality of analysis cannot be appropriately controlled. Therefore, this study presents a system-level sequence tree based PHA, which can clarify the relationship among the major digital I&C systems. This sequence-tree-based technique has two major phases. The first phase adopts a table to analyze each event in SAR Chapter 15 for a specific safety-related I&C system, such as RPS. The second phase adopts a sequence tree to recognize the I&C systems involved in the event, the working of the safety-related systems and how the backup systems can be activated to mitigate the consequence if the primary safety systems fail. The defense-in-depth echelons, namely the Control echelon, Reactor trip echelon, ESFAS echelon and Monitoring and indicator echelon, are arranged to build the sequence-tree structure. All the related I&C systems, including the digital systems and the analog back-up systems, are allocated in their specific echelons. This system-centric sequence-tree analysis not only systematically identifies preliminary hazards, but also vulnerabilities in a nuclear power plant. Hence, an effective simplified D3 evaluation can also be conducted.

  17. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing

    OpenAIRE

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O'Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis

    2012-01-01

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality c...

  18. HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

    Science.gov (United States)

    Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

    2012-01-01

    Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, Perl scripting language, and MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.

  19. Analysis of the AD sequence in Zion plant using the March 1.1 code

    International Nuclear Information System (INIS)

    Oriolo, F.; Paci, S.

    1985-01-01

    The analyses of the AD sequences for the Zion power plant, carried out at Pisa University within the framework of its participation in the Source Term Working Group, are presented. After a short description of the plant and of the sequence under analysis, the model used for the reference computation and the results obtained using the March 1.1 code are shown. Alongside the reference computation, a series of parametric tests was also carried out on some code input variables in order to ascertain their influence on the transient trend. The results of these analyses are shown in the Appendix.

  20. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    OpenAIRE

    W.S.I. de Silva; M.M.N. Perera; K.L.N.S. Perera; A.M. Wickramasuriya; G.A.U. Jayasekera

    2017-01-01

    The promoter region of a drought and abscisic acid (ABA) inducible gene, osr40c1, was isolated from a salt-tolerant indica rice variety Pokkali, which is 670 bp upstream of the putative translation start codon. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involv...

  1. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Directory of Open Access Journals (Sweden)

    Hsiao Yu-Yun

    2011-01-01

    Full Text Available Abstract Background Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC) end sequences (BESs) can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results We used two BAC libraries (constructed using the BamHI and HindIII restriction enzymes) of Phalaenopsis equestris to generate pair-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively), at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp) in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, it was found that 641 (11.6%) were predicted to represent protein-encoding regions, whereas 1,272 (23.0%) contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively), whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs) were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6%) of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequences of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome, and will help clarify similarities and differences in genome composition between orchids and other plant species.
Conclusion Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive

  2. DNA sequence analysis of X-ray induced Adh null mutations in Drosophila melanogaster

    International Nuclear Information System (INIS)

    Mahmoud, J.; Fossett, N.G.; Arbour-Reily, P.; McDaniel, M.; Tucker, A.; Chang, S.H.; Lee, W.R.

    1991-01-01

    The mutational spectrum for 28 X-ray induced mutations and 2 spontaneous mutations, previously determined by genetic and cytogenetic methods, consisted of 20 multilocus deficiencies (19 induced and 1 spontaneous) and 10 intragenic mutations (9 induced and 1 spontaneous). One of the X-ray induced intragenic mutations was lost, and another was determined to be a recombinant with the allele used in the recovery scheme. The DNA sequence of two X-ray induced intragenic mutations has been published. This paper reports the results of DNA sequence analysis of the remaining intragenic mutations and a summary of the X-ray induced mutational spectrum. The combination of DNA sequence analysis with genetic complementation analysis shows a continuous distribution in size of deletions rather than two different types of mutations consisting of deletions and 'point mutations'. Sequencing is shown to be essential for detecting intragenic deletions. Of particular importance for future studies is the observation that all of the intragenic deletions consist of a direct repeat adjacent to the breakpoint with one of the repeats deleted

  3. Genetic diversity analysis of Leuconostoc mesenteroides from Korean vegetables and food products by multilocus sequence typing.

    Science.gov (United States)

    Sharma, Anshul; Kaur, Jasmine; Lee, Sulhee; Park, Young-Seo

    2018-06-01

    In the present study, 35 Leuconostoc mesenteroides strains isolated from vegetables and food products from South Korea were studied by multilocus sequence typing (MLST) of seven housekeeping genes (atpA, groEL, gyrB, pheS, pyrG, rpoA, and uvrC). The fragment sizes of the seven amplified housekeeping genes ranged in length from 366 to 1414 bp. Sequence analysis indicated 27 different sequence types (STs) with 25 of them being represented by a single strain indicating high genetic diversity, whereas the remaining 2 were characterized by five strains each. In total, 220 polymorphic nucleotide sites were detected among seven housekeeping genes. The phylogenetic analysis based on the STs of the seven loci indicated that the 35 strains belonged to two major groups, A (28 strains) and B (7 strains). Split decomposition analysis showed that intraspecies recombination played a role in generating diversity among strains. The minimum spanning tree showed that the evolution of the STs was not correlated with food source. This study indicates that multilocus sequence typing is a valuable tool to assess the genetic diversity among L. mesenteroides strains from South Korea and can be used further to monitor the evolutionary changes.

  4. A novel RNA sequencing data analysis method for cell line authentication.

    Directory of Open Access Journals (Sweden)

    Erik Fasterius

    Full Text Available We have developed a novel analysis method that can interrogate the authenticity of biological samples used for generation of transcriptome profiles in public data repositories. The method uses RNA sequencing information to reveal mutations in expressed transcripts and subsequently confirms the identity of analysed cells by comparison with publicly available cell-specific mutational profiles. Cell lines constitute key model systems widely used within cancer research, but their identity needs to be confirmed in order to minimise the influence of cell contaminations and genetic drift on the analysis. Using both public and novel data, we demonstrate the use of RNA-sequencing data analysis for cell line authentication by examining the validity of COLO205, DLD1, HCT15, HCT116, HKE3, HT29 and RKO colorectal cancer cell lines. We successfully authenticate the studied cell lines and validate previous reports indicating that DLD1 and HCT15 are synonymous. We also show that the analysed HKE3 cells harbour an unexpected KRAS-G13D mutation and confirm that this cell line is a genuine KRAS dosage mutant, rather than a true isogenic derivative of HCT116 expressing only the wild type KRAS. This authentication method could be used to revisit the numerous cell line based RNA sequencing experiments available in public data repositories, analyse new experiments where whole genome sequencing is not available, as well as facilitate comparisons of data from different experiments, platforms and laboratories.

  5. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate for the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  6. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. 
Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  7. Cloning and sequence analysis of sucrose phosphate synthase gene from varieties of Pennisetum species.

    Science.gov (United States)

    Li, H C; Lu, H B; Yang, F Y; Liu, S J; Bai, C J; Zhang, Y W

    2015-03-31

    Sucrose phosphate synthase (SPS) is an enzyme used by higher plants for sucrose synthesis. In this study, three primer sets were designed on the basis of known SPS sequences from maize (GenBank: NM_001112224.1) and sugarcane (GenBank: JN584485.1), and five novel SPS genes were identified by RT-PCR from the genomes of Pennisetum spp (the hybrid P. americanum x P. purpureum, P. purpureum Schum., P. purpureum Schum. cv. Red, P. purpureum Schum. cv. Taiwan, and P. purpureum Schum. cv. Mott). The cloned sequences showed 99.9% identity and 80-88% similarity to the SPS sequences of other plants. The SPS gene of hybrid Pennisetum had one nucleotide and four amino acid polymorphisms compared to the other four germplasms, and cluster analysis was performed to assess genetic diversity in this species. Additional characterization of the SPS gene product can potentially allow Pennisetum to be exploited as a biofuel source.

  8. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    Science.gov (United States)

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene encodes a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. The Myf-6 protein is highly conserved among species, with 99 and 97% identity when comparing pig with cow and human, respectively, and 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be a T→C transition recognized by the MspI restriction enzyme.

  9. Analysis on sequence stratigraphy and depositional systems of Mangbang formation, upper tertiary in Longchuanjiang basin

    International Nuclear Information System (INIS)

    Sun Zexuan; Yao Yifeng; Chen Yong; Li Guoxin

    2004-01-01

    The Longchuanjiang basin is a small Cenozoic intramontane down-faulted basin. Combining the Pliocene structure, volcanic activity and sedimentation of the basin, this paper analyses the sequence stratigraphy and depositional systems of the Mangbang formation (the cover of the basin). Based on the analysis of the depositional systems of the Mangbang formation, a depositional pattern for the Pliocene in the Longchuanjiang basin is set up. It is suggested that, because of fast accumulation in the early down-faulted zone during Pliocene time, the alluvial fan depositional system dominated at that time. During the middle-late period, the alluvial fan entered the lake, forming a combination of fan, fan-delta and lacustrine depositional systems. The authors propose that the formation of the Mangbang formation sequence was constrained by multistage tectonic movement; three structural sequences were established and system tracts were divided. (authors)

  10. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    Science.gov (United States)

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both evaluating the reproducibility of biological or technical replicates and comparing different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/. Contact: pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  11. Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

    Science.gov (United States)

    Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

    2013-11-01

    Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming, but also causes a big economic loss. However, there are few reports on the DNA level of D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of a goat suffering from demodicidosis. The mt16S rDNA sequences of individual mites were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. The 339-bp partial sequences of six D. caprae isolates were obtained, and the sequence identity was 100% among isolates. The pairwise divergences between D. caprae and Demodex canis, Demodex folliculorum, and Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached interspecies level. The five phylogenetic trees all showed that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
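
    The pairwise divergence and transition/transversion figures reported above are derived from aligned sequences. The following is a minimal sketch of how such quantities can be counted for one pair of aligned sequences; it computes the uncorrected proportion of differing sites and is not the software or distance model used in the study, which the abstract does not name. The function name and the toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Count transitions and transversions between two aligned DNA sequences.

    Columns containing a gap or an ambiguous base in either sequence are
    skipped. Returns (divergence, transitions, transversions, ratio), where
    divergence is the uncorrected proportion of differing sites and ratio is
    transitions / transversions (None when there are no transversions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    divergence = (transitions + transversions) / compared
    ratio = transitions / transversions if transversions else None
    return divergence, transitions, transversions, ratio


# Toy alignment, not data from the study.
print(compare_aligned("ACGTACGT", "GCGTACTA"))
```

    A ratio below 1, as reported for the between-species comparisons, means that transversions outnumber transitions in the compared sites.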

  12. [Analysis of COX1 sequences of Taenia isolates from four areas of Guangxi].

    Science.gov (United States)

    Yang, Yi-Chao; Ou-Yang, Yi; Su, Ai-Rong; Wan, Xiao-Ling; Li, Shu-Lin

    2012-06-01

    The aim was to analyze the COX1 sequences of Taenia isolates from four areas of Guangxi Zhuang Autonomous Region and to understand the distribution of Taenia asiatica in Guangxi. Patients with taeniasis in Luzhai, Rongshui, Tiandong and Sanjiang in Guangxi were treated by deworming, and the Taenia isolates were collected. Cytochrome c oxidase subunit 1 (COX1) sequences of these isolates were amplified by PCR, and the PCR products were sequenced by T-A clone sequencing. The homogeneities and genetic distances were calculated and analyzed, and phylogenetic trees were constructed using software. Meanwhile, the COX1 sequences of the isolates from the 4 areas were compared separately with the sequences of Taenia species in GenBank. The COX1 sequences of the 5 Taenia isolates collected had the same length of 444 bp. There were 5 variable positions between the Luzhai isolate and Taenia asiatica; the homogeneity was 98.87% and their genetic distance was 0.011. The phylogenetic tree analysis revealed that the Luzhai isolate and Taenia asiatica were located at the same node and were closely related. The homogeneity between Rongshui isolate A and Taenia solium was 100%, while the homogeneities of Rongshui isolate B with Taenia saginata and Taenia asiatica were 98.20% and 96.17%, respectively. The homogeneities of the Tiandong and Sanjiang isolates with Taenia solium were 99.55% and 96.40%, respectively, and the genetic distances were 0.005 and 0.037, respectively. The homogeneity between the Luzhai isolate and Taenia saginata was 96.40%. Taenia asiatica exists in Luzhai, and Taenia solium and Taenia saginata coexist in Rongshui, Guangxi Zhuang Autonomous Region.
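
    The relation between the number of variable positions and the reported homogeneity can be checked with simple arithmetic: 5 differences over 444 aligned bases leave 439 identical positions. The sketch below shows that calculation, together with the Jukes-Cantor (JC69) correction as one common way to turn a proportion of differences into a genetic distance. The choice of JC69 is an assumption made for illustration; the abstract does not say which distance model was used, and the function names are hypothetical.

```python
import math


def percent_identity(length, differences):
    """Percentage of identical positions in an ungapped alignment."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("invalid length or difference count")
    return 100.0 * (length - differences) / length


def jc69_distance(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites.

    Assumption for illustration only: the source does not name its model.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Figures stated in the abstract: 444 bp and 5 variable positions.
length, differences = 444, 5
print(round(percent_identity(length, differences), 2))
print(round(jc69_distance(differences / length), 3))
```

    For closely related sequences such as these, the corrected distance stays very close to the raw proportion of differences, so the choice of model matters little at this level of divergence.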

  13. Microbiological profile of chicken carcasses: A comparative analysis using shotgun metagenomic sequencing

    Directory of Open Access Journals (Sweden)

    Alessandra De Cesare

    2018-04-01

    Full Text Available In the last few years metagenomic and 16S rRNA sequencing have completely changed the microbiological investigation of food products. In this preliminary study, the microbiological profiles of chicken carcasses collected from animals fed with different diets were tested by using shotgun metagenomic sequencing. A total of 15 carcasses were collected at the slaughterhouse at the end of the refrigeration tunnel from chickens reared for 35 days and fed with a control diet (n=5), a diet supplemented with 1500 FTU/kg of commercial phytase (n=5) and a diet supplemented with 1500 FTU/kg of commercial phytase and 3 g/kg of inositol (n=5). Ten grams of neck and breast skin were obtained from each carcass and submitted to total DNA extraction by using the DNeasy Blood & Tissue Kit (Qiagen). Sequencing libraries were prepared by using the Nextera XT DNA Library Preparation Kit (Illumina) and sequenced in a HiScanSQ (Illumina) at 100 bp in paired ends. Between 5 and 9 million sequences were obtained for each sample. Sequence analysis showed that Proteobacteria and Firmicutes represented more than 98% of the whole bacterial populations associated with carcass skin in all groups, but their abundances differed between groups. Moraxellaceae and other degradative bacteria showed a significantly higher abundance in the control compared to the treated groups. Furthermore, Clostridium perfringens showed a significantly higher relative abundance in the group fed with phytase, and Salmonella enterica in the group fed with phytase plus inositol. The results of this preliminary study showed that metagenome sequencing is suitable to investigate and monitor carcass microbiota in order to detect specific pathogenic and/or degradative populations.

  14. Chaos game representation of functional protein sequences, and simulation and multifractal analysis of induced measures

    International Nuclear Information System (INIS)

    Zu-Guo, Yu; Qian-Jun, Xiao; Long, Shi; Jun-Wu, Yu; Anh, Vo

    2010-01-01

    Investigating the biological function of proteins is a key aspect of protein studies, and bioinformatic methods have become important for this purpose. In this paper, we first give the chaos game representation (CGR) of randomly-linked functional protein sequences, then propose the use of recurrent iterated function systems (RIFS) from fractal theory to simulate the measure based on their chaos game representations. This method helps to extract some features of functional protein sequences and, furthermore, the biological functions of these proteins. Multifractal analysis of the measures based on the CGRs of randomly-linked functional protein sequences is then performed. We find that the CGRs have clear fractal patterns. The numerical results show that the RIFS can simulate the measure based on the CGR very well. The relative standard error and the estimated probability matrix in the RIFS do not depend on the order in which the functional protein sequences are linked. The estimated probability matrices in the RIFS with different biological functions are evidently different. Hence the estimated probability matrices in the RIFS can be used to characterise the difference among linked functional protein sequences with different biological functions. From the values of the D_q curves, one sees that these functional protein sequences are not completely random. The D_q of all linked functional proteins studied are multifractal-like and sufficiently smooth for the C_q (analogous to specific heat) curves to be meaningful. Furthermore, the D_q curves of the measure μ based on their CGRs for different orders of linking the functional protein sequences are almost identical if q ≥ 0. Finally, the C_q curves of all linked functional proteins resemble a classical phase transition at a critical point. (cross-disciplinary physics and related areas of science and technology)

  15. Human papilloma viruses and cervical tumours: mapping of integration sites and analysis of adjacent cellular sequences

    International Nuclear Information System (INIS)

    Klimov, Eugene; Vinokourova, Svetlana; Moisjak, Elena; Rakhmanaliev, Elian; Kobseva, Vera; Laimins, Laimonis; Kisseljov, Fjodor; Sulimova, Galina

    2002-01-01

    In cervical tumours the integration of human papilloma viruses (HPV) transcripts often results in the generation of transcripts that consist of hybrids of viral and cellular sequences. Mapping data obtained with a variety of techniques have demonstrated that HPV integration occurs without obvious specificity into the human genome. However, these techniques could not demonstrate whether integration resulted in the generation of transcripts encoding viral or viral-cellular sequences. The aim of this work was to map the integration sites of HPV DNA and to analyse the adjacent cellular sequences. Amplification of the INTs was done by the APOT technique. The APOT products were sequenced according to standard protocols. The analysis of the sequences was performed using the BLASTN program and public databases. To localise the INTs, PCR-based screening of the GeneBridge4-RH-panel was used. Twelve cellular sequences adjacent to integrated HPV16 (INT markers) expressed in squamous cell cervical carcinomas were isolated. For 11 INT markers homologous human genomic sequences were readily identified, and 9 of these showed significant homologies to known genes/ESTs. Using the known locations of homologous cDNAs and the RH-mapping techniques, mapping studies showed that the INTs are distributed among different human chromosomes for each tumour sample and are located in regions with high levels of expression. Integration of HPV genomes occurs into different human chromosomes but into regions that contain highly transcribed genes. One interpretation of these studies is that integration of HPV occurs into decondensed regions, which are more accessible for integration of foreign DNA.

  16. Internal event analysis for Laguna Verde Unit 1 Nuclear Power Plant. Accident sequence quantification and results

    International Nuclear Information System (INIS)

    Huerta B, A.; Aguilar T, O.; Nunez C, A.; Lopez M, R.

    1994-01-01

    The Level 1 results of the Laguna Verde Nuclear Power Plant PRA are presented in the Internal Event Analysis for Laguna Verde Unit 1 Nuclear Power Plant, CNSNS-TR 004, in five volumes. The reports are organized as follows: CNSNS-TR 004 Volume 1: Introduction and Methodology. CNSNS-TR4 Volume 2: Initiating Event and Accident Sequences. CNSNS-TR 004 Volume 3: System Analysis. CNSNS-TR 004 Volume 4: Accident Sequence Quantification and Results. CNSNS-TR 005 Volume 5: Appendices A, B and C. This volume presents the development of the dependent failure analysis, the treatment of the support system dependencies, the identification of the shared-components dependencies, and the treatment of the common cause failure. It also presents the identification of the main human actions considered, along with the possible recovery actions included. The development of the data base and the assumptions and limitations in the data base are also described in this volume. The accident sequences quantification process and the resolution of the core vulnerable sequences are presented. In this volume, the source and treatment of uncertainties associated with failure rates, component unavailabilities, initiating event frequencies, and human error probabilities are also presented. Finally, the main results and conclusions for the Internal Event Analysis for Laguna Verde Nuclear Power Plant are presented. The total core damage frequency calculated is 9.03 × 10^-5 per year for internal events. The most dominant accident sequences found are the transients involving the loss of offsite power, the station blackout accidents, and the anticipated transients without SCRAM (ATWS). (Author)

  17. MetaSeq: privacy preserving meta-analysis of sequencing-based association studies.

    Science.gov (United States)

    Singh, Angad Pal; Zafer, Samreen; Pe'er, Itsik

    2013-01-01

    Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and are therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant, thus requiring the exchange of the identity of the sequenced variant. As such variants are often rare, this exchange can reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a third party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
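
    The blinded tally described in this abstract can be illustrated with a minimal sketch. It assumes the collaborating parties share a secret key that the third party never sees, and it uses HMAC-SHA256 as a stand-in for whatever encryption MetaSeq actually employs; the function names (blind, party_report, third_party_tally) and the toy identifiers are hypothetical and are not taken from the MetaSeq software.

```python
import hashlib
import hmac
from collections import defaultdict


def blind(shared_key: bytes, identifier: str) -> str:
    """Replace a gene or variant identifier by a keyed hash (assumed scheme)."""
    return hmac.new(shared_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def party_report(shared_key, counts):
    """Blind one party's table {(gene, variant): (case_count, control_count)}."""
    return {
        (blind(shared_key, gene), blind(shared_key, gene + ":" + variant)): pair
        for (gene, variant), pair in counts.items()
    }


def third_party_tally(reports):
    """Pool case/control counts per blinded gene token, without the key."""
    totals = defaultdict(lambda: [0, 0])
    for report in reports:
        for (gene_token, _variant_token), (cases, controls) in report.items():
            totals[gene_token][0] += cases
            totals[gene_token][1] += controls
    return {token: tuple(pair) for token, pair in totals.items()}


# Toy data: two sites observe different rare variants in the same gene.
key = b"demo-key-known-only-to-the-sites"
site_a = {("GENE1", "chr1:100A>T"): (2, 0), ("GENE2", "chr2:5C>G"): (1, 1)}
site_b = {("GENE1", "chr1:250G>C"): (1, 1)}
totals = third_party_tally([party_report(key, site_a), party_report(key, site_b)])
```

    In this sketch the third party only ever handles opaque tokens, and the sites map pooled totals back to genes by recomputing blind(key, gene) locally.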

  18. A functional U-statistic method for association analysis of sequencing data.

    Science.gov (United States)

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  19. MIToS.jl: mutual information tools for protein sequence analysis in the Julia language

    DEFF Research Database (Denmark)

    Zea, Diego J.; Anfossi, Diego; Nielsen, Morten

    2017-01-01

    Motivation: MIToS is an environment for mutual information analysis and a framework for protein multiple sequence alignments (MSAs) and protein structures (PDB) management in Julia language. It integrates sequence and structural information through SIFTS, making Pfam MSAs analysis straightforward....... MIToS streamlines the implementation of any measure calculated from residue contingency tables and its optimization and testing in terms of protein contact prediction. As an example, we implemented and tested a BLOSUM62-based pseudo-count strategy in mutual information analysis. Availability...... and Implementation: The software is totally implemented in Julia and supported for Linux, OS X and Windows. It’s freely available on GitHub under MIT license: http://mitos.leloir.org.ar. Contacts: diegozea@gmail.com or cmb@leloir.org.ar Supplementary information: Supplementary data are available at Bioinformatics...
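
    As a rough illustration of a measure calculated from a residue contingency table, the sketch below computes plain mutual information between two alignment columns. It is written in Python rather than Julia, does not use or reproduce the MIToS API, and omits the BLOSUM62-based pseudo-counts mentioned in the abstract; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j, gap="-"):
    """Mutual information (bits) between columns i and j of an alignment.

    Rows with a gap in either column are skipped; frequencies are raw
    counts without pseudo-counts or any correction.
    """
    pairs = [(seq[i], seq[j]) for seq in msa if seq[i] != gap and seq[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                # contingency table of residue pairs
    left = Counter(a for a, _ in pairs)   # marginal counts for column i
    right = Counter(b for _, b in pairs)  # marginal counts for column j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) expressed with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi


# Toy alignment: columns 0 and 1 co-vary perfectly, column 3 is independent.
toy_msa = ["ACDA", "ACDC", "GTDA", "GTDC"]
covarying = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 3)
```

    Perfectly co-varying two-state columns give 1 bit, while independent or fully conserved columns give 0, which is why high values are used as candidate contact signals.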

  20. Survey of methods for integrated sequence analysis with emphasis on man-machine interaction

    Energy Technology Data Exchange (ETDEWEB)

    Kahlbom, U; Holmgren, P [RELCON, Stockholm (Sweden)

    1995-05-01

    This report presents a literature study concerning recently developed monotonic methodologies in the human reliability area. The work was performed by RELCON AB on commission by NKS/RAK-1, subproject 3. The topic of subproject 3 is 'Integrated Sequence Analysis with Emphasis on Man-Machine Interaction'. The purpose of the study was to compile recently developed methodologies and to propose some of these methodologies for use in the sequence analysis task. The report describes mainly non-dynamic (monotonic) methodologies. One exception is HITLINE, which is a semi-dynamic method. Reference provides a summary of approaches to dynamic analysis of man-machine-interaction, and explains the differences between monotonic and dynamic methodologies. (au) 21 refs.

  1. Characterising the CRISPR immune system in Archaea using genome sequence analysis

    DEFF Research Database (Denmark)

    Shah, Shiraz Ali

    Archaea, a group of microorganisms distinct from bacteria and eukaryotes, are equipped with an adaptive immune system called the CRISPR system, which relies on an RNA interference mechanism to combat invading viruses and plasmids. Using a genome sequence analysis approach, the four components...... of archaeal genomic CRISPR loci were analysed, namely, repeats, spacers, leaders and cas genes. Based on analysis of spacer sequences it was predicted that the immune system combats viruses and plasmids by targeting their DNA. Furthermore, analysis of repeats, leaders and cas genes revealed that CRISPR...... systems exist as distinct families which have key differences between themselves. Closely related organisms were seen harbouring different CRISPR systems, while some distantly related species carried similar systems, indicating frequent horizontal exchange. Moreover, it was found that cas genes of Type I...

  2. Survey of methods for integrated sequence analysis with emphasis on man-machine interaction

    International Nuclear Information System (INIS)

    Kahlbom, U.; Holmgren, P.

    1995-05-01

    This report presents a literature study concerning recently developed monotonic methodologies in the human reliability area. The work was performed by RELCON AB on commission by NKS/RAK-1, subproject 3. The topic of subproject 3 is 'Integrated Sequence Analysis with Emphasis on Man-Machine Interaction'. The purpose of the study was to compile recently developed methodologies and to propose some of these methodologies for use in the sequence analysis task. The report describes mainly non-dynamic (monotonic) methodologies. One exception is HITLINE, which is a semi-dynamic method. Reference provides a summary of approaches to dynamic analysis of man-machine-interaction, and explains the differences between monotonic and dynamic methodologies. (au) 21 refs

  3. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    Science.gov (United States)

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  4. Microarray and cDNA sequence analysis of transcription during nerve-dependent limb regeneration

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Full Text Available Abstract Background Microarray analysis and 454 cDNA sequencing were used to investigate a centuries-old problem in regenerative biology: the basis of nerve-dependent limb regeneration in salamanders. Innervated (NR) and denervated (DL) forelimbs of Mexican axolotls were amputated and transcripts were sampled after 0, 5, and 14 days of regeneration. Results Considerable similarity was observed between NR and DL transcriptional programs at 5 and 14 days post amputation (dpa). Genes with extracellular functions that are critical to wound healing were upregulated while muscle-specific genes were downregulated. Thus, many processes that are regulated during early limb regeneration do not depend upon nerve-derived factors. The majority of the transcriptional differences between NR and DL limbs were correlated with blastema formation; cell numbers increased in NR limbs after 5 dpa and this yielded distinct transcriptional signatures of cell proliferation in NR limbs at 14 dpa. These transcriptional signatures were not observed in DL limbs. Instead, gene expression changes within DL limbs suggest more diverse and protracted wound-healing responses. 454 cDNA sequencing complemented the microarray analysis by providing deeper sampling of transcriptional programs and associated biological processes. Assembly of new 454 cDNA sequences with existing expressed sequence tag (EST) contigs from the Ambystoma EST database more than doubled (3935 to 9411) the number of non-redundant human-A. mexicanum orthologous sequences. Conclusion Many new candidate gene sequences were discovered for the first time and these will greatly enable future studies of wound healing, epigenetics, genome stability, and nerve-dependent blastema formation and outgrowth using the axolotl model.

  5. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained, including 24,144 tentative consensus (TCs) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs was evaluated and validated using high resolution melting (HRM) analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that the sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  6. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    Science.gov (United States)

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform remains unexplored. Unlike reads from Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of varying length. Whether the performance of commonly used software for handling this kind of data is satisfactory is unknown. Using universal human reference RNA as the initial material, RNaseIII and chemical fragmentation methods in library construction gave similar results in the numbers of genes and junctions discovered and in the accuracy of estimated expression levels. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provide, for the first time, reference statistics of library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed equally well as the enzyme-based method. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  7. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Full Text Available Abstract Background Although a previous study demonstrated that the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences among full-length viral genomes of DENV-3. In our study, complete genomic sequences of DENV-3 strains collected from different geographical locations and isolation years were determined, and the sequence diversity as well as selection pressure sites in the DENV genome other than within the E gene were also analyzed. Results Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that Taiwan's indigenous DENV-3 strains isolated from the 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case belonged to three different genotypes – I, II, and III, associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among the different DENV-3 genotypes were further examined to understand global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54) and envelope protein gene regions (mean ± SD: 5.04 ± 0.32). Further analysis found that positive selection pressure on DENV-3 may occur in the non-structural protein 1 gene region, and the positive selection site was detected at position 178 of the NS1 gene. Conclusion Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the non-structural protein along genotype II indicated that DENV-3 originating from Southeast Asia needs to be monitored for the emergence of strains with epidemic potential, for better epidemic prevention and vaccine development.

  8. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    Science.gov (United States)

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. 
TC experiments are reduced to intuitive parametric values that facilitate biologically

  9. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.
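
    The abundance and persistence filtering mentioned in this abstract can be sketched outside Galaxy as follows. The sketch assumes that abundance means the total read count of a sequence across all rounds and that persistence means the number of selection rounds in which the sequence appears; these definitions, the function names and the toy reads are illustrative assumptions and do not reproduce the published Workflows.

```python
from collections import Counter


def summarize_rounds(rounds):
    """Return (abundance, persistence) counters for a list of rounds.

    rounds: one list of read sequences per selection round.
    abundance: total reads per unique sequence over all rounds (assumed definition).
    persistence: number of rounds containing the sequence (assumed definition).
    """
    abundance = Counter()
    persistence = Counter()
    for reads in rounds:
        per_round = Counter(reads)
        abundance.update(per_round)           # add this round's read counts
        persistence.update(per_round.keys())  # +1 for each sequence seen this round
    return abundance, persistence


def filter_sequences(abundance, persistence, min_reads, min_rounds):
    """Keep sequences meeting both thresholds, sorted for stable output."""
    return sorted(
        seq for seq in abundance
        if abundance[seq] >= min_reads and persistence[seq] >= min_rounds
    )


# Toy selection with three rounds of reads.
rounds = [
    ["AAGT", "AAGT", "CCGA"],
    ["AAGT", "GGTT"],
    ["AAGT", "CCGA", "CCGA"],
]
abundance, persistence = summarize_rounds(rounds)
kept = filter_sequences(abundance, persistence, min_reads=3, min_rounds=2)
```

    The keys of abundance form the database of unique sequences; thresholds such as min_reads and min_rounds would be chosen per selection experiment.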

  10. Welding distortion analysis of multipass joint combination with different sequences using 3D FEM and experiment

    International Nuclear Information System (INIS)

    Manurung, Yupiter H.P.; Lidam, Robert Ngendang; Rahim, M. Ridzwan; Zakaria, M. Yusof; Redza, M. Ridhwan; Sulaiman, M. Shahar; Tham, Ghalib; Abas, Sunhaji K.

    2013-01-01

    This paper presents an investigation of the welding sequence effect on induced angular distortion using FEM and experiments. The specimen of a combined joint geometry was modeled and simulated using Multipass Welding Advisor (MWA) in SYSWELD 2010 based on the thermal-elastic-plastic approach with low manganese carbon steel S3355J2G3 as specimen material and Goldak's double ellipsoid as heat source model. To validate the simulation results, a series of experiments was conducted with two different welding sequences using automated welding process, low carbon steel as parent metal, digital GMAW power source with premixed shielding gas and both-sided clamping technique. Based on the results, it was established that the thermo-elastic-plastic 3D FEM analysis shows good agreement with experimental results and the welding sequence “from outside to inside” induced less angular distortion compared to “from inside to outside”. -- Highlights: • 3D FEM was used to analyze the welding distortion on two different sequences. • Simulation results were validated with experiments using automated welding system. • Simulation results and experiments showed acceptable accuracy. • Welding sequence “outside–inside” showed less distortion than “inside–outside”

  11. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    Directory of Open Access Journals (Sweden)

    Juan Ning

    Full Text Available BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.

  12. Analysis and comparison of fragrant gene sequence in some rice cultivars

    Directory of Open Access Journals (Sweden)

    Karami Noushafarin

    2016-01-01

    Full Text Available It is known that the fragrant trait in rice (Oryza sativa L.) is largely controlled by the fgr gene on chromosome 8, and it has been shown that an 8 bp deletion and three single nucleotide polymorphisms (SNPs) in exon 7 affect this trait. In this study, sequence alignment analysis of fgr exon 7 on chromosome 8 for 11 different fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried the 3 SNPs and the 8 bp deletion in exon 7, which terminates prematurely at a TAA stop codon. In contrast, 5 of the non-aromatic cultivars showed a sequence identical to the published sequence of Nipponbare, a non-fragrant Japonica variety. An exception among them was Bejar, which had the 8 bp deletion and 3 SNPs but was non-aromatic. Sequencing can determine the nucleotide sequence of a gene and give useful information about gene function. In silico prediction showed that the protein sequences encoded by the fgr gene differed between the Khazar and Domsiah genotypes. The non-fragrant Khazar genotype encodes the complete, full-length betaine aldehyde dehydrogenase enzyme of 503 amino acids, whereas the fragrant Domsiah genotype encodes a non-functional BADH2 enzyme of 251 amino acids, which results in the accumulation of 2-acetyl-1-pyrroline (2AP) and produces aroma in fragrant genotypes.

  13. Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

    KAUST Repository

    Thind, Anupriya Kaur

    2018-02-08

    Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.
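    As a rough illustration of how regions of elevated SNP density might be located, the sketch below counts SNPs in fixed, non-overlapping windows and flags windows that exceed a fold threshold over the median window density. The window size, the use of the median as background, and both function names are assumptions; the abstract does not describe the authors' actual procedure.

```python
from statistics import median

def window_snp_density(snp_positions, chrom_length, window):
    """SNPs per base in consecutive, non-overlapping windows (0-based positions)."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        width = min(window, chrom_length - i * window)  # last window may be shorter
        densities.append(count / width)
    return densities

def enriched_windows(densities, fold):
    """Indices of windows whose density is at least `fold` times the median density."""
    background = median(densities)
    if background <= 0:
        return []
    return [i for i, d in enumerate(densities) if d >= fold * background]

# Hypothetical data: one SNP per 100 bp window, plus a dense cluster in window 3.
snps = [i * 100 + 5 for i in range(10)] + [300 + k for k in range(10, 49)]
densities = window_snp_density(snps, chrom_length=1000, window=100)
print(enriched_windows(densities, fold=35))
```

    Adjacent flagged windows would then be merged into blocks; that step is omitted here.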

  14. The Expansion and Functional Diversification of the Mammalian Ribonuclease A Superfamily Epitomizes the Efficiency of Multigene Families at Generating Biological Novelty

    Science.gov (United States)

    Goo, Stephen M.; Cho, Soochin

    2013-01-01

    The ribonuclease (RNase) A superfamily is a vertebrate-specific gene family. Because of a massive expansion that occurred during early mammalian evolution, extant mammals in general have many more RNase genes than nonmammalian vertebrates. Mammalian RNases have been associated with diverse physiological functions including digestion, cytotoxicity, angiogenesis, male reproduction, and host defense. However, it is still uncertain when their expansion occurred and how a wide array of functions arose during their evolution. To answer these questions, we generate a compendium of all RNase genes identified in 20 complete mammalian genomes including the platypus, Ornithorhynchus anatinus. Using this, we delineate 13 ancient RNase gene lineages that arose before the divergence between the monotreme and the other mammals (∼220 Ma). These 13 ancient gene lineages are differentially retained in the 20 mammals, and the rate of protein sequence evolution is highly variable among them, which suggests that they have undergone extensive functional diversification. In addition, we identify 22 episodes of recent expansion of RNase genes, many of which have signatures of adaptive functional differentiation. Exemplifying this, bursts of gene duplication occurred for the RNase1, RNase4, and RNase5 genes of the little brown bat (Myotis lucifugus), which might have contributed to the species’ effective defense against heavier pathogen loads caused by its communal roosting behavior. Our study illustrates how host-defense systems can generate new functions efficiently by employing a multigene family, which is crucial for a host organism to adapt to its ever-changing pathogen environment. PMID:24162010

  15. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

    Science.gov (United States)

    Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

    2015-03-01

    A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    Science.gov (United States)

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  17. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  18. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  19. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    Science.gov (United States)

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  20. The Swiss-Army-Knife Approach to the Nearly Automatic Analysis for Microearthquake Sequences.

    Science.gov (United States)

    Kraft, T.; Simon, V.; Tormann, T.; Diehl, T.; Herrmann, M.

    2017-12-01

    Many Swiss earthquake sequences have been studied using relative location techniques, which often made it possible to constrain the active fault planes and shed light on the tectonic processes that drove the seismicity. Yet, in the majority of cases the number of located earthquakes was too small to infer the details of the space-time evolution of the sequences, or their statistical properties. Therefore, it has mostly been impossible to resolve clear patterns in the seismicity of individual sequences, which are needed to improve our understanding of the mechanisms behind them. Here we present a nearly automatic workflow that combines well-established seismological analysis techniques and significantly improves the completeness of detected and located earthquakes of a sequence. We start from the manually timed routine catalog of the Swiss Seismological Service (SED), which contains the larger events of a sequence. From these well-analyzed earthquakes we dynamically assemble a template set and perform a matched filter analysis on the station with the best SNR for the sequence and a recording history of at least 10-15 years, our typical analysis period. This usually allows us to detect events several orders of magnitude below the SED catalog detection threshold. The waveform similarity of the events is then further exploited to derive accurate and consistent magnitudes. The enhanced catalog is then analyzed statistically to derive high-resolution time-lines of the a- and b-value and consequently the occurrence probability of larger events. Many of the detected events are strong enough to be located using double-differences. No further manual interaction is needed; we simply time-shift the arrival-time pattern of the detecting template to the associated detection. Waveform similarity ensures a good approximation of the expected arrival-times, which we use to calculate event-pair arrival-time differences by cross correlation. After a SNR and cycle-skipping quality

  1. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    Science.gov (United States)

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing further validating the software.

  2. Cloning, nucleotide sequence and transcriptional analysis of the uvrA gene from Neisseria gonorrhoeae

    International Nuclear Information System (INIS)

    Black, C.G.; Fyfe, J.A.M.; Davies, J.K.

    1997-01-01

    A recombinant plasmid capable of restoring UV resistance to an Escherichia coli uvrA mutant was isolated from a genomic library of Neisseria gonorrhoeae. Sequence analysis revealed an open reading frame whose deduced amino acid sequence displayed significant similarity to those of the UvrA proteins of other bacterial species. A second open reading frame (ORF259) was identified upstream from, and in the opposite orientation to, the gonococcal uvrA gene. Transcriptional fusions between portions of the gonococcal uvrA upstream region and a reporter gene were used to localise promoter activity in both E. coli and N. gonorrhoeae. The transcriptional starting points of uvrA and ORF259 were mapped in E. coli by primer extension analysis, and corresponding σ70 promoters were identified. The arrangement of the uvrA-ORF259 intergenic region is similar to that of the gonococcal recA-aroD intergenic region. Both contain inverted copies of the 10 bp neisserial DNA uptake sequence situated between divergently transcribed genes. However, there is no evidence that either the uptake sequence or the proximity of the promoters influences expression of these genes. (author)

  3. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    Full Text Available We conducted an SSR density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava’s SSR genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). It was found that coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Through motif analysis of the SSRs in cassava’s genome, the most abundant motif was AT/AT, while in intron sequences and UTR regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation in Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution.
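    An SSR density of the kind reported here (SSRs per Mbp) can be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the minimum repeat counts in MIN_REPEATS, the restriction to perfect repeats with 2 to 6 bp motifs, and the function names are all assumptions, and the abstract does not state which thresholds or software were used.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. not 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))

def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

def ssr_density_per_mbp(seq):
    """Number of SSRs per million base pairs of sequence."""
    return len(find_ssrs(seq)) / len(seq) * 1_000_000

# Toy sequence containing one (AT)7 and one (AAG)5 repeat.
toy = "GGC" + "AT" * 7 + "CCGTA" + "AAG" * 5 + "TTGCA"
print(find_ssrs(toy))
```

    Applying the same function separately to coding sequences, introns and UTRs would give per-region densities comparable in form to those quoted in the abstract.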

  4. Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs

    Science.gov (United States)

    Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan

    2017-01-01

    Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401

  5. A systematic identification of Kolobok superfamily transposons in Trichomonas vaginalis and sequence analysis on related transposases

    Institute of Scientific and Technical Information of China (English)

    Qingshu Meng; Kaifu Chen; Lina Ma; Songnian Hu; Jun Yu

    2011-01-01

    Transposons are sequence elements widely distributed among genomes of all three kingdoms of life, providing genomic changes and playing significant roles in genome evolution. Trichomonas vaginalis is an excellent model system for transposon study since its genome (~160 Mb) has been sequenced and is composed of ~65% transposons and other repetitive elements. In this study, we primarily report the identification of Kolobok-type transposons (termed tvBac) in T. vaginalis and the results of transposase sequence analysis. We categorized 24 novel subfamilies of the Kolobok element, including one autonomous subfamily and 23 non-autonomous subfamilies. We also identified a novel H2CH motif in tvBac transposases based on multiple sequence alignment. In addition, our phylogenetic analysis suggests that tvBac and Mutator transposons may have evolved independently from a common ancestor. Our results provide basic information for the understanding of the function and evolution of tvBac transposons in particular and other related transposon families in general.

  6. Exact combinatorial reliability analysis of dynamic systems with sequence-dependent failures

    International Nuclear Information System (INIS)

    Xing Liudong; Shrestha, Akhilesh; Dai Yuanshun

    2011-01-01

    Many real-life fault-tolerant systems are subjected to sequence-dependent failure behavior, in which the order in which the fault events occur is important to the system reliability. Such systems can be modeled by dynamic fault trees (DFT) with priority-AND (pAND) gates. Existing approaches for the reliability analysis of systems subjected to sequence-dependent failures are typically state-space-based, simulation-based or inclusion-exclusion-based methods. Those methods either suffer from the state-space explosion problem or require long computation time especially when results with high degree of accuracy are desired. In this paper, an analytical method based on sequential binary decision diagrams is proposed. The proposed approach can analyze the exact reliability of non-repairable dynamic systems subjected to the sequence-dependent failure behavior. Also, the proposed approach is combinatorial and is applicable for analyzing systems with any arbitrary component time-to-failure distributions. The application and advantages of the proposed approach are illustrated through analysis of several examples. - Highlights: → We analyze the sequence-dependent failure behavior using combinatorial models. → The method has no limitation on the type of time-to-failure distributions. → The method is analytical and based on sequential binary decision diagrams (SBDD). → The method is computationally more efficient than existing methods.
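    To make the notion of sequence-dependent failure concrete, the sketch below evaluates a single two-input priority-AND gate in closed form. It is not the sequential binary decision diagram method proposed in the paper. It assumes independent components with exponential time-to-failure distributions, for which the probability that A fails before B and both have failed by time t is the integral of f_B(b) F_A(b) over [0, t]. The function name and the rates are hypothetical.

```python
from math import exp

def pand_failure_probability(rate_a, rate_b, t):
    """P(A fails before B, and B fails by time t) for independent exponential lifetimes.

    Closed form of the integral over b in [0, t] of
    rate_b * exp(-rate_b * b) * (1 - exp(-rate_a * b)).
    """
    total = rate_a + rate_b
    return (1 - exp(-rate_b * t)) - rate_b / total * (1 - exp(-total * t))

# Hypothetical failure rates (per hour) and mission time.
p_ab = pand_failure_probability(1e-3, 2e-3, t=1000)
p_ba = pand_failure_probability(2e-3, 1e-3, t=1000)
print(p_ab, p_ba)
```

    The two orderings are mutually exclusive, so their probabilities add up to the ordinary AND-gate probability that both components have failed by time t, which gives a simple consistency check.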

  7. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
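    The two quantities named in this abstract, an n-tuple rank-frequency (Zipf) listing and an n-gram entropy, can be sketched as follows. The redundancy definition used here (one minus the entropy divided by its maximum of n·log2(4) bits) is an assumption for illustration, and the paper's exact normalisation may differ. Function names are likewise hypothetical.

```python
from collections import Counter
from math import log2

def ngram_counts(seq, n):
    """Count overlapping n-grams (n-tuples) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def zipf_ranking(seq, n):
    """Return (rank, n-gram, count) tuples ordered from most to least frequent."""
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ngram_counts(seq, n).most_common(), start=1)]

def ngram_entropy(seq, n):
    """Shannon entropy in bits of the empirical n-gram distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size))."""
    return 1 - ngram_entropy(seq, n) / (n * log2(alphabet_size))

print(zipf_ranking("AAACAAGT", 1))
print(ngram_entropy("ACGT" * 10, 1))
```

    A Zipf plot is then the count against the rank on log-log axes; a straight line indicates power-law behavior.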

  8. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

    Science.gov (United States)

    Cowley, Lauren A; Beckett, Stephen J; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J; Gally, David L; Jenkins, Claire

    2015-04-08

    Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations; certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency, and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidean cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group, showing that these groups are specialised to infect a subset of phage types. Sequencing the typing phages has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type.

  9. The Use of Next Generation Sequencing and Junction Sequence Analysis Bioinformatics to Achieve Molecular Characterization of Crops Improved Through Modern Biotechnology

    Directory of Open Access Journals (Sweden)

    David Kovalic

    2012-11-01

    Full Text Available The assessment of genetically modified (GM) crops for regulatory approval currently requires a detailed molecular characterization of the DNA sequence and integrity of the transgene locus. In addition, molecular characterization is a critical component of event selection and advancement during product development. Typically, molecular characterization has relied on Southern blot analysis to establish locus and copy number along with targeted sequencing of polymerase chain reaction products spanning any inserted DNA to complete the characterization process. Here we describe the use of next generation (NexGen) sequencing and junction sequence analysis bioinformatics in a new method for achieving full molecular characterization of a GM event without the need for Southern blot analysis. In this study, we examine a typical GM soybean [Glycine max (L.) Merr.] line and demonstrate that this new method provides molecular characterization equivalent to the current Southern blot-based method. We also examine an event containing in vivo DNA rearrangement of multiple transfer DNA inserts to demonstrate that the new method is effective at identifying complex cases. Next generation sequencing and bioinformatics offer certain advantages over current approaches, most notably the simplicity, efficiency, and consistency of the method, and provide a viable alternative for efficiently and robustly achieving molecular characterization of GM crops.

  10. Citrate synthase gene sequence: a new tool for phylogenetic analysis and identification of Ehrlichia.

    Science.gov (United States)

    Inokuma, H; Brouqui, P; Drancourt, M; Raoult, D

    2001-09-01

    The sequences of the citrate synthase gene (gltA) of 13 ehrlichial species (Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia muris, an Ehrlichia species recently detected from Ixodes ovatus, Cowdria ruminantium, Ehrlichia phagocytophila, Ehrlichia equi, the human granulocytic ehrlichiosis [HGE] agent, Anaplasma marginale, Anaplasma centrale, Ehrlichia sennetsu, Ehrlichia risticii, and Neorickettsia helminthoeca) have been determined by degenerate PCR and the Genome Walker method. The ehrlichial gltA genes are 1,197 bp (E. sennetsu and E. risticii) to 1,254 bp (A. marginale and A. centrale) long, and GC contents of the gene vary from 30.5% (Ehrlichia sp. detected from I. ovatus) to 51.0% (A. centrale). The percent identities of the gltA nucleotide sequences among ehrlichial species were 49.7% (E. risticii versus A. centrale) to 99.8% (HGE agent versus E. equi). The percent identities of deduced amino acid sequences were 44.4% (E. sennetsu versus E. muris) to 99.5% (HGE agent versus E. equi), whereas the homology range of 16S rRNA genes was 83.5% (E. risticii versus the Ehrlichia sp. detected from I. ovatus) to 99.9% (HGE agent, E. equi, and E. phagocytophila). The architecture of the phylogenetic trees constructed by gltA nucleotide sequences or amino acid sequences was similar to that derived from the 16S rRNA gene sequences but showed more-significant bootstrap values. Based upon the alignment analysis of the ehrlichial gltA sequences, two sets of primers were designed to amplify tick-borne Ehrlichia and Neorickettsia genogroup Ehrlichia (N. helminthoeca, E. sennetsu, and E. risticii), respectively. Tick-borne Ehrlichia species were specifically identified by restriction fragment length polymorphism (RFLP) patterns of AcsI and XhoI with the exception of E. muris and the very closely related ehrlichia derived from I. ovatus, for which sequence analysis of the PCR product is needed.
Similarly, Neorickettsia genogroup Ehrlichia species were specifically identified by

  11. Chimira: analysis of small RNA sequencing data and microRNA modifications.

    Science.gov (United States)

    Vitsios, Dimitrios M; Enright, Anton J

    2015-10-15

    Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, and CPSS for the identification of modifications. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, also allowing, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. Contact: aje@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  12. Hidden Markov models for sequence analysis: extension and analysis of the basic method

    DEFF Research Database (Denmark)

    Hughey, Richard; Krogh, Anders Stærmose

    1996-01-01

    Hidden Markov models (HMMs) are a highly effective means of modeling a family of unaligned sequences or a common motif within a set of unaligned sequences. The trained HMM can then be used for discrimination or multiple alignment. The basic mathematical description of an HMM and its expectation-maximization training procedure is relatively straightforward. In this paper, we review the mathematical extensions and heuristics that move the method from the theoretical to the practical. Then, we experimentally analyze the effectiveness of model regularization, dynamic model modification, and optimization strategies. Finally, it is demonstrated on the SH2 domain how a domain can be found from unaligned sequences using a special model type. The experimental work was completed with the aid of the Sequence Alignment and Modeling software suite.
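
    As a rough illustration of how an HMM scores a sequence, the sketch below implements the standard forward algorithm for a toy two-state model. The state names, probabilities and function name are illustrative assumptions; they are not parameters or code from the paper or from the Sequence Alignment and Modeling suite.

```python
from typing import Dict, Sequence


def forward_probability(
    observations: str,
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations | model) using the forward algorithm."""
    if not observations:
        return 1.0
    # alpha[s] = probability of the prefix seen so far, ending in state s
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: an AT-rich "motif" state and a uniform background.
STATES = ["motif", "background"]
START = {"motif": 0.5, "background": 0.5}
TRANS = {
    "motif": {"motif": 0.9, "background": 0.1},
    "background": {"motif": 0.1, "background": 0.9},
}
EMIT = {
    "motif": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}
```

    For sequences of realistic length the probabilities underflow, so a practical implementation would work in log space or rescale alpha at each step.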

  13. Re-Analysis of Metagenomic Sequences from Acute Flaccid Myelitis Patients Reveals Alternatives to Enterovirus D68 Infection

    Science.gov (United States)

    2015-07-13

    caused in some cases by infection with enterovirus D68. We found that among the patients whose symptoms were previously attributed to enterovirus D68...Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68...

  14. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis

    Directory of Open Access Journals (Sweden)

    Thiele Bernhard

    2011-05-01

    Full Text Available Abstract Background Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  15. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.

    Science.gov (United States)

    Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander

    2011-05-13

    Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.
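
    The two thresholds mentioned in the abstract (a 10% false-positive-rate cutoff per read and a 5% minority cutoff per sample) can be combined into a simple decision rule. The sketch below is one plausible reading of that rule, not the authors' code: it assumes each read already carries a false-positive-rate value (in percent) from a predictor such as geno2pheno[coreceptor], that values at or below the cutoff count as X4, and that a sample is called X4 when the X4 share reaches the minority cutoff. The function name and the inclusive comparisons are assumptions.

```python
from typing import List, Tuple


def call_sample_tropism(
    read_fprs: List[float],
    fpr_cutoff: float = 10.0,
    minority_cutoff: float = 5.0,
) -> Tuple[str, float]:
    """Classify a sample as 'X4' or 'R5' from per-read FPR values (percent).

    Assumption: a read is counted as X4 when its FPR is <= fpr_cutoff, and the
    sample is called X4 when the percentage of such reads is >= minority_cutoff.
    """
    if not read_fprs:
        raise ValueError("at least one read is required")
    x4_reads = sum(1 for fpr in read_fprs if fpr <= fpr_cutoff)
    x4_percent = 100.0 * x4_reads / len(read_fprs)
    label = "X4" if x4_percent >= minority_cutoff else "R5"
    return label, x4_percent


# Hypothetical sample: 6 of 100 reads are predicted X4 with high confidence.
example_label, example_percent = call_sample_tropism([2.0] * 6 + [50.0] * 94)
```

    Reporting the X4 percentage alongside the label keeps the quantitative description of the viral population that the abstract highlights.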

  16. Genetic analysis of 430 Chinese Cynodon dactylon accessions using sequence-related amplified polymorphism markers.

    Science.gov (United States)

    Huang, Chunqiong; Liu, Guodao; Bai, Changjun; Wang, Wenqiang

    2014-10-21

    Although Cynodon dactylon (C. dactylon) is widely distributed in China, information on its genetic diversity within the germplasm pool is limited. The objective of this study was to reveal the genetic variation and relationships of 430 C. dactylon accessions collected from 22 Chinese provinces using sequence-related amplified polymorphism (SRAP) markers. Fifteen primer pairs were used to amplify specific C. dactylon genomic sequences. A total of 481 SRAP fragments were generated, with fragment sizes ranging from 260 to 1800 base pairs (bp). Genetic similarity coefficients (GSC) among the 430 accessions averaged 0.72 and ranged from 0.53 to 0.96. Cluster analysis conducted by two methods, namely the unweighted pair-group method with arithmetic averages (UPGMA) and principal coordinate analysis (PCoA), separated the accessions into eight distinct groups. Our findings verify that Chinese C. dactylon germplasms have rich genetic diversity, which is an excellent basis for breeding new C. dactylon cultivars.

  17. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

    OpenAIRE

    Kubicek, Christian P.; Herrera-Estrella, Alfredo; Seidl-Seiboth, Verena; Martinez, Diego A.; Druzhinina, Irina S.; Thon, Michael; Zeilinger, Susanne; Casas-Flores, Sergio; Horwitz, Benjamin A.; Mukherjee, Prasun K.; Mukherjee, Mala; Kredics, László; Alcaraz, Luis D.; Aerts, Andrea; Antal, Zsuzsanna

    2011-01-01

    Background Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocl...

  18. Evaluation of next generation sequencing for the analysis of Eimeria communities in wildlife.

    Science.gov (United States)

    Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L

    2016-05-01

    Next-generation sequencing (NGS) techniques are well-established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria was developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs into Eimeria OTUs but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. The assessment of two amplification PCR methods prior to Illumina MiSeq, single and nested PCR, determined that single PCR was more sensitive to Eimeria as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective approach to a data analysis pipeline for community analysis of eukaryotic organisms using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies. Copyright © 2016 Elsevier B.V. All rights reserved.
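
    To make the effect of the OTU similarity threshold concrete, the sketch below groups sequences by greedy centroid clustering at a chosen identity cutoff. It is a simplified stand-in, not the pipeline described in the abstract: it assumes pre-aligned sequences of equal length, measures identity as the fraction of matching positions, and uses hypothetical function names.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float) -> List[List[str]]:
    """Assign each sequence to the first OTU whose centroid is similar enough.

    The first member of each OTU acts as its centroid; a sequence that matches
    no existing centroid at >= threshold founds a new OTU.
    """
    otus: List[List[str]] = []
    for seq in seqs:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])
    return otus


# Hypothetical 100-bp reads at 96% and 98% identity to a reference read.
reference = "A" * 100
read_96 = "A" * 96 + "C" * 4
read_98 = "A" * 98 + "C" * 2
reads = [reference, read_96, read_98]
```

    With these toy reads, a 97% threshold keeps the 96%-identity read in its own OTU, while a 95% threshold merges all three, which is the kind of trade-off the abstract evaluates.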

  19. Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

    OpenAIRE

    Parker, Jennifer K.; Havird, Justin C.; De La Fuente, Leonardo

    2012-01-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of enviro...

  20. Human factors review for nuclear power plant severe accident sequence analysis

    International Nuclear Information System (INIS)

    Krois, P.A.; Haas, P.M.

    1985-01-01

    The paper discusses work conducted to: (1) support the severe accident sequence analysis of a nuclear power plant transient based on an assessment of operator actions, and (2) develop a descriptive model of operator severe accident management. Operator actions during the transient are assessed using qualitative and quantitative methods. A function-oriented accident management model provides a structure for developing technical operator guidance on mitigating core damage and preventing radiological release.

  1. The analysis of energy-time sequences in the nuclear power plants construction

    International Nuclear Information System (INIS)

    Milivojevic, S.; Jovanovic, V.; Riznic, J.

    1983-01-01

    Current nuclear energy development poses many problems; one of them is nuclear power plant construction. The energy and time features of construction, and their relative ratios, are evaluated by analysis of available data. The results indicate the construction efficiency achieved so far and, at the same time, provide a basis for realistic estimation of the energy-time sequences of future construction. (author)

  2. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Full Text Available Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  3. Complete genome sequencing and phylogenetic analysis of dengue type 1 virus isolated from Jeddah, Saudi Arabia.

    Science.gov (United States)

    Azhar, Esam I; Hashem, Anwar M; El-Kafrawy, Sherif A; Abol-Ela, Said; Abd-Alla, Adly M M; Sohrab, Sayed Sartaj; Farraj, Suha A; Othman, Norah A; Ben-Helaby, Huda G; Ashshi, Ahmed; Madani, Tariq A; Jamjoom, Ghazi

    2015-01-16

    Dengue viruses (DENVs) are mosquito-borne viruses which can cause disease ranging from mild fever to severe dengue infection. These viruses are endemic in several tropical and subtropical regions. Multiple outbreaks of DENV serotypes 1, 2 and 3 (DENV-1, DENV-2 and DENV-3) have been reported from the western region of Saudi Arabia since 1994. Strains from at least two genotypes of DENV-1 (Asia and America/Africa genotypes) were circulating in western Saudi Arabia until 2006. However, all previous studies reported from Saudi Arabia were based on partial sequencing data of the envelope (E) gene, without any reports of full genome sequences for any DENV serotypes circulating in Saudi Arabia. Here, we report the isolation and the first complete genome sequence of a DENV-1 strain (DENV-1-Jeddah-1-2011) isolated from a patient from Jeddah, Saudi Arabia in 2011. Whole genome sequence alignment and phylogenetic analysis showed high similarity between the DENV-1-Jeddah-1-2011 strain and the D1/H/IMTSSA/98/606 isolate (Asian genotype) reported from Djibouti in 1998. Further analysis of the full envelope gene revealed a close relationship between the DENV-1-Jeddah-1-2011 strain and isolates reported between 2004 and 2006 from Jeddah as well as recent isolates from Somalia, suggesting widespread circulation of the Asian genotype in this region. These data suggest that strains belonging to the Asian genotype might have been introduced into Saudi Arabia long before 2004, most probably by African pilgrims, and continued to circulate in western Saudi Arabia at least until 2011. Most importantly, these results indicate that pilgrims from dengue endemic regions can play an important role in the spread of new DENVs in Saudi Arabia and the rest of the world. Therefore, availability of complete genome sequences would serve as a reference for future epidemiological studies of DENV-1 viruses.

  4. Identification of Bacillus Probiotics Isolated from Soil Rhizosphere Using 16S rRNA, recA, rpoB Gene Sequencing and RAPD-PCR.

    Science.gov (United States)

    Mohkam, Milad; Nezafat, Navid; Berenjian, Aydin; Mobasher, Mohammad Ali; Ghasemi, Younes

    2016-03-01

    Some Bacillus species, especially those of the Bacillus subtilis and Bacillus pumilus groups, have highly similar 16S rRNA gene sequences and are therefore hard to identify by 16S rDNA sequence analysis. To overcome this drawback, rpoB and recA sequence analysis along with random amplified polymorphic DNA (RAPD) fingerprinting was examined as an alternative method for differentiating Bacillus species. The 16S rRNA, rpoB and recA genes were amplified by polymerase chain reaction using specific primers. The resulting PCR amplicons were sequenced, and phylogenetic analysis was performed with MEGA 6 software. Identification based on 16S rRNA gene sequencing was supported by rpoB and recA gene sequencing as well as by the RAPD-PCR technique. Subsequently, concatenation and phylogenetic analysis showed that the extent of diversity and similarity was better resolved by rpoB and recA primers, a result also reinforced by RAPD-PCR. However, in one case these approaches failed to identify an isolate, which was resolved by combining them with phenotypic methods. Overall, RAPD fingerprinting and rpoB and recA sequence analysis, along with concatenated gene sequence analysis, discriminated closely related Bacillus species, which highlights the value of the multigenic method for distinguishing Bacillus strains more precisely. This research emphasizes the advantage of RAPD fingerprinting and rpoB and recA sequence analysis over 16S rRNA gene sequence analysis for suitable and effective identification of Bacillus species, as recommended for probiotic products.

  5. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    Science.gov (United States)

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
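
    The idea of expressing domain fusion analysis as a relational query can be sketched with standard SQL run through Python's built-in sqlite3 module. The table layout, column names, sample rows and the exact join conditions below are illustrative assumptions, not the schema or query used in the paper: a "composite" protein from another organism carries two domains that occur on two separate proteins in the target organism, which are then reported as putative functionally linked partners.

```python
import sqlite3

# Hypothetical schema: one row per (protein, organism, domain) prediction.
FUSION_QUERY = """
SELECT DISTINCT a.protein, b.protein, c1.protein
FROM domain_hits AS c1
JOIN domain_hits AS c2
  ON c2.protein = c1.protein AND c1.domain < c2.domain
JOIN domain_hits AS a ON a.domain = c1.domain
JOIN domain_hits AS b ON b.domain = c2.domain
WHERE a.organism = ? AND b.organism = ?
  AND c1.organism <> ?
  AND a.protein <> b.protein
  AND NOT EXISTS (SELECT 1 FROM domain_hits AS x
                  WHERE x.protein = a.protein AND x.domain = c2.domain)
  AND NOT EXISTS (SELECT 1 FROM domain_hits AS y
                  WHERE y.protein = b.protein AND y.domain = c1.domain)
ORDER BY a.protein, b.protein, c1.protein
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up domain predictions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domain_hits (protein TEXT, organism TEXT, domain TEXT)"
    )
    conn.executemany(
        "INSERT INTO domain_hits VALUES (?, ?, ?)",
        [
            ("fusedAB", "yeast", "PF_A"),  # composite protein with both domains
            ("fusedAB", "yeast", "PF_B"),
            ("protA", "human", "PF_A"),    # separate proteins in the target
            ("protB", "human", "PF_B"),
            ("protC", "human", "PF_C"),    # unrelated domain, never reported
        ],
    )
    return conn


def find_domain_fusions(conn: sqlite3.Connection, organism: str):
    """Return (partner_1, partner_2, composite_protein) rows for one organism."""
    return conn.execute(FUSION_QUERY, (organism, organism, organism)).fetchall()
```

    Because the logic lives entirely in the query, rerunning it after the domain table is refreshed updates the predictions, which matches the dynamic-update property the abstract describes.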

  6. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    Science.gov (United States)

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigation are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database, TranslatomeDB (http://www.translatomedb.net/), which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, at both the transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and use the identical unified pipeline to analyze their data. We believe that TranslatomeDB is a comprehensive platform and knowledgebase for translatome and proteome research, freeing biologists from complex searching, analysis and comparison of huge sequencing datasets without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Multilocus sequence analysis of nectar pseudomonads reveals high genetic diversity and contrasting recombination patterns.

    Science.gov (United States)

    Alvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas 'sensu stricto' isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria.
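
    In multilocus analyses, a sequence type is commonly defined as a distinct combination of sequences across the loci studied, which is how 38 isolates can yield 35 sequence types. The sketch below shows that bookkeeping under this assumption; the function name and the toy sequences are hypothetical and are not data from the study.

```python
from typing import Dict, Tuple

LOCI: Tuple[str, ...] = ("rrs", "gyrB", "rpoB", "rpoD")


def assign_sequence_types(
    isolates: Dict[str, Dict[str, str]],
    loci: Tuple[str, ...] = LOCI,
) -> Dict[str, int]:
    """Give each isolate a sequence-type number based on its multilocus profile.

    Isolates with identical sequences at every locus share a number; numbers
    are handed out in order of first appearance, starting at 1.
    """
    st_of_profile: Dict[Tuple[str, ...], int] = {}
    result: Dict[str, int] = {}
    for name, seqs in isolates.items():
        profile = tuple(seqs[locus] for locus in loci)
        result[name] = st_of_profile.setdefault(profile, len(st_of_profile) + 1)
    return result


# Made-up isolates: iso1 and iso2 are identical at all four loci.
toy_isolates = {
    "iso1": {"rrs": "ACGT", "gyrB": "GGCA", "rpoB": "TTAG", "rpoD": "CCAT"},
    "iso2": {"rrs": "ACGT", "gyrB": "GGCA", "rpoB": "TTAG", "rpoD": "CCAT"},
    "iso3": {"rrs": "ACGT", "gyrB": "GGTA", "rpoB": "TTAG", "rpoD": "CCAT"},
}
toy_types = assign_sequence_types(toy_isolates)
```

    The number of distinct values in the result is the sequence-type count reported for a collection of isolates.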

  8. Multilocus Sequence Analysis of Nectar Pseudomonads Reveals High Genetic Diversity and Contrasting Recombination Patterns

    Science.gov (United States)

    Álvarez-Pérez, Sergio; de Vega, Clara; Herrera, Carlos M.

    2013-01-01

    The genetic and evolutionary relationships among floral nectar-dwelling Pseudomonas ‘sensu stricto’ isolates associated to South African and Mediterranean plants were investigated by multilocus sequence analysis (MLSA) of four core housekeeping genes (rrs, gyrB, rpoB and rpoD). A total of 35 different sequence types were found for the 38 nectar bacterial isolates characterised. Phylogenetic analyses resulted in the identification of three main clades [nectar groups (NGs) 1, 2 and 3] of nectar pseudomonads, which were closely related to five intrageneric groups: Pseudomonas oryzihabitans (NG 1); P. fluorescens, P. lutea and P. syringae (NG 2); and P. rhizosphaerae (NG 3). Linkage disequilibrium analysis pointed to a mostly clonal population structure, even when the analysis was restricted to isolates from the same floristic region or belonging to the same NG. Nevertheless, signatures of recombination were observed for NG 3, which exclusively included isolates retrieved from the floral nectar of insect-pollinated Mediterranean plants. In contrast, the other two NGs comprised both South African and Mediterranean isolates. Analyses relating diversification to floristic region and pollinator type revealed that there has been more unique evolution of the nectar pseudomonads within the Mediterranean region than would be expected by chance. This is the first work analysing the sequence of multiple loci to reveal geno- and ecotypes of nectar bacteria. PMID:24116076

  9. VisRseq: R-based visual framework for analysis of sequencing data.

    Science.gov (United States)

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  10. [Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

    Science.gov (United States)

    Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

    2013-04-01

    The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned with new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by Tail-PCR. The complete sequence of the DenLFY gene was 3,575 bp and contained three exons and two introns. Using BLAST, comparative analysis of the exons of LFY homologous genes indicated that the DenLFY gene had high identity with orchid LFY homologues, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). Using MP analysis, Dendrobium was found to be sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used in authentication research on Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.

  11. Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants.

    Science.gov (United States)

    Nickrent, D L; Parkinson, C L; Palmer, J D; Duff, R J

    2000-12-01

    A widely held view of land plant relationships places liverworts as the first branch of the land plant tree, whereas some molecular analyses and a cladistic study of morphological characters indicate that hornworts are the earliest land plants. To help resolve this conflict, we used parsimony and likelihood methods to analyze a 6,095-character data set composed of four genes (chloroplast rbcL and small-subunit rDNA from all three plant genomes) from all major land plant lineages. In all analyses, significant support was obtained for the monophyly of vascular plants, lycophytes, ferns (including Psilotum and Equisetum), seed plants, and angiosperms. Relationships among the three bryophyte lineages were unresolved in parsimony analyses in which all positions were included and weighted equally. However, in parsimony and likelihood analyses in which rbcL third-codon-position transitions were either excluded or downweighted (due to apparent saturation), hornworts were placed as sister to all other land plants, with mosses and liverworts jointly forming the second deepest lineage. Decay analyses and Kishino-Hasegawa tests of the third-position-excluded data set showed significant support for the hornwort-basal topology over several alternative topologies, including the commonly cited liverwort-basal topology. Among the four genes used, mitochondrial small-subunit rDNA showed the lowest homoplasy and alone recovered essentially the same topology as the multigene tree. This molecular phylogeny presents new opportunities to assess paleontological evidence and morphological innovations that occurred during the early evolution of terrestrial plants.

  12. Development of multigene expression signature maps at the protein level from digitized immunohistochemistry slides.

    Directory of Open Access Journals (Sweden)

    Gregory J Metzger

    Full Text Available Molecular classification of diseases based on multigene expression signatures is increasingly used for diagnosis, prognosis, and prediction of response to therapy. Immunohistochemistry (IHC) is an optimal method for validating expression signatures obtained using high-throughput genomics techniques since IHC allows a pathologist to examine gene expression at the protein level within the context of histologically interpretable tissue sections. Additionally, validated IHC assays may be readily implemented as clinical tests since IHC is performed on routinely processed clinical tissue samples. However, methods have not been available for automated n-gene expression profiling at the protein level using IHC data. We have developed methods to compute expression level maps (signature maps) of multiple genes from IHC data digitized on a commercial whole slide imaging system. Areas of cancer for these expression level maps are defined by a pathologist on adjacent, co-registered H&E slides, allowing assessment of IHC statistics and heterogeneity within the diseased tissue. This novel way of representing multiple IHC assays as signature maps will allow the development of n-gene expression profiling databases in three dimensions throughout virtual whole organ reconstructions.

  13. A Multi-Gene Phylogeny of Ceratocystis Manginecans Infecting Mango in Pakistan

    International Nuclear Information System (INIS)

    Rashid, A.; Ahmad, I.; Iram, S.

    2016-01-01

    Mango trees (Mangifera indica L.) are affected by a serious wilt disease known as mango sudden death, first reported in Muzaffargarh, Punjab, Pakistan, in 1995. It is prevalent in almost all mango-growing areas, with severity ranging from 2-5 percent in Punjab and 5-10 percent in Sindh. Surveys and sampling were conducted during 2011-12 in mango orchards in different districts of Punjab and Sindh, and no location was found free from this disease. For molecular identification, DNA was successfully extracted and then amplified using ITS, BT and TEF (600-800) primers in a polymerase chain reaction (PCR) assay; the nucleotide sequences of the Pakistani isolates (45 for each gene) showed the highest genetic homology with Ceratocystis manginecans (99-100 percent), followed by C. fimbriata (97 percent) and C. omanensis (80 percent). On the basis of morphological characters and comparison of multi-gene nucleotide data, C. manginecans, which infects mango in Pakistan, is distinct from C. fimbriata and C. omanensis. The availability of disease-free planting material, together with fertilization and a proper irrigation system, would help to improve orchard management. (author)

  14. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matK+rbcL) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of the amplified sequences were studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matK+rbcL) showed good amplification in the three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in the medicinal industry to establish DNA-based identification of different medicinal plant species to monitor adulteration.

  15. Analysis of mutations in the entire coding sequence of the factor VIII gene

    Energy Technology Data Exchange (ETDEWEB)

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M. [Glasgow Univ. (United Kingdom)] [and others]

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  16. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    Full Text Available With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.
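
    The UHS definition in this abstract can be checked directly for very small k and L. The sketch below is a brute-force verifier, not the DOCKS heuristic; the function name and the default alphabet are assumptions made for illustration, and the enumeration grows exponentially with L.

```python
from itertools import product


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Return True if every L-long sequence over `alphabet` contains
    at least one k-mer from `kmers` (the UHS property).

    Brute force over all len(alphabet)**L sequences, so only usable
    for tiny k and L. Illustrative helper, not part of DOCKS.
    """
    kmer_set = set(kmers)
    for letters in product(alphabet, repeat=L):
        seq = "".join(letters)
        # Slide a window of width k across the L-long sequence.
        if not any(seq[i:i + k] in kmer_set for i in range(L - k + 1)):
            return False  # found a sequence that avoids the whole set
    return True


if __name__ == "__main__":
    # Two-letter toy alphabet, k = 2, L = 3.
    print(is_universal_hitting_set({"AA", "AB", "BB"}, 2, 3, alphabet="AB"))
    print(is_universal_hitting_set({"AA", "BB"}, 2, 3, alphabet="AB"))
```

    In the toy example the first set hits every 3-long sequence, while the second misses "ABA", whose only 2-mers are "AB" and "BA".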

  17. Likelihood functions for the analysis of single-molecule binned photon sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gopich, Irina V., E-mail: irinag@niddk.nih.gov [Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892 (United States)

    2012-03-02

    Graphical abstract: Folding of a protein with attached fluorescent dyes, the underlying conformational trajectory of interest, and the observed binned photon trajectory. Highlights: ► A sequence of photon counts can be analyzed using a likelihood function. ► The exact likelihood function for a two-state kinetic model is provided. ► Several approximations are considered for an arbitrary kinetic model. ► Improved likelihood functions are obtained to treat sequences of FRET efficiencies. Abstract: We consider the analysis of a class of experiments in which the number of photons in consecutive time intervals is recorded. Sequence of photon counts or, alternatively, of FRET efficiencies can be studied using likelihood-based methods. For a kinetic model of the conformational dynamics and state-dependent Poisson photon statistics, the formalism to calculate the exact likelihood that this model describes such sequences of photons or FRET efficiencies is developed. Explicit analytic expressions for the likelihood function for a two-state kinetic model are provided. The important special case when conformational dynamics are so slow that at most a single transition occurs in a time bin is considered. By making a series of approximations, we eventually recover the likelihood function used in hidden Markov models. In this way, not only is insight gained into the range of validity of this procedure, but also an improved likelihood function can be obtained.
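
    The hidden-Markov limit mentioned at the end of this abstract can be sketched as a forward-algorithm likelihood for a two-state model with Poisson photon counts per bin. This is the standard HMM form under the assumption that the state is constant within a bin, not the exact likelihood derived in the paper; all parameter names (k12, k21, n1, n2, T) are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing n photons when the expected count is mean > 0."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def hmm_likelihood(counts, k12, k21, n1, n2, T):
    """Forward-algorithm likelihood of binned photon counts.

    counts : photons per bin (non-empty list of ints)
    k12    : rate of state 1 -> 2, k21 : rate of state 2 -> 1
    n1, n2 : photon count rates in states 1 and 2
    T      : bin width
    Assumes the state does not change inside a bin (HMM approximation).
    """
    ksum = k12 + k21
    p1, p2 = k21 / ksum, k12 / ksum          # equilibrium populations
    decay = math.exp(-ksum * T)
    # trans[i][j]: probability of being in state j one bin after state i.
    trans = [[p1 + p2 * decay, p2 * (1.0 - decay)],
             [p1 * (1.0 - decay), p2 + p1 * decay]]
    means = [n1 * T, n2 * T]
    alpha = [p1 * poisson_pmf(counts[0], means[0]),
             p2 * poisson_pmf(counts[0], means[1])]
    for n in counts[1:]:
        alpha = [(alpha[0] * trans[0][j] + alpha[1] * trans[1][j])
                 * poisson_pmf(n, means[j]) for j in range(2)]
    return sum(alpha)


if __name__ == "__main__":
    print(hmm_likelihood([3, 0, 1, 7, 6], k12=2.0, k21=1.0, n1=5.0, n2=40.0, T=0.1))
```

    For long trajectories the product of probabilities underflows, so a practical version would rescale alpha at each step and accumulate the log-likelihood.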

  18. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs) were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5,596 unigenes related to defense response and 1,344 unigenes related to signal transduction mechanisms (10.95%). These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.

  19. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-01-01

    Full Text Available A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged 53-54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using the EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification.
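
    The CGR, dinucleotide-frequency and GC-content quantities named in this abstract can be illustrated with a short sketch. The corner assignment (A, C, G, T at the four corners of the unit square) is a common convention assumed here, not something the abstract specifies, and the function names are hypothetical.

```python
from collections import Counter

# Assumed corner layout for the chaos game; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos Game Representation: start at the centre of the unit square and
    move halfway towards the corner of each successive nucleotide."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def dinucleotide_counts(seq):
    """Counts of overlapping dinucleotides, the quantities that a
    second-order frequency CGR grid tabulates."""
    s = seq.upper()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def gc_content(seq):
    """Percentage of G and C bases in the sequence (0.0 for an empty string)."""
    s = seq.upper()
    if not s:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    toy = "GGCCGGATCC"  # made-up sequence, not one of the AM260709-AM260713 records
    print(cgr_points(toy)[:3])
    print(dinucleotide_counts(toy).most_common(3))
    print(gc_content(toy))
```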

  20. Post-contrast T1-weighted sequences in pediatric abdominal imaging: comparative analysis of three different sequences and imaging approach

    Energy Technology Data Exchange (ETDEWEB)

    Roque, Andreia; Ramalho, Miguel; AlObaidy, Mamdoh; Heredia, Vasco; Burke, Lauren M.; De Campos, Rafael O.P.; Semelka, Richard C. [University of North Carolina at Chapel Hill, Department of Radiology, Chapel Hill, NC (United States)

    2014-10-15

    Post-contrast T1-weighted imaging is an essential component of a comprehensive pediatric abdominopelvic MR examination. However, consistent good image quality is challenging, as respiratory motion in sedated children can substantially degrade the image quality. To compare the image quality of three different post-contrast T1-weighted imaging techniques - standard three-dimensional gradient-echo (3-D-GRE), magnetization-prepared gradient-recall echo (MP-GRE) and 3-D-GRE with radial data sampling (radial 3-D-GRE) - acquired in pediatric patients younger than 5 years of age. Sixty consecutive exams performed in 51 patients (23 females, 28 males; mean age 2.5 ± 1.4 years) constituted the final study population. Thirty-nine scans were performed at 3 T and 21 scans were performed at 1.5 T. Two different reviewers independently and blindly qualitatively evaluated all sequences to determine image quality and extent of artifacts. MP-GRE and radial 3-D-GRE sequences had the least respiratory motion (P < 0.0001). Standard 3-D-GRE sequences displayed the lowest average score ratings in hepatic and pancreatic edge definition, hepatic vessel clarity and overall image quality. Radial 3-D-GRE sequences showed the highest score ratings in overall image quality. Our preliminary results support the preference of fat-suppressed radial 3-D-GRE as the best post-contrast T1-weighted imaging approach for patients under the age of 5 years, when dynamic imaging is not essential. (orig.)

  1. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties, and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.
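
    Mining sequences for di- and tri-nucleotide repeat motifs, as described in this abstract, can be sketched with a regular expression that uses a back-reference to the repeat unit. The minimum repeat counts below (6 for di-, 5 for tri-nucleotides) are illustrative assumptions, not the thresholds used by the authors, and the function name is hypothetical.

```python
import re

# Assumed thresholds: unit length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Homopolymer units such as "AA" or "TTT" are skipped so that a run of a
    single base is not reported as a di- or tri-nucleotide SSR.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{n}) captures the unit; \1{m,} demands m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up EST fragment with one (AG)7 and one (CAT)5 repeat.
    est = "TTT" + "AG" * 7 + "CCC" + "CAT" * 5 + "GG"
    for start, unit, copies in find_ssrs(est):
        print(start, unit, copies)
```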

  2. On avoided words, absent words, and their application to biological sequence analysis.

    Science.gov (United States)

    Almirantis, Yannis; Charalampopoulos, Panagiotis; Gao, Jia; Iliopoulos, Costas S; Mohamed, Manal; Pissis, Solon P; Polychronopoulos, Dimitris

    2017-01-01

    The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length [Formula: see text] is a [Formula: see text]-avoided word in x if [Formula: see text], for a given threshold [Formula: see text]. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k. In this article, we propose an [Formula: see text]-time and [Formula: see text]-space algorithm to compute all [Formula: see text]-avoided words of length k in a given sequence of length n over a fixed-sized alphabet. We also present a time-optimal [Formula: see text]-time algorithm to compute all [Formula: see text]-avoided words (of any length) in a sequence of length n over an integer alphabet of size [Formula: see text]. In addition, we provide a tight asymptotic upper bound for the number of [Formula: see text]-avoided words over an integer alphabet and the expected length of the longest one. We make available an implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency and applicability of our implementation in biological sequence analysis. The systematic search for avoided words is particularly useful for biological sequence analysis. We present a linear-time and linear-space algorithm for the computation of avoided words of length k in a given sequence x. We suggest a modification to this algorithm so that it computes all avoided words of x, irrespective of their length, within the same time complexity. We also present combinatorial results with regards to avoided words and absent words.

  3. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.

    Directory of Open Access Journals (Sweden)

    Malachi Griffith

    2015-08-01

    Full Text Available Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.

  4. Tracing the evolutionary history of the little-known Mediterranean-Macaronesian genus Andryala (Asteraceae) by multigene sequencing

    Czech Academy of Sciences Publication Activity Database

    Ferreira, M. Z.; Zahradníček, Jaroslav; Kadlecová, Jana; Menezes de Sequeria, M.; Chrtek, Jindřich; Fehrer, Judith

    2015-01-01

    Vol. 64, No. 3 (2015), pp. 535-551 ISSN 0040-0262 R&D Projects: GA ČR GAP506/10/1363 Institutional support: RVO:67985939 Keywords: Andryala * colonization * molecular phylogeny Subject RIV: EF - Botanics Impact factor: 2.907, year: 2015

  5. Characterization of an ethylene-related small multigene family from Lycopersicon esculentum

    Energy Technology Data Exchange (ETDEWEB)

    Holdsworth, M

    1987-01-01

    cDNA clones derived from a tomato ripening-related cDNA library were used in RNA dot-blot experiments to investigate changes in the abundance of ripening related mRNAs during both natural and ethylene-induced ripening. Accumulation of the ripening-related mRNAs during natural ripening began at the time of autocatalytic ethylene production by the fruit, reached a maximum in orange fruit and declined as they became red. Analysis of the induction kinetics of these mRNAs revealed several patterns of expression as tomatoes ripened. The pTOM 13 cDNA insert was sequenced and used to identify related sequences in a tomato genomic library. 21 hybridizing genomic clones were isolated and divided into three groups of similar sequences based on their restriction maps. The DNA sequences of two of these groups of genomic clones that hybridized to pTOM 13 were determined. This allowed the identification of an incomplete pTOM 13-homologous gene, and a closely related complete gene. Nuclei were isolated from unwounded and wounded leaves and were used in run-off transcription experiments in the presence of (α-³²P)UTP. (³²P)-labelled RNA obtained from transcription experiments was used in dot-blot experiments against pTOM 13 and related genomic subclones. The results of these experiments demonstrated that the accumulation of pTOM 13-related genes in leaves may be controlled at transcriptional and post-transcriptional levels.

  6. QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation.

    Science.gov (United States)

    Wagle, Prerana; Nikolić, Miloš; Frommolt, Peter

    2015-07-01

    Next-Generation Sequencing (NGS) has emerged as a widely used tool in molecular biology. While time and cost for the sequencing itself are decreasing, the analysis of the massive amounts of data remains challenging. Since multiple algorithmic approaches for the basic data analysis have been developed, there is now an increasing need to efficiently use these tools to obtain results in reasonable time. We have developed QuickNGS, a new workflow system for laboratories with the need to analyze data from multiple NGS projects at a time. QuickNGS takes advantage of parallel computing resources, a comprehensive back-end database, and a careful selection of previously published algorithmic approaches to build fully automated data analysis workflows. We demonstrate the efficiency of our new software by a comprehensive analysis of 10 RNA-Seq samples which we can finish in only a few minutes of hands-on time. The approach we have taken is suitable to process even much larger numbers of samples and multiple projects at a time. Our approach considerably reduces the barriers that still limit the usability of the powerful NGS technology and finally decreases the time to be spent before proceeding to further downstream analysis and interpretation of the data.

  7. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Budzanivska I. G.

    2014-03-01

    Full Text Available Aim. Identification of the widespread Ukrainian isolate(s) of PVY (Potato virus Y) in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of our own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene) has a higher percentage of homology with the recombinant isolates (strains) of this pathogen (approx. 98.8–99.8 % homology for both nucleotide and translated amino acid sequences of the CP gene). The Ukrainian isolate of PVY is positioned in a separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have a common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate which has been attributed to the strain group O, subclade N:O.

  8. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    Science.gov (United States)

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  9. Applications of statistical physics and information theory to the analysis of DNA sequences

    Science.gov (United States)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
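
    The mutual information function studied in this thesis can be sketched as the mutual information between the nucleotides found k positions apart, I(k) = sum over i, j of p_ij(k) log2[p_ij(k) / (p_i p_j)]. The estimator below is a plain plug-in estimate from pair counts; the function name is hypothetical, and no finite-sample bias correction is applied.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Plug-in estimate (in bits) of the mutual information between the
    symbols at positions i and i + k of `seq`."""
    s = seq.upper()
    pairs = Counter((s[i], s[i + k]) for i in range(len(s) - k))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    # Marginals are taken from the same pair counts so that I(k) >= 0.
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count
        right[b] += count
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi


if __name__ == "__main__":
    toy = "ATGGCCAAGGCTGCCGATGGT" * 10  # made-up sequence, not genomic data
    for k in range(1, 10):
        print(k, round(mutual_information(toy, k), 4))
```

    On real coding DNA one would plot I(k) against k and look for the elevated values at k = 3, 6, 9, ... that the abstract describes as period-three oscillations.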

  10. 16S ribosomal RNA sequence analysis for determination of phylogenetic relationship among methylotrophs.

    Science.gov (United States)

    Tsuji, K; Tsien, H C; Hanson, R S; DePalma, S R; Scholtz, R; LaRoche, S

    1990-01-01

    16S ribosomal RNAs (rRNA) of 12 methylotrophic bacteria have been almost completely sequenced to establish their phylogenetic relationships. Methylotrophs that are physiologically related are phylogenetically diverse and are scattered among the purple eubacteria (class Proteobacteria). Group I methylotrophs can be classified in the beta- and the gamma-subdivisions and group II methylotrophs in the alpha-subdivision of the purple eubacteria, respectively. Pink-pigmented facultative and non-pigmented obligate group II methylotrophs form two distinctly separate branches within the alpha-subdivision. The secondary structures of the 16S rRNA sequences of 'Methylocystis parvus' strain OBBP, 'Methylosinus trichosporium' strain OB3b, 'Methylosporovibrio methanica' strain 81Z and Hyphomicrobium sp. strain DM2 are similar, and these non-pigmented obligate group II methylotrophs form one tight cluster in the alpha-subdivision. The pink-pigmented facultative methylotrophs, Methylobacterium extorquens strain AM1, Methylobacterium sp. strain DM4 and Methylobacterium organophilum strain XX form another cluster within the alpha-subdivision. Although similar in phenotypic characteristics, Methylobacterium organophilum strain XX and Methylobacterium extorquens strain AM1 are clearly distinguishable by their 16S rRNA sequences. The group I methylotrophs, Methylophilus methylotrophus strain AS1 and methylotrophic species DM11, which do not utilize methane, are similar in 16S rRNA sequence to bacteria in the beta-subdivision. The methane-utilizing, obligate group I methanotrophs, Methylococcus capsulatus strain BATH and Methylomonas methanica, are placed in the gamma-subdivision. The results demonstrate that it is possible to distinguish and classify the methylotrophic bacteria using 16S rRNA sequence analysis.

  11. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  12. Insights into the emergent bacterial pathogen Cronobacter spp., generated by multilocus sequence typing and analysis

    Directory of Open Access Journals (Sweden)

    Susan Joseph

    2012-11-01

    Full Text Available Cronobacter spp. (previously known as Enterobacter sakazakii) is a bacterial pathogen affecting all age groups, with particularly severe clinical complications in neonates and infants. One recognised route of infection is the consumption of contaminated infant formula. Because it is a recently recognised bacterial pathogen of considerable importance and subject to regulatory control, appropriate detection and identification schemes are required. The application of multilocus sequence typing (MLST) and analysis (MLSA) of the seven alleles atpD, fusA, glnS, gltB, gyrB, infB and ppsA (concatenated length 3036 base pairs) has led to considerable advances in our understanding of the genus. This approach is supported by both the reliability of DNA sequencing over subjective phenotyping and the establishment of a MLST database which has open access and is also curated; http://www.pubMLST.org/cronobacter. MLST has been used to describe the diversity of the newly recognised genus, has been instrumental in the formal recognition of new Cronobacter species (C. universalis and C. condimenti), and has revealed the high clonality of strains and the association of clonal complex 4 with neonatal meningitis cases. Clearly the MLST approach has considerable benefits over the use of non-DNA sequence based methods of analysis for newly emergent bacterial pathogens. The application of MLST and MLSA has dramatically enabled us to better understand this opportunistic bacterium which can cause irreparable damage to a newborn baby’s brain, and has contributed to improved control measures to protect neonatal health.
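
    The concatenation step behind MLSA can be sketched as joining the seven locus sequences of each strain in a fixed order and comparing the concatenated strings. The locus order follows the abstract; the function names and the toy sequences are assumptions for illustration, and the sequences are taken to be already aligned and trimmed to equal length.

```python
LOCI = ("atpD", "fusA", "glnS", "gltB", "gyrB", "infB", "ppsA")


def concatenate_loci(alleles, loci=LOCI):
    """Join the per-locus sequences of one strain in a fixed locus order."""
    missing = [locus for locus in loci if locus not in alleles]
    if missing:
        raise KeyError("missing loci: " + ", ".join(missing))
    return "".join(alleles[locus].upper() for locus in loci)


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Made-up 4-bp "alleles"; real loci total 3036 bp when concatenated.
    strain_1 = {locus: "ACGT" for locus in LOCI}
    strain_2 = dict(strain_1, gyrB="ACGA")
    s1 = concatenate_loci(strain_1)
    s2 = concatenate_loci(strain_2)
    print(len(s1), round(percent_identity(s1, s2), 2))
```

    Fixing the locus order matters because the concatenated strings are compared position by position, so every strain must contribute its loci in the same sequence.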

  13. Multilocus Sequence Analysis of Cercospora spp. from Different Host Plant Families

    Directory of Open Access Journals (Sweden)

    Floreta Fiska Yuliarni

    2014-06-01

    Full Text Available Identification of the genus Cercospora is still complicated because host preference is often used as the main criterion for proposing a new name. We determined the relationship between host plants and multilocus sequence variations (ITS rDNA including 5.8S rDNA, elongation factor 1-α, and calmodulin) in Cercospora spp. to investigate the host specificity. We used 53 strains of Cercospora spp. infecting 12 plant families for phylogenetic analysis. The sequences of 23 strains of Cercospora spp. infecting the plant families of Asteraceae, Cucurbitaceae, and Solanaceae were determined in this study. The sequences of 30 strains of Cercospora spp. infecting the plant families of Fabaceae, Amaranthaceae, Apiaceae, Plumbaginaceae, Malvaceae, Cistaceae, Plantaginaceae, Lamiaceae, and Poaceae were obtained from GenBank. The molecular phylogenetic analysis revealed that the majority of Cercospora species lack host specificity, and only C. zinniicola, C. zeina, C. zeae-maydis, C. cocciniae, and C. mikaniicola were found to be host-specific. Closely related species of Cercospora could not be distinguished using molecular analyses of ITS, EF, and CAL gene regions. The phylogenetic tree based on the CAL gene showed a better topology and Cercospora species separation than the trees based on the ITS rDNA region or the EF gene.

  14. Molecular cloning and sequencing analysis of the interferon receptor (IFNAR-1) from Columba livia.

    Science.gov (United States)

    Li, Chao; Chang, Wei Shan

    2014-01-01

    Partial sequence cloning of the interferon receptor (IFNAR-1) of Columba livia. To obtain a 630 bp fragment of the gene, a pair of primers was designed according to the conserved nucleotide sequences of the Gallus (EU477527.1) and Taeniopygia guttata (XM_002189232.1) IFNAR-1 gene fragments published in GenBank. Specific primers were designed for the RACE method to amplify the 3' terminal cDNA. The Columba livia IFNAR-1 displayed 88.5%, 80.5% and 73.8% nucleotide identity to Falco peregrinus, Gallus and Taeniopygia guttata, respectively. Phylogenetic analysis of the IFNAR-1 gene showed high homology among Columba livia, Falco peregrinus and chicken. We successfully obtained a partial sequence of the Columba livia IFNAR-1 gene. Analysis of the phylogenetic tree showed that Columba livia and Falco peregrinus IFNAR-1 had high homology. This result can be used as a reference for further research and practical application.

  15. In silico Analysis of osr40c1 Promoter Sequence Isolated from Indica Variety Pokkali

    Directory of Open Access Journals (Sweden)

    W.S.I. de Silva

    2017-07-01

    Full Text Available The promoter region of a drought and abscisic acid (ABA) inducible gene, osr40c1, was isolated from a salt-tolerant indica rice variety, Pokkali; the isolated region lies 670 bp upstream of the putative translation start codon. In silico promoter analysis of the resulting sequence showed that at least 15 types of putative motifs were distributed within the sequence, including two types of common promoter elements, TATA and CAAT boxes. Additionally, several putative cis-acting regulatory elements which may be involved in regulation of osr40c1 expression under different conditions were found in the 5′-upstream region of osr40c1. These are the ABA-responsive element, light-responsive elements (ATCT-motif, Box I, G-box, GT1-motif, Gap-box and Sp1), the myeloblastosis oncogene response element (CCAAT-box), the auxin-responsive element (TGA-element), the gibberellin-responsive element (GARE-motif) and fungal-elicitor responsive elements (Box E and Box-W1). A putative regulatory element required for the endosperm-specific pattern of gene expression, designated the Skn-1 motif, was also detected in the Pokkali osr40c1 promoter region. In conclusion, the bioinformatic analysis of the osr40c1 promoter region isolated from the indica rice variety Pokkali led to the identification of several important stress-responsive cis-acting regulatory elements, and therefore the isolated promoter sequence could be employed in rice genetic transformation to mediate expression of abiotic stress induced genes.
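
    The kind of motif scan described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the consensus patterns and the test sequence below are made up for the example, only the forward strand is scanned, and the abstract does not say which tool or motif database the authors used.

```python
import re

# Illustrative consensus patterns (assumptions, not taken from the abstract).
MOTIFS = {
    "TATA-box": "TATA[AT]A",
    "CAAT-box": "CAAT",
    "G-box": "CACGTG",
}

def scan_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: list of 0-based start positions} on the forward strand."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits

# Hypothetical 25 bp sequence, not the osr40c1 promoter.
promoter = "GGCAATCCACGTGTTTATAAATAGC"
print(scan_motifs(promoter))
```

    A real analysis would also scan the reverse complement and use curated motif definitions rather than the three toy patterns shown here.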

  16. HIERARCHICAL ADAPTIVE ROOD PATTERN SEARCH FOR MOTION ESTIMATION AT VIDEO SEQUENCE ANALYSIS

    Directory of Open Access Journals (Sweden)

    V. T. Nguyen

    2016-05-01

    Full Text Available Subject of Research. The paper deals with motion estimation algorithms for the analysis of video sequences in the MPEG-4 Visual and H.264 compression standards. A new algorithm is proposed based on an analysis of the advantages and disadvantages of existing algorithms. Method. The algorithm is called hierarchical adaptive rood pattern search (Hierarchical ARPS, HARPS). This new algorithm combines the classic adaptive rood pattern search (ARPS) and hierarchical search (MP; hierarchical search or mean pyramid). All motion estimation algorithms were implemented in MATLAB and tested with several video sequences. Main Results. The criteria for evaluating the algorithms were speed, peak signal-to-noise ratio, mean square error and mean absolute deviation. The proposed method showed much better performance at a comparable error and deviation. The peak signal-to-noise ratio in different video sequences is sometimes better and sometimes worse than that of known algorithms, so it requires further investigation. Practical Relevance. Application of this algorithm in MPEG-4 and H.264 codecs instead of the standard one can significantly reduce compression time. This makes it possible to recommend it for telecommunication systems for multimedia data storage, transmission and processing.
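
    The quality criteria named in this abstract (mean square error, mean absolute deviation, peak signal-to-noise ratio) can be computed as in the sketch below. It assumes 8-bit frames stored as nested lists and reads "mean absolute deviation" as the mean absolute difference between a reference frame and its motion-compensated prediction; the authors' MATLAB implementation is not shown in the abstract, so the function names and the tiny frames are illustrative only.

```python
import math

def mse(ref, pred):
    """Mean square error between two equally sized 2-D frames."""
    n = sum(len(row) for row in ref)
    return sum((a - b) ** 2 for ra, rb in zip(ref, pred) for a, b in zip(ra, rb)) / n

def mad(ref, pred):
    """Mean absolute difference between two equally sized 2-D frames."""
    n = sum(len(row) for row in ref)
    return sum(abs(a - b) for ra, rb in zip(ref, pred) for a, b in zip(ra, rb)) / n

def psnr(ref, pred, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    err = mse(ref, pred)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

# Hypothetical 2x2 frames used only to exercise the functions.
ref = [[10, 20], [30, 40]]
pred = [[12, 18], [30, 44]]
print(mse(ref, pred), mad(ref, pred), round(psnr(ref, pred), 2))
```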

  17. Multi-objective Analysis for a Sequencing Planning of Mixed-model Assembly Line

    Science.gov (United States)

    Shimizu, Yoshiaki; Waki, Toshiya; Yoo, Jae Kyu

    Diversified customer demands are making just-in-time and agile manufacturing more important than ever. Accordingly, mixed-model assembly lines are becoming popular as a way to realize small-lot, multi-kind production. Since such a line produces various kinds of products on the same assembly line, rational management is of special importance. From this point of view, this study focuses on a sequencing problem of a mixed-model assembly line that includes a paint line as its preceding process. By taking the paint line into account, reducing work-in-process (WIP) inventory between these heterogeneous lines becomes a major concern of the sequencing problem, besides improving production efficiency. We formulated the sequencing problem as a bi-objective optimization problem to prevent various line stoppages and to reduce the volume of WIP inventory simultaneously. We then proposed a practical method for the multi-objective analysis, applying the weighting method to derive the Pareto front. The resulting problem is solved by a meta-heuristic method such as SA (simulated annealing). Through numerical experiments, we verified the validity of the proposed approach and discussed the significance of trade-off analysis between the conflicting objectives.
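
    The weighting method mentioned in this abstract can be illustrated with a small sketch: each weight turns the two objectives into a single score, and the minimiser for each weight is a point on the Pareto front. The candidate objective pairs below are invented for the example, and the search is a plain enumeration rather than the simulated annealing the authors report; integer weights are used so the arithmetic is exact.

```python
def weighted_sum_front(candidates, steps=10):
    """Sweep integer weights w = 0..steps and collect each weighted-sum minimiser.

    candidates: list of (f1, f2) pairs, both objectives to be minimised,
    e.g. (line stoppage measure, WIP inventory) for one production sequence.
    """
    front = []
    for w in range(steps + 1):
        best = min(candidates, key=lambda c: w * c[0] + (steps - w) * c[1])
        if best not in front:
            front.append(best)
    return sorted(front)

# Hypothetical objective values for five candidate sequences.
candidates = [(1, 10), (2, 6), (4, 3), (6, 5), (8, 1)]
print(weighted_sum_front(candidates))  # (6, 5) is dominated and never selected
```

    One known limitation of this approach is that a weighted sum only recovers Pareto points on the convex part of the front, so non-convex trade-offs need a different scalarisation.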

  18. Analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates

    Energy Technology Data Exchange (ETDEWEB)

    Gautier, Romain [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France); Centre Nationale de la Recherche Scientifique (CNRS), Institut des Materiaux Jean Rouxel (IMN), Universite de Nantes (France); Fourre, Yoann; Furet, Eric; Gautier, Regis; Le Fur, Eric [Institut des Sciences Chimiques de Rennes, UMR 6226 CNRS - Ecole Nationale Superieure de Chimie de Rennes (France)

    2015-04-15

    An approach is presented that enables the analysis and prediction of stacking sequences in intercalated lamellar vanadium phosphates. A comparison of previously reported vanadium phosphates reveals two modes of intercalation: (i) 3d transition metal ions intercalated between VOPO₄ layers and (ii) alkali/alkaline earth metal ions between VOPO₄·H₂O layers. Both intercalations were investigated using DFT calculations in order to understand the relative shifts of the vanadium phosphate layers. These calculations, in addition to an analysis of the stacking sequences in previously reported materials, enable the prediction of the crystal structures of Mₓ(VOPO₄)·yH₂O (M = Cs⁺, Cd²⁺ and Sn²⁺). Experimental realization and structural determination of Cd(VOPO₄)₂·4H₂O by single-crystal X-ray diffraction confirmed the predicted stacking sequences. (Copyright 2015 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)

  19. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    Full Text Available This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  20. Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences.

    Science.gov (United States)

    Yadav, Shailendra; Kundu, Sharbadeb; Ghosh, Sankar K; Maitra, S S

    2015-01-01

    Methanogens, a key contributor in global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity present in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques like 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences, related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic), were detected from marshland sites of Southern Assam, India. Jaccard analysis of DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. The knowledge about "methanogenic archaea composition" and "abundance" in the contrasting ecosystems like "landfill" and "marshland" may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane-dynamics and the global warming process.
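
    The Jaccard comparison of DGGE band patterns mentioned in this abstract reduces to a set calculation: bands shared by two lanes divided by bands present in either lane. The sketch below assumes each lane has already been scored as a set of band positions; the lane names and band numbers are hypothetical, and this is not a reimplementation of Gel2K.

```python
def jaccard(lane_a, lane_b):
    """Jaccard similarity of two lanes given as collections of band positions."""
    a, b = set(lane_a), set(lane_b)
    union = a | b
    # Two empty lanes are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0

# Hypothetical presence data: band positions detected in each lane.
lanes = {
    "landfill_1": {1, 2, 3, 5},
    "landfill_2": {1, 2, 3, 6},
    "marsh_1": {4, 7, 8},
}
print(jaccard(lanes["landfill_1"], lanes["landfill_2"]))  # 3 shared / 5 total = 0.6
print(jaccard(lanes["landfill_1"], lanes["marsh_1"]))     # no shared bands = 0.0
```

    Clustering the lanes, as the abstract describes, would then operate on the pairwise matrix of these similarities.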

  1. Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences

    Directory of Open Access Journals (Sweden)

    Shailendra Yadav

    2015-01-01

    Full Text Available Methanogens, a key contributor in global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity present in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques like 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic) were detected from marshland sites of Southern Assam, India. Jaccard analysis of DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. The knowledge about “methanogenic archaea composition” and “abundance” in the contrasting ecosystems like “landfill” and “marshland” may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane-dynamics and the global warming process.

  2. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    Science.gov (United States)

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  3. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type 1 and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  4. Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

    Science.gov (United States)

    Kovács, Endre R; Benko, Mária

    2009-03-01

    Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product implied that a new siadenovirus type was present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.

  5. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.
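
    Genome-wide SSR identification of the kind reported here is often done by searching for a short motif repeated in tandem. The sketch below finds perfect mono-, di- and trinucleotide repeats with a backreference regex. The minimum repeat counts and the test sequence are assumptions chosen for the example; the abstract does not state the thresholds or the software the authors used.

```python
import re

# Minimum number of tandem copies per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures the motif; \1{n-1,} demands n or more copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if k > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" repeated is really a mononucleotide run
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical sequence containing one repeat of each class.
seq = "GG" + "A" * 12 + "CC" + "AG" * 7 + "G" + "CTT" * 5 + "A"
print(find_ssrs(seq))
```

    Compound and imperfect repeats, which dedicated SSR tools also report, are outside the scope of this sketch.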

  6. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
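
    The sliding-window design described in this abstract amounts to cutting fixed-length, regularly spaced sub-regions out of an alignment. A minimal sketch follows; the alignment length and step in the example are hypothetical, since the abstract gives the window sizes (1,000 and 2,000 bp) and counts but not the step used.

```python
def sliding_windows(seq_len, window, step):
    """Return half-open (start, end) windows that fit entirely inside the sequence."""
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]

# Toy example: a 10-column alignment, 4-column windows, step of 3 columns.
print(sliding_windows(10, 4, 3))  # [(0, 4), (3, 7), (6, 10)]
```

    Each window would then be analysed separately, for example by building a tree per window and recording the proportion of sequences in clusters.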

  7. SNP Analysis and Whole Exome Sequencing: Their Application in the Analysis of a Consanguineous Pedigree Segregating Ataxia

    Directory of Open Access Journals (Sweden)

    Sarah L. Nickerson

    2015-10-01

    Full Text Available Autosomal recessive cerebellar ataxia encompasses a large and heterogeneous group of neurodegenerative disorders. We employed single nucleotide polymorphism (SNP) analysis and whole exome sequencing to investigate a consanguineous Maori pedigree segregating ataxia. We identified a novel mutation in exon 10 of the SACS gene: c.7962T>G p.(Tyr2654*), establishing the diagnosis of autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS). Our findings expand both the genetic and phenotypic spectrum of this rare disorder, and highlight the value of high-density SNP analysis and whole exome sequencing as powerful and cost-effective tools in the diagnosis of genetically heterogeneous disorders such as the hereditary ataxias.

  8. Cloning and sequence analysis of serine proteinase of Gloydius ussuriensis venom gland

    International Nuclear Information System (INIS)

    Sun Dejun; Liu Shanshan; Yang Chunwei; Zhao Yizhuo; Chang Shufang; Yan Weiqun

    2005-01-01

    Objective: To construct a cDNA library using mRNA from the venom gland of Gloydius ussuriensis (G. ussuriensis), and to clone and analyze the serine proteinase gene from the cDNA library. Methods: Total RNA was isolated from the venom gland of G. ussuriensis, and mRNA was purified using an mRNA isolation kit. Full-length cDNA was synthesized by means of the SMART cDNA synthesis strategy and amplified by long-distance PCR; the cDNA was then cloned into the vector pBluescript-SK. The recombinant cDNA was transformed into E. coli DH5α. The cDNA of the serine proteinase gene in the venom gland of G. ussuriensis was detected and amplified using in situ hybridization. The cDNA fragment was inserted into the pGEM-T vector and cloned, and its nucleotide sequence was determined. Results: The capacity of the venom gland cDNA library was above 2.3 × 10^6. The open reading frame was composed of 702 nucleotides and coded for a protein pre-zymogen of 234 amino acids. It contained 12 cysteine residues. The sequence analysis indicated that the deduced amino acid sequence of the cDNA fragment shared high identity with the thrombin-like enzyme genes of other snakes in GenBank. The query sequence exhibited strong amino acid sequence homology of 85% to the serine protease of T. gramineus, thrombin-like serine proteinase I of D. acutus and serine protease catroxase II of C. atrox, respectively. Based on the amino acid sequences of other thrombin-like enzymes, the catalytic residues and disulfide bridges of this thrombin-like enzyme were deduced as follows: catalytic residues His41, Asp86 and Ser180; and six disulfide bridges, Cys7-Cys139, Cys26-Cys42, Cys74-Cys232, Cys118-Cys186, Cys150-Cys165 and Cys176-Cys201. Conclusion: The capacity of the venom gland cDNA library is above 2.3 × 10^6, exceeding the 10^5 level. The constructed cDNA library of G. ussuriensis venom gland would be a helpful platform for detecting new target genes and for further gene manipulation. The cloned serine

  9. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available
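
    Improvement (2) in the abstract above, splitting large sequence files for better load balance, can be sketched as follows. This is not Rainbow's code: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name and chunk size are hypothetical.

```python
from itertools import islice

def split_fastq(lines, records_per_chunk):
    """Yield lists of FASTQ lines, each holding up to records_per_chunk records.

    Assumes every record occupies exactly four lines (header, bases, '+', qualities).
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * records_per_chunk))
        if not chunk:
            return
        yield chunk

# Three toy records split into chunks of two records each.
reads = []
for i in range(3):
    reads += [f"@read{i}", "ACGT", "+", "IIII"]
chunks = list(split_fastq(reads, 2))
print([len(c) // 4 for c in chunks])  # [2, 1]
```

    Because the function consumes any iterable of lines, it can be fed an open file handle so that a large input is never held in memory at once.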

  10. Analysis of simple sequence repeats in rice bean (Vigna umbellata) using an SSR-enriched library

    Directory of Open Access Journals (Sweden)

    Lixia Wang

    2016-02-01

    Full Text Available Rice bean (Vigna umbellata Thunb.), a warm-season annual legume, is grown in Asia mainly for dried grain or fodder and plays an important role in human and animal nutrition because the grains are rich in protein and some essential fatty acids and minerals. With the aim of expediting the genetic improvement of rice bean, we initiated a project to develop genomic resources and tools for molecular breeding in this little-known but important crop. Here we report the construction of an SSR-enriched genomic library from DNA extracted from pooled young leaf tissues of 22 rice bean genotypes and the development of SSR markers. In 433,562 reads generated by a Roche 454 GS-FLX sequencer, we identified 261,458 SSRs, of which 48.8% were of compound form. Dinucleotide repeats were predominant with an absolute proportion of 81.6%, followed by trinucleotides (17.8%). Other types together accounted for 0.6%. The motif AC/GT accounted for 77.7% of the total, followed by AAG/CTT (14.3%), and all others accounted for 12.0%. Among the flanking sequences, 2928 matched putative genes or gene models in the protein database of Arabidopsis thaliana, corresponding with 608 non-redundant Gene Ontology terms. Of these sequences, 11.2% were involved in cellular components, 24.2% were involved in molecular functions, and 64.6% were associated with biological processes. Based on homolog analysis, 1595 flanking sequences were similar to mung bean and 500 to common bean genomic sequences. Comparative mapping was conducted using 350 sequences homologous to both mung bean and common bean sequences. Finally, a set of primer pairs were designed, and a validation test showed that 58 of 220 new primers can be used in rice bean and 53 can be transferred to mung bean. However, only 11 were polymorphic when tested on 32 rice bean varieties. We propose that this study lays the groundwork for developing novel SSR markers and will enhance the mapping of qualitative and quantitative traits and marker

  11. Identification of succinimide sites in proteins by N-terminal sequence analysis after alkaline hydroxylamine cleavage.

    Science.gov (United States)

    Kwong, M. Y.; Harris, R. J.

    1994-01-01

    Under favorable conditions, Asp or Asn residues can undergo rearrangement to a succinimide (cyclic imide), which may also serve as an intermediate for deamidation and/or isoaspartate formation. Direct identification of such succinimides by peptide mapping is hampered by their lability at neutral and alkaline pH. We determined that incubation in 2 M hydroxylamine, 0.2 M Tris buffer, pH 9, for 2 h at 45 degrees C will specifically cleave on the C-terminal side of succinimides without cleavage at Asn-Gly bonds; yields are typically approximately 50%. N-terminal sequence analysis can then be used to identify an internal sequence generated by cleavage of the succinimide, hence identifying the succinimide site. PMID:8142891

  12. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  13. Comparative analysis of idiom selection and sequencing in Estonian basic school EFL coursebooks

    Directory of Open Access Journals (Sweden)

    Rita Anita Forssten

    2017-05-01

    Full Text Available The article investigates the selection and sequencing of the idioms encountered in two locally-produced and international coursebook series currently employed in Estonian basic schools. It is hypothesized that there exists a positive correlation between idioms’ difficulty and coursebooks’ language proficiency level. The hypothesis is tested through a statistical analysis of the idioms found which are categorized in terms of their analysability into three categories where category 1 includes analysable semi-literal idioms, category 2 comprises analysable semi-transparent idioms, and category 3 encompasses non-analysable opaque idioms, and then analysed through an online language corpus (British National Corpus. The results of the study reveal that the coursebook authors under discussion have disregarded idioms’ frequency as a criterion for selection or sequencing, whereas the factor utilized to some extent is the degree of analysability.

  14. In silico analysis of Simple Sequence Repeats from chloroplast genomes of Solanaceae species

    Directory of Open Access Journals (Sweden)

    Evandro Vagner Tambarussi

    2009-01-01

    Full Text Available The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than that of genic cpSSRs. Mononucleotide repeats were the most frequent in the studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSR sequences from Solanaceae species made the identification of nucleotide variability possible, and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analysis and biotechnological approaches.
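
    As an illustration of how perfect SSRs of motif length 1 to 6 can be located in a sequence in silico, the following is a minimal sketch and not the pipeline used in the study. The function name `find_ssrs` and the minimum repeat counts are hypothetical; the abstract does not state the thresholds that were applied.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    # Hypothetical thresholds: minimum number of repeats per motif length (1..6 nt).
    if min_repeats is None:
        min_repeats = {1: 8, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A motif of k bases followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" as a dinucleotide).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    example = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GCC"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

    Start positions are 0-based. Classifying each hit as genic or intergenic would additionally require the genome annotation, which is outside this sketch.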

  15. Genome sequence and comparative analysis of a putative entomopathogenic Serratia isolated from Caenorhabditis briggsae.

    Science.gov (United States)

    Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley

    2015-07-18

    Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabditis with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations are the result of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 MB chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 KB plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are

  16. Multilocus sequence typing and rtxA toxin gene sequencing analysis of Kingella kingae isolates demonstrates genetic diversity and international clones.

    Directory of Open Access Journals (Sweden)

    Romain Basmaci

    Full Text Available BACKGROUND: Kingella kingae, a normal component of the upper respiratory flora, is being increasingly recognized as an important invasive pathogen in young children. Genetic diversity of this species has not been studied. METHODS: We analyzed 103 strains from different countries and clinical origins by a new multilocus sequence-typing (MLST) schema. Putative virulence gene rtxA, encoding an RTX toxin, was also sequenced, and experimental virulence of representative strains was assessed in a juvenile-rat model. RESULTS: Thirty-six sequence-types (ST) and nine ST-complexes (STc) were detected. The main STc 6, 14 and 23 comprised 23, 17 and 20 strains respectively, and were internationally distributed. rtxA sequencing results were mostly congruent with MLST, and showed horizontal transfer events. Of interest, all members of the distantly related ST-6 (n = 22) and ST-5 (n = 4) harboured a 33 bp duplication or triplication in their rtxA sequence, suggesting that this genetic trait arose through selective advantage. The animal model revealed significant differences in virulence among strains of the species. CONCLUSION: MLST analysis reveals international spread of ST-complexes and will help to decipher acquisition and evolution of virulence traits and diversity of pathogenicity among K. kingae strains, for which an experimental animal model is now available.

  17. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497
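
    To make the idea of mapping sample-specific principal components onto a reference ancestry map more concrete, here is a minimal sketch of ordinary (same-dimension) Procrustes analysis with NumPy. It is a simplification: it is not the projection Procrustes algorithm of LASER 2.0, which maps high-dimensional coordinates into a lower-dimensional space. The function name `procrustes_fit` and the toy data are assumptions made for illustration.

```python
import numpy as np

def procrustes_fit(X, Y):
    """Find scale rho, orthogonal matrix A and shift b minimizing ||rho * X @ A + b - Y||_F.

    X: (n, d) coordinates of the reference individuals in a sample-specific PCA space.
    Y: (n, d) coordinates of the same individuals in the reference ancestry map.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my                 # center both configurations
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)     # SVD of the cross-product matrix
    A = U @ Vt                              # optimal orthogonal transformation
    rho = s.sum() / (Xc ** 2).sum()         # optimal isotropic scaling
    b = my - rho * mx @ A                   # optimal translation
    return rho, A, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    theta = 0.3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = 2.0 * X @ R + np.array([1.0, -3.0])
    rho, A, b = procrustes_fit(X, Y)
    # A study sample placed in the same PCA space is mapped with the same transform.
    x_new = rng.normal(size=(1, 2))
    print(rho * x_new @ A + b)
```

    Once the transformation has been estimated from the reference individuals, applying it to a study sample's coordinates places that sample in the shared reference space.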

  18. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    Science.gov (United States)

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
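
    The intersection matrix described above can be sketched in a few lines of NumPy. This is only an illustration of the matrix itself, not of the ASSET procedure: the function name `intersection_matrix` is hypothetical, the overlap is taken as the raw count of neurons active in both bins (any normalization is omitted), and the steps that detect and cluster diagonal structures are not shown.

```python
import numpy as np

def intersection_matrix(binned):
    """Overlap of active neurons between every pair of time bins.

    binned: (n_neurons, n_bins) array of spike counts per neuron and bin.
    Returns an (n_bins, n_bins) matrix whose entry (i, j) is the number of
    neurons that fire in both bin i and bin j.
    """
    active = (np.asarray(binned) > 0).astype(int)   # 1 where a neuron spiked in a bin
    return active.T @ active

if __name__ == "__main__":
    # Toy data: neurons 0 and 1 fire together, then neuron 2 fires one bin later;
    # the same two-step sequence repeats three bins later.
    binned = [[1, 0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0]]
    print(intersection_matrix(binned))
```

    In the toy data the repeated sequence shows up as nonzero entries at (0, 3) and (1, 4), a short off-diagonal structure of the kind the abstract refers to.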

  19. Complete Chloroplast Genome Sequences and Comparative Analysis of Chenopodium quinoa and C. album.

    Science.gov (United States)

    Hong, Su-Young; Cheon, Kyeong-Sik; Yoo, Ki-Oug; Lee, Hyun-Oh; Cho, Kwang-Soo; Suh, Jong-Taek; Kim, Su-Jeong; Nam, Jeong-Hwan; Sohn, Hwang-Bae; Kim, Yul-Ho

    2017-01-01

    The Chenopodium genus comprises ~150 species, including Chenopodium quinoa and Chenopodium album, two important crops with high nutritional value. To elucidate the phylogenetic relationship between the two species, the complete chloroplast (cp) genomes of these species were obtained by next generation sequencing. We performed comparative analysis of the sequences and, using InDel markers, inferred phylogeny and genetic diversity of the Chenopodium genus. The cp genome is 152,099 bp (C. quinoa) and 152,167 bp (C. album) long. In total, 119 genes (78 protein-coding, 37 tRNA, and 4 rRNA) were identified. We found 14 (C. quinoa) and 15 (C. album) tandem repeats (TRs); 14 TRs were present in both species and C. album and C. quinoa each had one species-specific TR. The trnI-GAU intron sequences contained one (C. quinoa) or two (C. album) copies of TRs (66 bp); the InDel marker was designed based on the copy number variation in TRs. Using the InDel markers, we detected this variation in the TR copy number in four species, Chenopodium hybridum, Chenopodium pumilio, Chenopodium ficifolium, and Chenopodium koraiense, but not in Chenopodium glaucum. A comparison of coding and non-coding regions between C. quinoa and C. album revealed divergent sites. Nucleotide diversity >0.025 was found in 17 regions: 14 were located in the large single copy region (LSC), one in the inverted repeats, and two in the small single copy region (SSC). A phylogenetic analysis based on 59 protein-coding genes from 25 taxa resolved Chenopodioideae as monophyletic and sister to Betoideae. The complete plastid genome sequences and molecular markers based on divergence hotspot regions in the two Chenopodium taxa will help to resolve the phylogenetic relationships of Chenopodium.
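
    Nucleotide diversity over an alignment is commonly computed as the average number of pairwise differences per site. The sketch below shows that calculation and a sliding-window scan. The function names, the treatment of gap columns (skipped) and the window and step sizes are assumptions, since the abstract does not describe how the 17 regions were delimited.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: alignment columns containing a gap in any sequence are skipped.
    cols = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    pairs = list(combinations(seqs, 2))
    if not cols or not pairs:
        return 0.0
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))

def sliding_pi(seqs, window, step):
    """Return (window_start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    return [(start, nucleotide_diversity([s[start:start + window] for s in seqs]))
            for start in range(0, length - window + 1, step)]

if __name__ == "__main__":
    aligned = ["ACGTACGT", "ACGTACGA"]   # toy alignment of two sequences
    print(nucleotide_diversity(aligned))
    print(sliding_pi(aligned, window=4, step=4))
```

    With only two aligned sequences, as in a comparison of two plastomes, the value reduces to the proportion of differing sites in each window; windows exceeding a chosen threshold would then be flagged as divergent.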

  20. Generation, analysis and functional annotation of expressed sequence tags from the ectoparasitic mite Psoroptes ovis

    Directory of Open Access Journals (Sweden)

    Kenyon Fiona

    2011-07-01

    Full Text Available Abstract Background Sheep scab is caused by Psoroptes ovis and is arguably the most important ectoparasitic disease affecting sheep in the UK. The disease is highly contagious and causes considerable pruritus and irritation and is therefore a major welfare concern. Current methods of treatment are unsustainable, and in order to elucidate novel methods of disease control a more comprehensive understanding of the parasite is required. To date, no full genomic DNA sequence or large scale transcript datasets are available and prior to this study only 484 P. ovis expressed sequence tags (ESTs) were accessible in public databases. Results In order to further expand upon the transcriptomic coverage of P. ovis, thus facilitating novel insights into the mite biology, we undertook a larger scale EST approach, incorporating newly generated and previously described P. ovis transcript data and representing the largest collection of P. ovis ESTs to date. We sequenced 1,574 ESTs and assembled these along with 484 previously generated P. ovis ESTs, which resulted in the identification of 1,545 unique P. ovis sequences. BLASTX searches identified 961 ESTs with significant hits (E-value P. ovis ESTs. Gene Ontology (GO) analysis allowed the functional annotation of 880 ESTs and included predictions of signal peptide and transmembrane domains, allowing the identification of potential P. ovis excreted/secreted factors, and mapping of metabolic pathways. Conclusions This dataset currently represents the largest collection of P. ovis ESTs, all of which are publicly available in the GenBank EST database (dbEST) (accession numbers FR748230 - FR749648). Functional analysis of this dataset identified important homologues, including house dust mite allergens and tick salivary factors. These findings offer new insights into the underlying biology of P. ovis, facilitating further investigations into mite biology and the identification of novel methods of intervention.