WorldWideScience

Sample records for genomic motif detection

  1. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motiffinding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.

  2. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has provided the advantage for finding motifs as ChIP-Seq experiments narrow down the motif finding to binding site locations. Recent motif finding tools facilitate the motif detection by providing user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable for detecting binding site motifs in ChIP-Seq data. We showed each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended the users to use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping resemble motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tool that better assists researchers for finding motifs in ChIP-Seq data.

  3. RMOD: a tool for regulatory motif detection in signaling network.

    Directory of Open Access Journals (Sweden)

    Jinki Kim

    Full Text Available Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present a RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it into a large-scale signaling network, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using various sizes of signaling networks generated from the integration of various human signaling pathways and it showed that the speed and scalability of this algorithm outperforms those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manipulate the whole processes from building signaling network and query regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and mechanism underlying their regulatory motif activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.

  4. Discovering regulatory motifs in the Plasmodium genome using comparative genomics

    OpenAIRE

    Wu, Jie; Sieglaff, Douglas H.; Gervin, Joshua; Xie, Xiaohui S.

    2008-01-01

    Motivation: Understanding gene regulation in Plasmodium, the causative agent of malaria, is an important step in deciphering its complex life cycle as well as leading to possible new targets for therapeutic applications. Very little is known about gene regulation in Plasmodium, and in particular, few regulatory elements have been identified. Such discovery has been significantly hampered by the high A-T content of some of the genomes of Plasmodium species, as well as the challenge in associat...

  5. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  6. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.; Chikalov, Igor; Moshkov, Mikhail; Jankovic, Boris R.

    2015-01-01

    Background is the numerous attempts were made to predict motifs in genomic sequences that correspond to poly (A) tail signals. Vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches

  7. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Full Text Available Abstract Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS" but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq not to be biological transcription factor binding sites ("empirical TFBS". We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs implicating that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  8. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine. The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid, suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity.

  9. SSTRAP: A computational model for genomic motif discovery ...

    African Journals Online (AJOL)

    Computational methods can potentially provide high-quality prediction of biological molecules such as DNA binding sites and Transcription factors and therefore reduce the time needed for experimental verification and challenges associated with experimental methods. These biological molecules or motifs have significant ...

  10. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Full Text Available Abstract Background Fusarium graminearum (Fg, a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agriculture losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc and Schizosaccharomyces pombe (Sp, we observed conservation of transcription factors (TFs, their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  11. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria.

    Science.gov (United States)

    Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A

    2013-09-02

    In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels.An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome

  12. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.

    2015-05-12

    Background is the numerous attempts were made to predict motifs in genomic sequences that correspond to poly (A) tail signals. Vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches yield good discriminant results, identifying dominant features of regulatory mechanisms nevertheless remains a challenge. In this work, we look at decision rules that may help identifying such features. Findings are we present a simple decision rule for classification of candidate poly (A) tail signal motifs in human genomic sequence obtained by evaluating features during the construction of gradient boosted trees. We found that values of a single feature based on the frequency of adenine in the genomic sequence surrounding candidate signal and the number of consecutive adenine molecules in a well-defined region immediately following the motif displays good discriminative potential in classification of poly (A) tail motifs for samples covered by the rule. Conclusions is the resulting simple rule can be used as an efficient filter in construction of more complex poly(A) tail motifs classification algorithms.

  13. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  14. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphas...... on an approach based on the CMfinder CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNA ncRNA s, including structured RNA elements in untranslated portions of protein-coding genes, are presented.......De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis...

  15. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    Science.gov (United States)

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.

  16. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  17. Detecting microsatellites within genomes: significant variation among algorithms

    Directory of Open Access Journals (Sweden)

    Rivals Eric

    2007-04-01

    Full Text Available Abstract Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker. Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp, regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.

  18. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  19. A cell-surface-anchored ratiometric i-motif sensor for extracellular pH detection.

    Science.gov (United States)

    Ying, Le; Xie, Nuli; Yang, Yanjing; Yang, Xiaohai; Zhou, Qifeng; Yin, Bincheng; Huang, Jin; Wang, Kemin

    2016-06-14

    A FRET-based sensor is anchored on the cell surface through streptavidin-biotin interactions. Due to the excellent properties of the pH-sensitive i-motif structure, the sensor can detect extracellular pH with high sensitivity and excellent reversibility.

  20. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Full Text Available Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*, the ER-retention signal (K/HDEL*, the ER-retrieval signal for membrane bound proteins (KKxx*, the prenylation signal (CC* and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists

  1. Bivariate Genomic Footprinting Detects Changes in Transcription Factor Activity

    Directory of Open Access Journals (Sweden)

    Songjoon Baek

    2017-05-01

    Full Text Available In response to activating signals, transcription factors (TFs bind DNA and regulate gene expression. TF binding can be measured by protection of the bound sequence from DNase digestion (i.e., footprint. Here, we report that 80% of TF binding motifs do not show a measurable footprint, partly because of a variable cleavage pattern within the motif sequence. To more faithfully portray the effect of TFs on chromatin, we developed an algorithm that captures two TF-dependent effects on chromatin accessibility: footprinting and motif-flanking accessibility. The algorithm, termed bivariate genomic footprinting (BaGFoot, efficiently detects TF activity. BaGFoot is robust to different accessibility assays (DNase-seq, ATAC-seq, all examined peak-calling programs, and a variety of cut bias correction approaches. BaGFoot reliably predicts TF binding and provides valuable information regarding the TFs affecting chromatin accessibility in various biological systems and following various biological events, including in cases where an absolute footprint cannot be determined.

  2. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    . These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  3. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat

    Directory of Open Access Journals (Sweden)

    Qisen Zhang

    2017-10-01

    Full Text Available Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA, pyrimidine box (CCTTTT and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing Lauting difficulty, may limit the use of wheat for brewing.

  4. Genome-wide identification of VQ motif-containing proteins and their expression profiles under abiotic stresses in maize

    Directory of Open Access Journals (Sweden)

    Weibin eSong

    2016-01-01

    Full Text Available VQ motif-containing proteins play crucial roles in abiotic stress responses in plants. Recent studies have shown that some VQ proteins physically interact with WRKY transcription factors to activate downstream genes. In the present study, we identified and characterized genes encoding VQ motif-containing proteins using the most recent version of the maize genome sequence. In total, 61VQ genes were identified. In a cluster analysis, these genes clustered into nine groups together with their homologous genes in rice and Arabidopsis. Most of the VQ genes (57 out of 61 numbers identified in maize were found to be single-copy genes. Analyses of RNA-seq data obtained using seedlings under long-term drought treatment showed that the expression levels of most ZmVQ genes (41 out of 61 members changed during the drought stress response. Quantitative real-time PCR analyses showed that most of the ZmVQ genes were responsive to NaCl treatment. Also, approximately half of the ZmVQ genes were co-expressed with ZmWRKY genes. The identification of these VQ genes in the maize genome and knowledge of their expression profiles under drought and osmotic stresses will provide a solid foundation for exploring their specific functions in the abiotic stress responses of maize.

  5. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  6. Robust and Accurate Anomaly Detection in ECG Artifacts Using Time Series Motif Discovery

    Science.gov (United States)

    Sivaraks, Haemwaan

    2015-01-01

    Electrocardiogram (ECG) anomaly detection is an important technique for detecting dissimilar heartbeats which helps identify abnormal ECGs before the diagnosis process. Currently available ECG anomaly detection methods, ranging from academic research to commercial ECG machines, still suffer from a high false alarm rate because these methods are not able to differentiate ECG artifacts from real ECG signal, especially, in ECG artifacts that are similar to ECG signals in terms of shape and/or frequency. The problem leads to high vigilance for physicians and misinterpretation risk for nonspecialists. Therefore, this work proposes a novel anomaly detection technique that is highly robust and accurate in the presence of ECG artifacts which can effectively reduce the false alarm rate. Expert knowledge from cardiologists and motif discovery technique is utilized in our design. In addition, every step of the algorithm conforms to the interpretation of cardiologists. Our method can be utilized to both single-lead ECGs and multilead ECGs. Our experiment results on real ECG datasets are interpreted and evaluated by cardiologists. Our proposed algorithm can mostly achieve 100% of accuracy on detection (AoD), sensitivity, specificity, and positive predictive value with 0% false alarm rate. The results demonstrate that our proposed method is highly accurate and robust to artifacts, compared with competitive anomaly detection methods. PMID:25688284

  7. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16aa, several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains that could be detected simply by comparing orthologous proteins.

  8. Genomic alterations detected by comparative genomic hybridization in ovarian endometriomas

    Directory of Open Access Journals (Sweden)

    L.C. Veiga-Castelli

    2010-08-01

    Full Text Available Endometriosis is a complex and multifactorial disease. Chromosomal imbalance screening in endometriotic tissue can be used to detect hot-spot regions in the search for a possible genetic marker for endometriosis. The objective of the present study was to detect chromosomal imbalances by comparative genomic hybridization (CGH in ectopic tissue samples from ovarian endometriomas and eutopic tissue from the same patients. We evaluated 10 ovarian endometriotic tissues and 10 eutopic endometrial tissues by metaphase CGH. CGH was prepared with normal and test DNA enzymatically digested, ligated to adaptors and amplified by PCR. A second PCR was performed for DNA labeling. Equal amounts of both normal and test-labeled DNA were hybridized in human normal metaphases. The Isis FISH Imaging System V 5.0 software was used for chromosome analysis. In both eutopic and ectopic groups, 4/10 samples presented chromosomal alterations, mainly chromosomal gains. CGH identified 11q12.3-q13.1, 17p11.1-p12, 17q25.3-qter, and 19p as critical regions. Genomic imbalances in 11q, 17p, 17q, and 19p were detected in normal eutopic and/or ectopic endometrium from women with ovarian endometriosis. These regions contain genes such as POLR2G, MXRA7 and UBA52 involved in biological processes that may lead to the establishment and maintenance of endometriotic implants. This genomic imbalance may affect genes in which dysregulation impacts both eutopic and ectopic endometrium.

  9. Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters – Towards Identification of Novel Secondary Metabolisms from Filamentous Fungi -

    Directory of Open Access Journals (Sweden)

    Myco eUmemura

    2015-05-01

    Full Text Available Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS and non-ribosomal peptide synthetase (NRPS. Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH.More recently, accompanied by the development of sequencing techniques allowing to obtain large-scale genomic and transcriptomic data, motif-independent prediction methods of SMB gene clusters, including MIDDAS-M, have been developed. Most these methods detect the clusters in which the genes are cooperatively regulated at transcriptional levels, thus allowing the identification of novel SMB gene clusters regardless of the presence of the core genes. Another type of the method, MIPS-CG, uses the characteristics of SMB genes, which are highly enriched in non-syntenic blocks (NSBs, enabling the prediction even without transcriptome data although the results have not been evaluated in detail. Considering that large portion of SMB gene clusters might be sufficiently expressed only in limited uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics and successive experimental validation is an only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  10. Detection of Non-Amplified Genomic DNA

    CERN Document Server

    Corradini, Roberto

    2012-01-01

    This book offers a state-of-the-art overview on non amplified DNA detection methods and provides chemists, biochemists, biotechnologists and material scientists with an introduction to these methods. In fact all these fields have dedicated resources to the problem of nucleic acid detection, each contributing with their own specific methods and concepts. This book will explain the basic principles of the different non amplified DNA detection methods available, highlighting their respective advantages and limitations. The importance of non-amplified DNA sequencing technologies will be also discussed. Non-amplified DNA detection can be achieved by adopting different techniques. Such techniques have allowed the commercialization of innovative platforms for DNA detection that are expected to break into the DNA diagnostics market. The enhanced sensitivity required for the detection of non amplified genomic DNA has prompted new strategies that can achieve ultrasensitivity by combining specific materials with specifi...

  11. Identification of early zygotic genes in the yellow fever mosquito Aedes aegypti and discovery of a motif involved in early zygotic genome activation.

    Science.gov (United States)

    Biedler, James K; Hu, Wanqi; Tae, Hongseok; Tu, Zhijian

    2012-01-01

    During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression) EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001) and 143 (P<0.05) nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1) contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila.

  12. Identification of early zygotic genes in the yellow fever mosquito Aedes aegypti and discovery of a motif involved in early zygotic genome activation.

    Directory of Open Access Journals (Sweden)

    James K Biedler

    Full Text Available During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001 and 143 (P<0.05 nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1 contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila.

  13. I-motif DNA structures are formed in the nuclei of human cells

    Science.gov (United States)

    Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

    2018-06-01

    Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.

  14. Selection against spurious promoter motifs correlates withtranslational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the {sigma}{sup 70} subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.

  15. Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    Directory of Open Access Journals (Sweden)

    Girgis Hani Z

    2012-02-01

    Full Text Available Abstract Background Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF binding sites (TFBSs. Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed. Results We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was

  16. Molecular Detection, Phylogenetic Analysis, and Identification of Transcription Motifs in Feline Leukemia Virus from Naturally Infected Cats in Malaysia

    Directory of Open Access Journals (Sweden)

    Faruku Bande

    2014-01-01

    Full Text Available A nested PCR assay was used to determine the viral RNA and proviral DNA status of naturally infected cats. Selected samples that were FeLV-positive by PCR were subjected to sequencing, phylogenetic analysis, and motifs search. Of the 39 samples that were positive for FeLV p27 antigen, 87.2% (34/39 were confirmed positive with nested PCR. FeLV proviral DNA was detected in 38 (97.3% of p27-antigen negative samples. Malaysian FeLV isolates are found to be highly similar with a homology of 91% to 100%. Phylogenetic analysis revealed that Malaysian FeLV isolates divided into two clusters, with a majority (86.2% sharing similarity with FeLV-K01803 and fewer isolates (13.8% with FeLV-GM1 strain. Different enhancer motifs including NF-GMa, Krox-20/WT1I-del2, BAF1, AP-2, TBP, TFIIF-beta, TRF, and TFIID are found to occur either in single, duplicate, triplicate, or sets of 5 in different positions within the U3-LTR-gag region. The present result confirms the occurrence of FeLV viral RNA and provirus DNA in naturally infected cats. Malaysian FeLV isolates are highly similar, and a majority of them are closely related to a UK isolate. This study provides the first molecular based information on FeLV in Malaysia. Additionally, different enhancer motifs likely associated with FeLV related pathogenesis have been identified.

  17. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Directory of Open Access Journals (Sweden)

    Sourav Roy

    2013-03-01

    Full Text Available Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures. Most of the putative stage-specific transcription factor binding sites (TFBSs thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  18. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

    Science.gov (United States)

    Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

    2016-01-01

    Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).

  19. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein

    Science.gov (United States)

    Marsh, Elizabeth K.; Delury, Craig P.; Davies, Nicholas J.; Weston, Christopher J.; Miah, Mohammed A.L.; Banks, Lawrence; Parish, Joanna L.

    2017-01-01

    The function of a conserved PDS95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV infected cells, ensuring stable episome persistence and vegetative amplification. PMID:28061478

  20. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein.

    Science.gov (United States)

    Marsh, Elizabeth K; Delury, Craig P; Davies, Nicholas J; Weston, Christopher J; Miah, Mohammed A L; Banks, Lawrence; Parish, Joanna L; Higgs, Martin R; Roberts, Sally

    2017-03-21

    The function of a conserved PDS95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV infected cells, ensuring stable episome persistence and vegetative amplification.

  1. Motif trie: An efficient text index for pattern discovery with don't cares

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2017-01-01

    We introduce the motif trie data structure, which has applications in pattern matching and discovery in genomic analysis, plagiarism detection, data mining, intrusion detection, spam fighting and time series analysis, to name a few. Here the extraction of recurring patterns in sequential and text......We introduce the motif trie data structure, which has applications in pattern matching and discovery in genomic analysis, plagiarism detection, data mining, intrusion detection, spam fighting and time series analysis, to name a few. Here the extraction of recurring patterns in sequential...

  2. Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes

    DEFF Research Database (Denmark)

    Caro, Albert Pallejà; García-Vallvé, Santiago; Romeu, Antoni

    2009-01-01

    ABSTRACT: BACKGROUND: In prokaryote genomes most of the co-directional genes are in close proximity. Even the coding sequence or the stop codon of a gene can overlap with the Shine-Dalgarno (SD) sequence of the downstream co-directional gene. In this paper we analyze how the presence of SD may...... influence the stop codon usage or the spacing lengths between co-directional genes. RESULTS: The SD sequences for 530 prokaryote genomes have been predicted using computer calculations of the base-pairing free energy between translation initiation regions and the 16S rRNA 3' tail. Genomes with a large...... to the discussion of which factors affect the intergenic lengths, which cannot be totally explained by the pressure to compact the prokaryote genomes....

  3. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist foridentifying candidate protein motifs at the whole genome level. However, amuch needed and importanttask is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paperpresents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifsis viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for mode training and functional prediction. The mutual information of a motif and aGO term association isfound to be a very useful feature. We take advantageof the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correctassociation. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical result demonstrated that the methods have a great potential for detecting and augmenting information about thefunctions of newly discovered candidate protein motifs.

  4. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  5. Detection of genomic instability in hypospadias patients by random ...

    African Journals Online (AJOL)

    DIRECTOR

    2011-05-16

    May 16, 2011 ... organism including bacteria (Sahoo et al., 2010), fungi. (Motlagh and Anvari ... technique that detects genomic alteration correlated with human tumor is .... chromosomal instability among hypospadias patients. REFERENCES.

  6. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Full Text Available Proteins of the animal heme peroxidase (ANP superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20, where it was found to be involved in Ca(2+ coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+ binding with a K(D of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821 is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of

  7. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Science.gov (United States)

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.

  8. Detecting uber-operons in prokaryotic genomes.

    Science.gov (United States)

    Che, Dongsheng; Li, Guojun; Mao, Fenglou; Wu, Hongwei; Xu, Ying

    2006-01-01

    We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind.

  9. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

    Directory of Open Access Journals (Sweden)

    Mirjana Domazet-Lošo

    Full Text Available Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure, a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos.

  10. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1 shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2 are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3 reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  11. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  12. Genomecmp: computer software to detect genomic rearrangements using markers

    Science.gov (United States)

    Kulawik, Maciej; Nowak, Robert M.

    2017-08-01

    Detection of genomics rearrangements is a tough task, because of the size of data to be processed. As genome sequences may consist of hundreds of millions symbols, it is not only practically impossible to compare them by hand, but it is also complex problem for computer software. The way to significantly accelerate the process is to use rearrangement detection algorithm based on unique short sequences called markers. The algorithm described in this paper develops markers using base genome and find the markers positions on other genome. The algorithm has been extended by support for ambiguity symbols. Web application with graphical user interface has been created using three-layer architecture, where users could run the task simultaneously. The accuracy and efficiency of proposed solution has been studied using generated and real data.

  13. An Inhibitory Motif on the 5’UTR of Several Rotavirus Genome Segments Affects Protein Expression and Reverse Genetics Strategies

    Science.gov (United States)

    Papa, Guido; Eichwald, Catherine; Burrone, Oscar R.

    2016-01-01

    Rotavirus genome consists of eleven segments of dsRNA, each encoding one single protein. Viral mRNAs contain an open reading frame (ORF) flanked by relatively short untranslated regions (UTRs), whose role in the viral cycle remains elusive. Here we investigated the role of 5’UTRs in T7 polymerase-driven cDNAs expression in uninfected cells. The 5’UTRs of eight genome segments (gs3, gs5-6, gs7-11) of the simian SA11 strain showed a strong inhibitory effect on the expression of viral proteins. Decreased protein expression was due to both compromised transcription and translation and was independent of the ORF and the 3’UTR sequences. Analysis of several mutants of the 21-nucleotide long 5’UTR of gs 11 defined an inhibitory motif (IM) represented by its primary sequence rather than its secondary structure. IM was mapped to the 5’ terminal 6-nucleotide long pyrimidine-rich tract 5’-GGY(U/A)UY-3’. The 5’ terminal position within the mRNA was shown to be essentially required, as inhibitory activity was lost when IM was moved to an internal position. We identified two mutations (insertion of a G upstream the 5’UTR and the U to A mutation of the fifth nucleotide of IM) that render IM non-functional and increase the transcription and translation rate to levels that could considerably improve the efficiency of virus helper-free reverse genetics strategies. PMID:27846320

  14. Genome-Enhanced Detection and Identification (GEDI of plant pathogens

    Directory of Open Access Journals (Sweden)

    Nicolas Feau

    2018-02-01

    Full Text Available Plant diseases caused by fungi and Oomycetes represent worldwide threats to crops and forest ecosystems. Effective prevention and appropriate management of emerging diseases rely on rapid detection and identification of the causal pathogens. The increase in genomic resources makes it possible to generate novel genome-enhanced DNA detection assays that can exploit whole genomes to discover candidate genes for pathogen detection. A pipeline was developed to identify genome regions that discriminate taxa or groups of taxa and can be converted into PCR assays. The modular pipeline is comprised of four components: (1 selection and genome sequencing of phylogenetically related taxa, (2 identification of clusters of orthologous genes, (3 elimination of false positives by filtering, and (4 assay design. This pipeline was applied to some of the most important plant pathogens across three broad taxonomic groups: Phytophthoras (Stramenopiles, Oomycota, Dothideomycetes (Fungi, Ascomycota and Pucciniales (Fungi, Basidiomycota. Comparison of 73 fungal and Oomycete genomes led the discovery of 5,939 gene clusters that were unique to the targeted taxa and an additional 535 that were common at higher taxonomic levels. Approximately 28% of the 299 tested were converted into qPCR assays that met our set of specificity criteria. This work demonstrates that a genome-wide approach can efficiently identify multiple taxon-specific genome regions that can be converted into highly specific PCR assays. The possibility to easily obtain multiple alternative regions to design highly specific qPCR assays should be of great help in tackling challenging cases for which higher taxon-resolution is needed.

  15. Directional genomic hybridization for chromosomal inversion discovery and detection.

    Science.gov (United States)

    Ray, F Andrew; Zimmerman, Erin; Robinson, Bruce; Cornforth, Michael N; Bedford, Joel S; Goodwin, Edwin H; Bailey, Susan M

    2013-04-01

    Chromosomal rearrangements are a source of structural variation within the genome that figure prominently in human disease, where the importance of translocations and deletions is well recognized. In principle, inversions-reversals in the orientation of DNA sequences within a chromosome-should have similar detrimental potential. However, the study of inversions has been hampered by traditional approaches used for their detection, which are not particularly robust. Even with significant advances in whole genome approaches, changes in the absolute orientation of DNA remain difficult to detect routinely. Consequently, our understanding of inversions is still surprisingly limited, as is our appreciation for their frequency and involvement in human disease. Here, we introduce the directional genomic hybridization methodology of chromatid painting-a whole new way of looking at structural features of the genome-that can be employed with high resolution on a cell-by-cell basis, and demonstrate its basic capabilities for genome-wide discovery and targeted detection of inversions. Bioinformatics enabled development of sequence- and strand-specific directional probe sets, which when coupled with single-stranded hybridization, greatly improved the resolution and ease of inversion detection. We highlight examples of the far-ranging applicability of this cytogenomics-based approach, which include confirmation of the alignment of the human genome database and evidence that individuals themselves share similar sequence directionality, as well as use in comparative and evolutionary studies for any species whose genome has been sequenced. In addition to applications related to basic mechanistic studies, the information obtainable with strand-specific hybridization strategies may ultimately enable novel gene discovery, thereby benefitting the diagnosis and treatment of a variety of human disease states and disorders including cancer, autism, and idiopathic infertility.

  16. A regenerated electrochemical biosensor for label-free detection of glucose and urea based on conformational switch of i-motif oligonucleotide probe

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Zhong Feng; Chen, Dong Mei [Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education), School of Chemistry and Chemical Engineering, Southwest University, Chongqing 400715 (China); Lei, Jing Lei [School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044 (China); Luo, Hong Qun, E-mail: luohq@swu.edu.cn [Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education), School of Chemistry and Chemical Engineering, Southwest University, Chongqing 400715 (China); Li, Nian Bing, E-mail: linb@swu.edu.cn [Key Laboratory of Eco-environments in Three Gorges Reservoir Region (Ministry of Education), School of Chemistry and Chemical Engineering, Southwest University, Chongqing 400715 (China)

    2015-10-15

    Improving the reproducibility of electrochemical signal remains a great challenge over the past decades. In this work, i-motif oligonucleotide probe-based electrochemical DNA (E-DNA) sensor is introduced for the first time as a regenerated sensing platform, which enhances the reproducibility of electrochemical signal, for label-free detection of glucose and urea. The addition of glucose or urea is able to activate glucose oxidase-catalyzed or urease-catalyzed reaction, inducing or destroying the formation of i-motif oligonucleotide probe. The conformational switch of oligonucleotide probe can be recorded by electrochemical impedance spectroscopy. Thus, the difference of electron transfer resistance is utilized for the quantitative determination of glucose and urea. We further demonstrate that the E-DNA sensor exhibits high selectivity, excellent stability, and remarkable regenerated ability. The human serum analysis indicates that this simple and regenerated strategy holds promising potential in future biosensing applications. - Highlights: • Conformational switch of i-motif is used for the detection of glucose and urea. • The sensor can be regenerated. • The proposed method is successfully applied in real sample assay. • Our method is label-free and inexpensive.

  17. Detecting individual ancestry in the human genome

    NARCIS (Netherlands)

    A. Wollstein (Andreas); O. Lao Grueso (Oscar)

    2015-01-01

    textabstractDetecting and quantifying the population substructure present in a sample of individuals are of main interest in the fields of genetic epidemiology, population genetics, and forensics among others. To date, several algorithms have been proposed for estimating the amount of genetic

  18. Motif enrichment tool.

    Science.gov (United States)

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization.

    Science.gov (United States)

    Wang, Qingguo; Jia, Peilin; Zhao, Zhongming

    2015-01-01

    Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.

  20. Comparing genetic variants detected in the 1000 genomes project ...

    Indian Academy of Sciences (India)

    Single-nucleotide polymorphisms (SNPs) determined based on SNP arrays from the international HapMap consortium (HapMap) and the genetic variants detected in the 1000 genomes project (1KGP) can serve as two references for genomewide association studies (GWAS). We conducted comparative analyses to provide ...

  1. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  2. Detection and analysis of ancient segmental duplications in mammalian genomes.

    Science.gov (United States)

    Pu, Lianrong; Lin, Yu; Pevzner, Pavel A

    2018-05-07

    Although segmental duplications (SDs) represent hotbeds for genomic rearrangements and emergence of new genes, there are still no easy-to-use tools for identifying SDs. Moreover, while most previous studies focused on recently emerged SDs, detection of ancient SDs remains an open problem. We developed an SDquest algorithm for SD finding and applied it to analyzing SDs in human, gorilla, and mouse genomes. Our results demonstrate that previous studies missed many SDs in these genomes and show that SDs account for at least 6.05% of the human genome (version hg19), a 17% increase as compared to the previous estimate. Moreover, SDquest classified 6.42% of the latest GRCh38 version of the human genome as SDs, a large increase as compared to previous studies. We thus propose to re-evaluate evolution of SDs based on their accurate representation across multiple genomes. Toward this goal, we analyzed the complex mosaic structure of SDs and decomposed mosaic SDs into elementary SDs, a prerequisite for follow-up evolutionary analysis. We also introduced the concept of the breakpoint graph of mosaic SDs that revealed SD hotspots and suggested that some SDs may have originated from circular extrachromosomal DNA (ecDNA), not unlike ecDNA that contributes to accelerated evolution in cancer. © 2018 Pu et al.; Published by Cold Spring Harbor Laboratory Press.

  3. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence dataset. It also shows that the false positive removal procedure can help to identify true motifs even when there is only 20% of the input sequences containing true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  4. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Directory of Open Access Journals (Sweden)

    Qingyu Chen

    Full Text Available First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases.We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  5. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  6. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Full Text Available Abstract Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with a special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise to use the likelihood ratio test which is asymptotic but strongly correlated with the exact binomial test and very simple to use.

  7. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow discriminating successfully a) TrEn from random controls, proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  8. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data.

    Science.gov (United States)

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B

    2016-04-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) considering 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 millions obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing project, especially when defining populations is difficult. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes.

    Science.gov (United States)

    Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C

    2009-02-01

    Abscisic acid (ABA), the popular plant stress hormone, plays a key role in regulation of sub-set of stress responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the ABA Responsive Element (ABRE) core (ACGT) containing CGMCACGTGB motif as over-represented motif among the promoters of ABA responsive co-expressed genes in rice. Targeted gene prediction strategy using this motif led to the identification of 402 protein coding genes potentially regulated by ABA-dependent molecular genetic network. RT-PCR analysis of arbitrarily chosen 45 genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA responsive genes showed enrichment of signal transduction and stress related genes among diverse functional categories.

  10. Genovar: a detection and visualization tool for genomic variants.

    Science.gov (United States)

    Jung, Kwang Su; Moon, Sanghoon; Kim, Young Jin; Kim, Bong-Jo; Park, Kiejung

    2012-05-08

    Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A JAVA-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphic user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. http://genovar.sourceforge.net/.

  11. SNP detection for massively parallel whole-genome resequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong

    2009-01-01

    -genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All...... of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.......25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered...

  12. On detection and assessment of statistical significance of Genomic Islands

    Directory of Open Access Journals (Sweden)

    Chaudhuri Probal

    2008-04-01

    Full Text Available Abstract Background Many of the available methods for detecting Genomic Islands (GIs in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to Salmonella typhi CT18 genome leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements confirming their horizontal acquirement. Conclusion The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than many other well known methods in terms of their sensitivity and accuracy, and in terms of specificity, it is comparable to other methods.

  13. The detection of large deletions or duplications in genomic DNA.

    Science.gov (United States)

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for the detection of point mutations and small insertions or deletions in genomic DNA are well established, the detection of larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. Gene dosage methods based on PCR need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some sort of quantitation, heterozygous deletions and duplications may be overlooked and therefore be under-ascertained. Gene dosage methods provide the additional benefit of reporting allele drop-out in the PCR. This could impact on SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for the detection of this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions and deletions and duplications. Copyright 2002 Wiley-Liss, Inc.

  14. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore...... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery....... a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs...

  15. CD8+ T cells with characteristic T cell receptor beta motif are detected in blood and expanded in synovial fluid of ankylosing spondylitis patients.

    Science.gov (United States)

    Komech, Ekaterina A; Pogorelyy, Mikhail V; Egorov, Evgeniy S; Britanova, Olga V; Rebrikov, Denis V; Bochkova, Anna G; Shmidt, Evgeniya I; Shostak, Nadejda A; Shugay, Mikhail; Lukyanov, Sergey; Mamedov, Ilgar Z; Lebedev, Yuriy B; Chudakov, Dmitriy M; Zvyagin, Ivan V

    2018-02-22

    The risk of AS is associated with genomic variants related to antigen presentation and specific cytokine signalling pathways, suggesting the involvement of cellular immunity in disease initiation/progression. The aim of the present study was to explore the repertoire of TCR sequences in healthy donors and AS patients to uncover AS-linked TCR variants. Using quantitative molecular-barcoded 5'-RACE, we performed deep TCR β repertoire profiling of peripheral blood (PB) and SF samples for 25 AS patients and 108 healthy donors. AS-linked TCR variants were identified using a new computational approach that relies on a probabilistic model of the VDJ rearrangement process. Using the donor-agnostic probabilistic model, we reveal a TCR β motif characteristic for PB of AS patients, represented by eight highly homologous amino acid sequence variants. Some of these variants were previously reported in SF and PB of patients with ReA and in PB of AS patients. We demonstrate that identified AS-linked clones have a CD8+ phenotype, present at relatively low frequencies in PB, and are significantly enriched in matched SF samples of AS patients. Our results suggest the involvement of a particular antigen-specific subset of CD8+ T cells in AS pathogenesis, confirming and expanding earlier findings. The high similarity of the clonotypes with the ones found in ReA implies common mechanisms for the initiation of the diseases.

  16. Detection of genomic rearrangements in cucumber using genomecmp software

    Science.gov (United States)

    Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; PlÄ der, Wojciech; Nowak, Robert M.

    2017-08-01

    Comparative genomic by increasing information about the genomes sequences available in the databases is a rapidly evolving science. A simple comparison of the general features of genomes such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences and applications in plant comparative genomics.

  17. MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.

    Science.gov (United States)

    Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed

    2017-01-20

    Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot

  18. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Science.gov (United States)

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  19. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Full Text Available Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  20. Genome-wide identification of basic helix-loop-helix and NF-1 motifs underlying GR binding sites in male rat hippocampus

    DEFF Research Database (Denmark)

    Pooley, John R.; Flynn, Ben P.; Grøntved, Lars

    2017-01-01

    linked to structural and organizational roles, an absence of major tethering partners for GRs, and little or no evidence for binding at negative glucocorticoid response elements. A basic helix-loop-helix motif closely resembling a NeuroD1 or Olig2 binding site was found underlying a subset of GR binding......Glucocorticoids regulate hippocampal function in part by modulating gene expression through the glucocorticoid receptor (GR). GR binding is highly cell type specific, directed to accessible chromatin regions established during tissue differentiation. Distinct classes of GR binding sites...

  1. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glyco- proteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  2. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  3. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  4. A BAC clone fingerprinting approach to the detection of human genome rearrangements

    Science.gov (United States)

    Krzywinski, Martin; Bosdet, Ian; Mathewson, Carrie; Wye, Natasja; Brebner, Jay; Chiu, Readman; Corbett, Richard; Field, Matthew; Lee, Darlene; Pugh, Trevor; Volik, Stas; Siddiqui, Asim; Jones, Steven; Schein, Jacquie; Collins, Collin; Marra, Marco

    2007-01-01

    We present a method, called fingerprint profiling (FPP), that uses restriction digest fingerprints of bacterial artificial chromosome clones to detect and classify rearrangements in the human genome. The approach uses alignment of experimental fingerprint patterns to in silico digests of the sequence assembly and is capable of detecting micro-deletions (1-5 kb) and balanced rearrangements. Our method has compelling potential for use as a whole-genome method for the identification and characterization of human genome rearrangements. PMID:17953769

  5. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  6. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Science.gov (United States)

    Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

    2017-03-17

    Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. [Cloning and sequence analysis of the DHBV genome of the brown ducks in Guilin region and establishment of the quantitative method for detecting DHBV].

    Science.gov (United States)

    Su, He-Ling; Huang, Ri-Dong; He, Song-Qing; Xu, Qing; Zhu, Hua; Mo, Zhi-Jing; Liu, Qing-Bo; Liu, Yong-Ming

    2013-03-01

    Brown ducks carrying DHBV were widely used as hepatitis B animal model in the research of the activity and toxicity of anti-HBV dugs. Studies showed that the ratio of DHBV carriers in the brown ducks in Guilin region was relatively high. Nevertheless, the characters of the DHBV genome of Guilin brown duck remain unknown. Here we report the cloning of the genome of Guilin brown duck DHBV and the sequence analysis of the genome. The full length of the DHBV genome of Guilin brown duck was 3 027bp. Analysis using ORF finder found that there was an ORF for an unknown peptide other than S-ORF, PORF and C-ORF in the genome of the DHBV. Vector NTI 8. 0 analysis revealed that the unknown peptide contained a motif which binded to HLA * 0201. Aligning with the DHBV sequences from different countries and regions indicated that there were no obvious differences of regional distribution among the sequences. A fluorescence quantitative PCR for detecting DHBV was establishment based on the recombinant plasmid pGEM-DHBV-S constructed. This study laid the groundwork for using Guilin brown duck as a hepatitis B animal model.

  8. Fingerprint motifs of phytases | Fan | African Journal of Biotechnology

    African Journals Online (AJOL)

    Among the total of potential 173 phytases gained in 11 plant genomes through MAST, PAPhys are the major phytases, and HAPhys are the minor, and other phytase groups are not found in planta. Keywords: Phytase, fingerprint motif, multiple EM for motif elicitation (MEME), MAST African Journal of Biotechnology Vol.

  9. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of

  10. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro

  11. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  13. Identity and functions of CxxC-derived motifs.

    Science.gov (United States)

    Fomenko, Dmitri E; Gladyshev, Vadim N

    2003-09-30

    Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.

  14. Practical Approaches for Detecting Selection in Microbial Genomes

    OpenAIRE

    Hedge, Jessica; Wilson, Daniel J.

    2016-01-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers th...

  15. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Detection of genomic deletions in rice using oligonucleotide microarrays

    Directory of Open Access Journals (Sweden)

    Bordeos Alicia

    2009-03-01

    Full Text Available Abstract Background The induction of genomic deletions by physical- or chemical- agents is an easy and inexpensive means to generate a genome-saturating collection of mutations. Different mutagens can be selected to ensure a mutant collection with a range of deletion sizes. This would allow identification of mutations in single genes or, alternatively, a deleted group of genes that might collectively govern a trait (e.g., quantitative trait loci, QTL. However, deletion mutants have not been widely used in functional genomics, because the mutated genes are not tagged and therefore, difficult to identify. Here, we present a microarray-based approach to identify deleted genomic regions in rice mutants selected from a large collection generated by gamma ray or fast neutron treatment. Our study focuses not only on the utility of this method for forward genetics, but also its potential as a reverse genetics tool through accumulation of hybridization data for a collection of deletion mutants harboring multiple genetic lesions. Results We demonstrate that hybridization of labeled genomic DNA directly onto the Affymetrix Rice GeneChip® allows rapid localization of deleted regions in rice mutants. Deletions ranged in size from one gene model to ~500 kb and were predicted on all 12 rice chromosomes. The utility of the technique as a tool in forward genetics was demonstrated in combination with an allelic series of mutants to rapidly narrow the genomic region, and eventually identify a candidate gene responsible for a lesion mimic phenotype. Finally, the positions of mutations in 14 mutants were aligned onto the rice pseudomolecules in a user-friendly genome browser to allow for rapid identification of untagged mutations http://irfgc.irri.org/cgi-bin/gbrowse/IR64_deletion_mutants/. Conclusion We demonstrate the utility of oligonucleotide arrays to discover deleted genes in rice. The density and distribution of deletions suggests the feasibility of a

  17. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) cate- gories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated......Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...

  18. Genome-wide detection of selection and other evolutionary forces

    DEFF Research Database (Denmark)

    Xu, Zhuofei; Zhou, Rui

    2015-01-01

    As is well known, pathogenic microbes evolve rapidly to escape from the host immune system and antibiotics. Genetic variations among microbial populations occur frequently during the long-term pathogen–host evolutionary arms race, and individual mutation beneficial for the fitness can be fixed...... to scan genome-wide alignments for evidence of positive Darwinian selection, recombination, and other evolutionary forces operating on the coding regions. In this chapter, we describe an integrative analysis pipeline and its application to tracking featured evolutionary trajectories on the genome...

  19. Practical Approaches for Detecting Selection in Microbial Genomes.

    Directory of Open Access Journals (Sweden)

    Jessica Hedge

    2016-02-01

    Full Text Available Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  20. Practical Approaches for Detecting Selection in Microbial Genomes.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2016-02-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  1. Sparse representation and Bayesian detection of genome copy number alterations from microarray data.

    Science.gov (United States)

    Pique-Regi, Roger; Monso-Varona, Jordi; Ortega, Antonio; Seeger, Robert C; Triche, Timothy J; Asgharzadeh, Shahab

    2008-02-01

    Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) that are associated with the development and behavior of tumors. Advances in microarray technology have allowed for greater resolution in detection of DNA copy number changes (amplifications or deletions) across the genome. However, the increase in number of measured signals and accompanying noise from the array probes present a challenge in accurate and fast identification of breakpoints that define CNA. This article proposes a novel detection technique that exploits the use of piece wise constant (PWC) vectors to represent genome copy number and sparse Bayesian learning (SBL) to detect CNA breakpoints. First, a compact linear algebra representation for the genome copy number is developed from normalized probe intensities. Second, SBL is applied and optimized to infer locations where copy number changes occur. Third, a backward elimination (BE) procedure is used to rank the inferred breakpoints; and a cut-off point can be efficiently adjusted in this procedure to control for the false discovery rate (FDR). The performance of our algorithm is evaluated using simulated and real genome datasets and compared to other existing techniques. Our approach achieves the highest accuracy and lowest FDR while improving computational speed by several orders of magnitude. The proposed algorithm has been developed into a free standing software application (GADA, Genome Alteration Detection Algorithm). http://biron.usc.edu/~piquereg/GADA

  2. A G-C-rich palindromic structural motif and a stretch of single-stranded purines are required for optimal packaging of Mason-Pfizer monkey virus (MPMV) genomic RNA.

    Science.gov (United States)

    Jaballah, Soumeya Ali; Aktar, Suriya J; Ali, Jahabar; Phillip, Pretty Susan; Al Dhaheri, Noura Salem; Jabeen, Aayesha; Rizvi, Tahir A

    2010-09-03

    During retroviral RNA packaging, two copies of genomic RNA are preferentially packaged into the budding virus particles whereas the spliced viral RNAs and the cellular RNAs are excluded during this process. Specificity towards retroviral RNA packaging is dependent upon sequences at the 5' end of the viral genome, which at times extend into Gag sequences. It has earlier been suggested that the Mason-Pfizer monkey virus (MPMV) contains packaging sequences within the 5' untranslated region (UTR) and Gag. These studies have also suggested that the packaging determinants of MPMV that lie in the UTR are bipartite and are divided into two regions both upstream and downstream of the major splice donor. However, the precise boundaries of these discontinuous regions within the UTR and the role of the intervening sequences between these dipartite sequences towards MPMV packaging have not been investigated. Employing a combination of genetic and structural prediction analyses, we have shown that region "A", immediately downstream of the primer binding site, is composed of 50 nt, whereas region "B" is composed of the last 23 nt of UTR, and the intervening 55 nt between these two discontinuous regions do not contribute towards MPMV RNA packaging. In addition, we have identified a 14-nt G-C-rich palindromic sequence (with 100% autocomplementarity) within region A that has been predicted to fold into a structural motif and is essential for optimal MPMV RNA packaging. Furthermore, we have also identified a stretch of single-stranded purines (ssPurines) within the UTR and 8 nt of these ssPurines are duplicated in region B. The native ssPurines or its repeat in region B when predicted to refold as ssPurines has been shown to be essential for RNA packaging, possibly functioning as a potential nucleocapsid binding site. Findings from this study should enhance our understanding of the steps involved in MPMV replication including RNA encapsidation process. Copyright (c) 2010 Elsevier Ltd

  3. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  4. Quadruplexes in 'Dicty': crystal structure of a four-quartet G-quadruplex formed by G-rich motif found in the Dictyostelium discoideum genome.

    Science.gov (United States)

    Guédin, Aurore; Lin, Linda Yingqi; Armane, Samir; Lacroix, Laurent; Mergny, Jean-Louis; Thore, Stéphane; Yatsunyk, Liliya A

    2018-06-01

    Guanine-rich DNA has the potential to fold into non-canonical G-quadruplex (G4) structures. Analysis of the genome of the social amoeba Dictyostelium discoideum indicates a low number of sequences with G4-forming potential (249-1055). Therefore, D. discoideum is a perfect model organism to investigate the relationship between the presence of G4s and their biological functions. As a first step in this investigation, we crystallized the dGGGGGAGGGGTACAGGGGTACAGGGG sequence from the putative promoter region of two divergent genes in D. discoideum. According to the crystal structure, this sequence folds into a four-quartet intramolecular antiparallel G4 with two lateral and one diagonal loops. The G-quadruplex core is further stabilized by a G-C Watson-Crick base pair and a A-T-A triad and displays high thermal stability (Tm > 90°C at 100 mM KCl). Biophysical characterization of the native sequence and loop mutants suggests that the DNA adopts the same structure in solution and in crystalline form, and that loop interactions are important for the G4 stability but not for its folding. Four-tetrad G4 structures are sparse. Thus, our work advances understanding of the structural diversity of G-quadruplexes and yields coordinates for in silico drug screening programs and G4 predictive tools.

  5. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif....... A special viewing feature, MHC fight, allows for display of the specificity of two different MHC molecules side by side. We show how the web server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif...

  6. Detection of alien genetic introgressions in bread wheat using dot-blot genomic hybridisation.

    Science.gov (United States)

    Rey, María-Dolores; Prieto, Pilar

    2017-01-01

    Simple, reliable methods for the identification of alien genetic introgressions are required in plant breeding programmes. The use of genomic dot-blot hybridisation allows the detection of small Hordeum chilense genomic introgressions in the descendants of genetic crosses between wheat and H. chilense addition or substitution lines in wheat when molecular markers are difficult to use. Based on genomic in situ hybridisation, DNA samples from wheat lines carrying putatively H. chilense introgressions were immobilised on a membrane, blocked with wheat genomic DNA and hybridised with biotin-labelled H. chilense genomic DNA as a probe. This dot-blot screening reduced the number of plants necessary to be analysed by molecular markers or in situ hybridisation, saving time and money. The technique was sensitive enough to detect a minimum of 5 ng of total genomic DNA immobilised on the membrane or about 1/420 dilution of H. chilense genomic DNA in the wheat background. The robustness of the technique was verified by in situ hybridisation. In addition, the detection of other wheat relative species such as Hordeum vulgare , Secale cereale and Agropyron cristatum in the wheat background was also reported .

  7. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  8. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of the art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allow us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  9. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  10. An HMM-based comparative genomic framework for detecting introgression in eukaryotes.

    Directory of Open Access Journals (Sweden)

    Kevin J Liu

    2014-06-01

    Full Text Available One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have been already established. In this work, we report on PhyloNet-HMM-a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs to simultaneously capture the (potentially reticulate evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes. Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.

  11. [The genome and its metaphors. Detectives, heroes or prophets?].

    Science.gov (United States)

    Davo, M C; Alvarez-Dardet, C

    2003-01-01

    The new genetics, or the impetus given to this discipline by the Genome Project, aims to a change of paradigm of the Health Sciences. This change is postulated from a phenotypic approach to a genotypic one, thereby excluding the influence of the environment, which could seriously undermine the grounds for the development and exercise of Public Health. Since the beginning of the genome project, information on genetic discoveries has frequently been reported in the mass media. Metaphors are often used by geneticists and journalists to convey the complex concepts of genetic research for which there are no equivalents in the lay language. The media do not merely shape the social agenda but also provide the space in which health culture is constructed. We present the results of a preliminary study exploring the metaphors used in the three most widely-read national daily newspapers in Spain, namely ABC, El Pais and El Mundo, when reporting news of the new genetics. The possible consequences of the natural history of these metaphors, or the process through which figurative terms acquire a literal meaning, are discussed. A preliminary taxonomy for the metaphors identified was developed. Fifty-one out of 342 identified headings (14.8%) contained metaphors. Strategic metaphors such as program, control, code, map, and puzzle, were the most commonly used, followed by teleological ones such as mystery or God language and finally war-like metaphors such as attack, defeat, and capture. The three groups of metaphors are characterized by an attempt to giving intentionality to genes. Strategic metaphors predominated over teleological and war-like ones and thus a technocratic perspective could form the basis of the future construction of health culture.

  12. CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

    Science.gov (United States)

    Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

    2014-12-01

    Discovering motifs in protein-protein interaction networks is becoming a current major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because of the involvement of graph isomorphic detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.

  13. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Full Text Available Abstract Background The interactions between PDZ (PSD-95, Dlg, ZO-1 domains and PDZ-binding motifs play central roles in signal transductions within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C- terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins, at genome-level. Results Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V or type-II (x-x-V-x-I/V PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode. We first established that these PDZ-binding motifs are indeed preferentially present at their C-terminal ends. Moreover, we found specific amino acid (AA bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.

  14. DNA motif alignment by evolving a population of Markov chains.

    Science.gov (United States)

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.

  15. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    Science.gov (United States)

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus , Escherichia , and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  16. Intra-strain polymorphisms are detected but no genomic alteration is found in cloned mice

    International Nuclear Information System (INIS)

    Gotoh, Koshichi; Inoue, Kimiko; Ogura, Atsuo; Oishi, Michio

    2006-01-01

    In-gel competitive reassociation (IGCR) is a method for differential subtraction of polymorphic (RFLP) DNA fragments between two DNA samples of interest without probes or specific sequence information. Here, we applied the IGCR procedure to two cloned mice derived from an F1 hybrid of the C57BL/6Cr and DBA/2 strains, in order to investigate the possibility of genomic alteration in the cloned mouse genomes. Each of the five of the genomic alterations we detected between the two cloned mice corresponded to the 'intra-strain' polymorphisms in the C57BL/6Cr and DBA/2 mouse strains. Our result suggests that no severe aberration of genome sequences occurs due to somatic cell nuclear transfer

  17. A method for accurate detection of genomic microdeletions using real-time quantitative PCR

    Directory of Open Access Journals (Sweden)

    Bassett Anne S

    2005-12-01

    Full Text Available Abstract Background Quantitative Polymerase Chain Reaction (qPCR is a well-established method for quantifying levels of gene expression, but has not been routinely applied to the detection of constitutional copy number alterations of human genomic DNA. Microdeletions or microduplications of the human genome are associated with a variety of genetic disorders. Although, clinical laboratories routinely use fluorescence in situ hybridization (FISH to identify such cryptic genomic alterations, there remains a significant number of individuals in which constitutional genomic imbalance is suspected, based on clinical parameters, but cannot be readily detected using current cytogenetic techniques. Results In this study, a novel application for real-time qPCR is presented that can be used to reproducibly detect chromosomal microdeletions and microduplications. This approach was applied to DNA from a series of patient samples and controls to validate genomic copy number alteration at cytoband 22q11. The study group comprised 12 patients with clinical symptoms of chromosome 22q11 deletion syndrome (22q11DS, 1 patient trisomic for 22q11 and 4 normal controls. 6 of the patients (group 1 had known hemizygous deletions, as detected by standard diagnostic FISH, whilst the remaining 6 patients (group 2 were classified as 22q11DS negative using the clinical FISH assay. Screening of the patients and controls with a set of 10 real time qPCR primers, spanning the 22q11.2-deleted region and flanking sequence, confirmed the FISH assay results for all patients with 100% concordance. Moreover, this qPCR enabled a refinement of the region of deletion at 22q11. Analysis of DNA from chromosome 22 trisomic sample demonstrated genomic duplication within 22q11. Conclusion In this paper we present a qPCR approach for the detection of chromosomal microdeletions and microduplications. The strategic use of in silico modelling for qPCR primer design to avoid regions of repetitive

  18. RAPD-based detection of genomic instability in cucumber plants ...

    African Journals Online (AJOL)

    STORAGESEVER

    2009-07-20

    Jul 20, 2009 ... 2School of biology and Environmental Sciences, Belfield, Dublin 4, ... detectable differences between the somatic embryo derived plants compared to their F1 parents in the. Random Amplified Polymorphic DNA (RAPD) test using five primers ... The mixture was allowed to thaw and ice-cold polyvinyl-.

  19. Optical Detection of Non-amplified Genomic DNA

    Science.gov (United States)

    Li, Di; Fan, Chunhai

    Nucleic acid sequences are unique to every living organisms including animals, plants and even bacteria and virus, which provide a practical molecular target for the identification and diagnosis of various diseases. DNA contains heterocyclic rings that has inherent optical absorbance at 260 nm, which is widely used to quantify single and double stranded DNA in biology. However, this simple quantification method could not differentiate sequences; therefore it is not suitable for sequence-specific analyte detection. In addition to a few exceptions such as chiral-related circular dichroism spectra, DNA hybridization does not produce significant changes in optical signals, thus an optical label is generally needed for sequence-specific DNA detection with optical means. During the last two decades, we have witnessed explosive progress in the area of optical DNA detection, especially with the help of simultaneously rapidly developed nanomaterials. In this chapter, we will summarize recent advances in optical DNA detection including colorimetric, fluorescent, luminescent, surface plasmon resonance (SPR) and Raman scattering assays. Challenges and problems remained to be addressed are also discussed.

  20. Non-Enzymatic Detection of Bacterial Genomic DNA Using the Bio-Barcode Assay

    Science.gov (United States)

    Hill, Haley D.; Vega, Rafael A.; Mirkin, Chad A.

    2011-01-01

    The detection of bacterial genomic DNA through a non-enzymatic nanomaterials based amplification method, the bio-barcode assay, is reported. The assay utilizes oligonucleotide functionalized magnetic microparticles to capture the target of interest from the sample. A critical step in the new assay involves the use of blocking oligonucleotides during heat denaturation of the double stranded DNA. These blockers bind to specific regions of the target DNA upon cooling, and prevent the duplex DNA from re-hybridizing, which allows the particle probes to bind. Following target isolation using the magnetic particles, oligonucleotide functionalized gold nanoparticles act as target recognition agents. The oligonucleotides on the nanoparticle (barcodes) act as amplification surrogates. The barcodes are then detected using the Scanometric method. The limit of detection for this assay was determined to be 2.5 femtomolar, and this is the first demonstration of a barcode type assay for the detection of double stranded, genomic DNA. PMID:17927207

  1. Primers for polymerase chain reaction to detect genomic DNA of Toxocara canis and T. cati.

    Science.gov (United States)

    Wu, Z; Nagano, I; Xu, D; Takahashi, Y

    1997-03-01

    Primers for polymerase chain reaction to amplify genomic DNA of both Toxocara canis and T. cati were constructed by adapting cloning and sequencing random amplified polymorphic DNA. The primers are expected to detect eggs and/or larvae of T. canis and T. cati, both of which are known to cause toxocariasis in humans.

  2. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    methods, focusing particularly on variations detection. Low-coverage whole-genome sequencing revealed that DOP-PCR had the highest duplication ratio, but an even read distribution and the best reproducibility and accuracy for detection of copy-number variations (CNVs). However, MDA had significantly...... performance using SCRS amplified by different WGA methods. It will guide researchers to determine which WGA method is best suited to individual experimental needs at single-cell level....

  3. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16, 384 cores on a supercomputer. Copyright is held by the owner/author(s).

  4. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-01-01

    Protein-binding microarray (PBM) is a high-throughout platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 ?10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. Especially, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/?wkc/kmerHMM. 2013 The Author(s).

  5. DNA motif elucidation using belief propagation.

    Science.gov (United States)

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-09-01

    Protein-binding microarray (PBM) is a high-throughout platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k=8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. Especially, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/∼wkc/kmerHMM.

  6. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun

    2013-06-29

    Protein-binding microarray (PBM) is a high-throughout platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 ?10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. Especially, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors\\' websites: e.g. http://www.cs.toronto.edu/?wkc/kmerHMM. 2013 The Author(s).

  7. Sensitive and reliable detection of genomic imbalances in human neuroblastomas using comparative genomic hybridisation analysis

    NARCIS (Netherlands)

    van Gele, M.; van Roy, N.; Jauch, A.; Laureys, G.; Benoit, Y.; Schelfhout, V.; de Potter, C. R.; Brock, P.; Uyttebroeck, A.; Sciot, R.; Schuuring, E.; Versteeg, R.; Speleman, F.

    1997-01-01

    Deletions of the short arm of chromosome 1, extra copies of chromosome 17q and MYCN amplification are the most frequently encountered genetic changes in neuroblastomas. Standard techniques for detection of one or more of these genetic changes are karyotyping, FISH analysis and LOH analysis by

  8. Whole genomic DNA probe for detection of Porphyromonas endodontalis.

    Science.gov (United States)

    Nissan, R; Makkar, S R; Sela, M N; Stevens, R

    2000-04-01

    The purpose of the present study was to develop a DNA probe for Porphyromonas endodontalis. Pure cultures of P. endodontalis were grown in TYP medium, in an anaerobic chamber. DNA was extracted from the P. endodontalis and labeled using the Genius System by Boehringer Mannheim. The labeled P. endodontalis DNA was used in dot-blot hybridization reactions with homologous (P. endodontalis) and unrelated bacterial samples. To determine specificity, strains of 40 other oral bacterial species (e.g. Porphyromonas gingivalis, Porphyromonas asaccharolytica, and Prevotella intermedia) were spotted and reacted with the P. endodontalis DNA probe. None of the panel of 40 oral bacteria hybridized with the P. endodontalis probe, whereas the blot of the homologous organism showed a strong positive reaction. To determine the sensitivity of the probe, dilutions of a P. endodontalis suspension of known concentration were blotted onto a nylon membrane and reacted with the probe. The results of our investigation indicate that the DNA probe that we have prepared specifically detects only P. endodontalis and can detect at least 3 x 10(4) cells.

  9. Detection of Alicyclobacillus species in fruit juice using a random genomic DNA microarray chip.

    Science.gov (United States)

    Jang, Jun Hyeong; Kim, Sun-Joong; Yoon, Bo Hyun; Ryu, Jee-Hoon; Gu, Man Bock; Chang, Hyo-Ihl

    2011-06-01

    This study describes a method using a DNA microarray chip to rapidly and simultaneously detect Alicyclobacillus species in orange juice based on the hybridization of genomic DNA with random probes. Three food spoilage bacteria were used in this study: Alicyclobacillus acidocaldarius, Alicyclobacillus acidoterrestris, and Alicyclobacillus cycloheptanicus. The three Alicyclobacillus species were adjusted to 2 × 10(3) CFU/ml and inoculated into pasteurized 100% pure orange juice. Cy5-dCTP labeling was used for reference signals, and Cy3-dCTP was labeled for target genomic DNA. The molar ratio of 1:1 of Cy3-dCTP and Cy5-dCTP was used. DNA microarray chips were fabricated using randomly fragmented DNA of Alicyclobacillus spp. and were hybridized with genomic DNA extracted from Bacillus spp. Genomic DNA extracted from Alicyclobacillus spp. showed a significantly higher hybridization rate compared with DNA of Bacillus spp., thereby distinguishing Alicyclobacillus spp. from Bacillus spp. The results showed that the microarray DNA chip containing randomly fragmented genomic DNA was specific and clearly identified specific food spoilage bacteria. This microarray system is a good tool for rapid and specific detection of thermophilic spoilage bacteria, mainly Alicyclobacillus spp., and is useful and applicable to the fruit juice industry.

  10. Detection of extracellular genomic DNA scaffold in human thrombus

    DEFF Research Database (Denmark)

    Oklu, Rahmi; Albadawi, Hassan; Watkins, Michael T

    2012-01-01

    into thrombus remodeling. MATERIALS AND METHODS: Ten human thrombus samples were collected during cases of thrombectomy and open surgical repair of abdominal aortic aneurysms (five samples 1 y old). Additionally, an acute murine hindlimb ischemia model was created to evaluate...... thrombus samples in mice. Human sections were immunostained for the H2A/H2B/DNA complex, myeloperoxidase, fibrinogen, and von Willebrand factor. Mouse sections were immunostained with the H2A antibody. All samples were further evaluated after hematoxylin and eosin and Masson trichrome staining. RESULTS......: An extensive network of extracellular histone/DNA complex was demonstrated in the matrix of human ex vivo thrombus. This network is present throughout the highly cellular acute thrombus. However, in chronic thrombi, detection of the histone/DNA network was predominantly in regions of low collagen content...

  11. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Full Text Available Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs and 'functional' (partial subgraphs. Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.

  12. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    Science.gov (United States)

    Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718

  13. Genome-wide SNP detection, validation, and development of an 8K SNP array for apple.

    Directory of Open Access Journals (Sweden)

    David Chagné

    Full Text Available As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of 'Golden Delicious', SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional, and genomic selection in apple.

  14. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  15. A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes.

    Science.gov (United States)

    Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

    2018-04-01

    We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.

  16. Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly.

    Science.gov (United States)

    Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain

    2016-06-03

    Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail.

  17. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  18. Detection of Cytosolic Shigella flexneri via a C-Terminal Triple-Arginine Motif of GBP1 Inhibits Actin-Based Motility

    Directory of Open Access Journals (Sweden)

    Anthony S. Piro

    2017-12-01

    Full Text Available Dynamin-like guanylate binding proteins (GBPs are gamma interferon (IFN-γ-inducible host defense proteins that can associate with cytosol-invading bacterial pathogens. Mouse GBPs promote the lytic destruction of targeted bacteria in the host cell cytosol, but the antimicrobial function of human GBPs and the mechanism by which these proteins associate with cytosolic bacteria are poorly understood. Here, we demonstrate that human GBP1 is unique among the seven human GBP paralogs in its ability to associate with at least two cytosolic Gram-negative bacteria, Burkholderia thailandensis and Shigella flexneri. Rough lipopolysaccharide (LPS mutants of S. flexneri colocalize with GBP1 less frequently than wild-type S. flexneri does, suggesting that host recognition of O antigen promotes GBP1 targeting to Gram-negative bacteria. The targeting of GBP1 to cytosolic bacteria, via a unique triple-arginine motif present in its C terminus, promotes the corecruitment of four additional GBP paralogs (GBP2, GBP3, GBP4, and GBP6. GBP1-decorated Shigella organisms replicate but fail to form actin tails, leading to their intracellular aggregation. Consequentially, the wild type but not the triple-arginine GBP1 mutant restricts S. flexneri cell-to-cell spread. Furthermore, human-adapted S. flexneri, through the action of one its secreted effectors, IpaH9.8, is more resistant to GBP1 targeting than the non-human-adapted bacillus B. thailandensis. These studies reveal that human GBP1 uniquely functions as an intracellular “glue trap,” inhibiting the cytosolic movement of normally actin-propelled Gram-negative bacteria. In response to this powerful human defense program, S. flexneri has evolved an effective counterdefense to restrict GBP1 recruitment.

  19. Pooled genome wide association detects association upstream of FCRL3 with Graves' disease.

    Science.gov (United States)

    Khong, Jwu Jin; Burdon, Kathryn P; Lu, Yi; Laurie, Kate; Leonardos, Lefta; Baird, Paul N; Sahebjada, Srujana; Walsh, John P; Gajdatsy, Adam; Ebeling, Peter R; Hamblin, Peter Shane; Wong, Rosemary; Forehan, Simon P; Fourlanos, Spiros; Roberts, Anthony P; Doogue, Matthew; Selva, Dinesh; Montgomery, Grant W; Macgregor, Stuart; Craig, Jamie E

    2016-11-18

    Graves' disease is an autoimmune thyroid disease of complex inheritance. Multiple genetic susceptibility loci are thought to be involved in Graves' disease and it is therefore likely that these can be identified by genome wide association studies. This study aimed to determine if a genome wide association study, using a pooling methodology, could detect genomic loci associated with Graves' disease. Nineteen of the top ranking single nucleotide polymorphisms including HLA-DQA1 and C6orf10, were clustered within the Major Histo-compatibility Complex region on chromosome 6p21, with rs1613056 reaching genome wide significance (p = 5 × 10 -8 ). Technical validation of top ranking non-Major Histo-compatablity complex single nucleotide polymorphisms with individual genotyping in the discovery cohort revealed four single nucleotide polymorphisms with p ≤ 10 -4 . Rs17676303 on chromosome 1q23.1, located upstream of FCRL3, showed evidence of association with Graves' disease across the discovery, replication and combined cohorts. A second single nucleotide polymorphism rs9644119 downstream of DPYSL2 showed some evidence of association supported by finding in the replication cohort that warrants further study. Pooled genome wide association study identified a genetic variant upstream of FCRL3 as a susceptibility locus for Graves' disease in addition to those identified in the Major Histo-compatibility Complex. A second locus downstream of DPYSL2 is potentially a novel genetic variant in Graves' disease that requires further confirmation.

  20. Direct detection of chicken genomic DNA for gender determination by thymine-DNA glycosylase.

    Science.gov (United States)

    Porat, N; Bogdanov, K; Danielli, A; Arie, A; Samina, I; Hadani, A

    2011-02-01

    1. Birds, especially nestlings, are generally difficult to sex by morphology and early detection of chick gender in ovo in the hatchery would facilitate removal of unwanted chicks and diminish welfare objections regarding culling after hatch. 2. We describe a method to determine chicken gender without the need for PCR via use of Thymine-DNA Glycosylase (TDG). TDG restores thymine (T)/guanine (G) mismatches to cytosine (C)/G. We show here, that like DNA Polymerase, TDG can recognise, bind and function on a primer hybridised to chicken genomic DNA. 3. The primer contained a T to mismatch a G in a chicken genomic template and the T/G was cleaved with high fidelity by TDG. Thus, the chicken genomic DNA can be identified without PCR amplification via direct and linear detection. Sensitivity was increased using gender specific sequences from the chicken genome. 4. Currently, these are laboratory results, but we anticipate that further development will allow this method to be used in non-laboratory settings, where PCR cannot be employed.

  1. A speedup technique for (l, d-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS, (l, d-motif search (or Planted Motif Search (PMS, and Edit-distance-based Motif Search (EMS. In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm identifies the motifs always and an approximate algorithm may fail to identify some or all of the motifs. The exact version of PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speedup PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very

  2. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Songbo eHuang

    2011-07-01

    Full Text Available RNA-Seq, a method using next generation sequencing technologies to sequence the transcriptome, facilitates genome-wide analysis of splice junction sites. In this paper, we introduce SOAPsplice, a robust tool to detect splice junctions using RNA-Seq data without using any information of known splice junctions. SOAPsplice uses a novel two-step approach consisting of first identifying as many reasonable splice junction candidates as possible, and then, filtering the false positives with two effective filtering strategies. In both simulated and real datasets, SOAPsplice is able to detect many reliable splice junctions with low false positive rate. The improvement gained by SOAPsplice, when compared to other existing tools, becomes more obvious when the depth of sequencing is low. SOAPsplice is freely available at http://soap.genomics.org.cn/soapsplice.html.

  3. A direct detection of Escherichia coli genomic DNA using gold nanoprobes

    Directory of Open Access Journals (Sweden)

    Padmavathy

    2012-02-01

    Full Text Available Abstract Background In situation like diagnosis of clinical and forensic samples there exists a need for highly sensitive, rapid and specific DNA detection methods. Though conventional DNA amplification using PCR can provide fast results, it is not widely practised in diagnostic laboratories partially because it requires skilled personnel and expensive equipment. To overcome these limitations nanoparticles have been explored as signalling probes for ultrasensitive DNA detection that can be used in field applications. Among the nanomaterials, gold nanoparticles (AuNPs have been extensively used mainly because of its optical property and ability to get functionalized with a variety of biomolecules. Results We report a protocol for the use of gold nanoparticles functionalized with single stranded oligonucleotide (AuNP- oligo probe as visual detection probes for rapid and specific detection of Escherichia coli. The AuNP- oligo probe on hybridization with target DNA containing complementary sequences remains red whereas test samples without complementary DNA sequences to the probe turns purple due to acid induced aggregation of AuNP- oligo probes. The color change of the solution is observed visually by naked eye demonstrating direct and rapid detection of the pathogenic Escherichia coli from its genomic DNA without the need for PCR amplification. The limit of detection was ~54 ng for unamplified genomic DNA. The method requires less than 30 minutes to complete after genomic DNA extraction. However, by using unamplified enzymatic digested genomic DNA, the detection limit of 11.4 ng was attained. Results of UV-Vis spectroscopic measurement and AFM imaging further support the hypothesis of aggregation based visual discrimination. To elucidate its utility in medical diagnostic, the assay was validated on clinical strains of pathogenic Escherichia coli obtained from local hospitals and spiked urine samples. It was found to be 100% sensitive and proves to

  4. Genome rearrangements detected by SNP microarrays in individuals with intellectual disability referred with possible Williams syndrome.

    Directory of Open Access Journals (Sweden)

    Ariel M Pani

    2010-08-01

    Full Text Available Intellectual disability (ID affects 2-3% of the population and may occur with or without multiple congenital anomalies (MCA or other medical conditions. Established genetic syndromes and visible chromosome abnormalities account for a substantial percentage of ID diagnoses, although for approximately 50% the molecular etiology is unknown. Individuals with features suggestive of various syndromes but lacking their associated genetic anomalies pose a formidable clinical challenge. With the advent of microarray techniques, submicroscopic genome alterations not associated with known syndromes are emerging as a significant cause of ID and MCA.High-density SNP microarrays were used to determine genome wide copy number in 42 individuals: 7 with confirmed alterations in the WS region but atypical clinical phenotypes, 31 with ID and/or MCA, and 4 controls. One individual from the first group had the most telomeric gene in the WS critical region deleted along with 2 Mb of flanking sequence. A second person had the classic WS deletion and a rearrangement on chromosome 5p within the Cri du Chat syndrome (OMIM:123450 region. Six individuals from the ID/MCA group had large rearrangements (3 deletions, 3 duplications, one of whom had a large inversion associated with a deletion that was not detected by the SNP arrays.Combining SNP microarray analyses and qPCR allowed us to clone and sequence 21 deletion breakpoints in individuals with atypical deletions in the WS region and/or ID or MCA. Comparison of these breakpoints to databases of genomic variation revealed that 52% occurred in regions harboring structural variants in the general population. For two probands the genomic alterations were flanked by segmental duplications, which frequently mediate recurrent genome rearrangements; these may represent new genomic disorders. While SNP arrays and related technologies can identify potentially pathogenic deletions and duplications, obtaining sequence information

  5. SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes.

    Science.gov (United States)

    Antaki, Danny; Brandler, William M; Sebat, Jonathan

    2018-05-15

    Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). jsebat@ucsd.edu. Supplementary data are available at Bioinformatics online.

  6. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. Especially, we introduce a novel measurement called motif pairing multiplicity which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken into account together, we believe the coupling DNA motif pairs identified in this study can shed lights on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  7. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. Especially, we introduce a novel measurement called motif pairing multiplicity which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken into account together, we believe the coupling DNA motif pairs identified in this study can shed lights on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  8. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  9. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Full Text Available Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .

  10. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Directory of Open Access Journals (Sweden)

    Maley Carlo C

    2008-10-01

    Full Text Available Abstract Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12 genomes. Virtually all possible (> 98% 12 bp oligomers appear in vertebrate genomes while 98% to D. melanogaster (12–17 bp, C. elegans (11–17 bp, A. thaliana (11–17 bp, S. cerevisiae (10–16 bp and E. coli (9–15 bp. Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect

  11. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Science.gov (United States)

    Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

    2008-01-01

    Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to

  12. Codon based co-occurrence network motifs in human mitochondria

    Directory of Open Access Journals (Sweden)

    Pramod Shinde

    2017-10-01

    Full Text Available The nucleotide polymorphism in human mitochondrial genome (mtDNA tolled by codon position bias plays an indispensable role in human population dispersion and expansion. Herein, we constructed genome-wide nucleotide co-occurrence networks using a massive data consisting of five different geographical regions and around 3000 samples for each region. We developed a powerful network model to describe complex mitochondrial evolutionary patterns between codon and non-codon positions. It was interesting to report a different evolution of Asian genomes than those of the rest which is divulged by network motifs. We found evidence that mtDNA undergoes substantial amounts of adaptive evolution, a finding which was supported by a number of previous studies. The dominance of higher order motifs indicated the importance of long-range nucleotide co-occurrence in genomic diversity. Most notably, codon motifs apparently underpinned the preferences among codon positions for co-evolution which is probably highly biased during the origin of the genetic code. Our analyses manifested that codon position co-evolution is very well conserved across human sub-populations and independently maintained within human sub-populations implying the selective role of evolutionary processes on codon position co-evolution. Ergo, this study provided a framework to investigate cooperative genomic interactions which are critical in underlying complex mitochondrial evolution.

  13. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  14. Poor man’s 1000 genome project: Recent human population expansion confounds the detection of disease alleles in 7,098 complete mitochondrial genomes

    Directory of Open Access Journals (Sweden)

    Hie Lim eKim

    2013-02-01

    Full Text Available Rapid growth of the human population has caused the accumulation of rare genetic variants that may play a role in the origin of genetic diseases. However, it is challenging to identify those rare variants responsible for specific diseases without genetic data from an extraordinarily large population sample. Here we focused on the accumulated data from the human mitochondrial (mt genome sequences because this data provided 7,098 whole genomes for analysis. In this dataset we identified 6,110 single nucleotide variants (SNVs and their frequency and determined that the best-fit demographic model for the 7,098 genomes included severe population bottlenecks and exponential expansions of the non-African population. Using this model, we simulated the evolution of mt genomes in order to ascertain the behavior of deleterious mutations. We found that such deleterious mutations barely survived during population expansion. We derived the threshold frequency of a deleterious mutation in separate African, Asian, and European populations and used it to identify pathogenic mutations in our dataset. Although threshold frequency was very low, the proportion of variants showing a lower frequency than that threshold was 82%, 83%, and 91% of the total variants for the African, Asian, and European populations, respectively. Within these variants, only 18 known pathogenic mutations were detected in the 7,098 genomes. This result showed the difficulty of detecting a pathogenic mutation within an abundance of rare variants in the human population, even with a large number of genomes available for study.

  15. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

    Science.gov (United States)

    Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

    2016-05-01

    Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina Miseq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.

  16. Specific and selective target detection of supra-genome 21 Mers Salmonella via silicon nanowires biosensor

    Science.gov (United States)

    Mustafa, Mohammad Razif Bin; Dhahi, Th S.; Ehfaed, Nuri. A. K. H.; Adam, Tijjani; Hashim, U.; Azizah, N.; Mohammed, Mohammed; Noriman, N. Z.

    2017-09-01

    The nano structure based on silicon can be surface modified to be used as label-free biosensors that allow real-time measurements. The silicon nanowire surface was functionalized using 3-aminopropyltrimethoxysilane (APTES), which functions as a facilitator to immobilize biomolecules on the silicon nanowire surface. The process is simple, economical; this will pave the way for point-of-care applications. However, the surface modification and subsequent detection mechanism still not clear. Thus, study proposed step by step process of silicon nano surface modification and its possible in specific and selective target detection of Supra-genome 21 Mers Salmonella. The device captured the molecule with precisely; the approach took the advantages of strong binding chemistry created between APTES and biomolecule. The results indicated how modifications of the nanowires provide sensing capability with strong surface chemistries that can lead to specific and selective target detection.

  17. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  18. Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Gregory P. Way

    2018-04-01

    Full Text Available Summary: Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these “hidden responders” may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders. : Way et al. develop a machine-learning approach using PanCanAtlas data to detect Ras activation in cancer. Integrating mutation, copy number, and expression data, the authors show that their method detects Ras-activating variants in tumors and sensitivity to MEK inhibitors in cell lines. Keywords: Gene expression, machine learning, Ras, NF1, KRAS, NRAS, HRAS, pan-cancer, TCGA, drug sensitivity

  19. Multiscale landscape genomic models to detect signatures of selection in the alpine plant Biscutella laevigata.

    Science.gov (United States)

    Leempoel, Kevin; Parisod, Christian; Geiser, Céline; Joost, Stéphane

    2018-02-01

    Plant species are known to adapt locally to their environment, particularly in mountainous areas where conditions can vary drastically over short distances. The climate of such landscapes being largely influenced by topography, using fine-scale models to evaluate environmental heterogeneity may help detecting adaptation to micro-habitats. Here, we applied a multiscale landscape genomic approach to detect evidence of local adaptation in the alpine plant Biscutella laevigata . The two gene pools identified, experiencing limited gene flow along a 1-km ridge, were different in regard to several habitat features derived from a very high resolution (VHR) digital elevation model (DEM). A correlative approach detected signatures of selection along environmental gradients such as altitude, wind exposure, and solar radiation, indicating adaptive pressures likely driven by fine-scale topography. Using a large panel of DEM-derived variables as ecologically relevant proxies, our results highlighted the critical role of spatial resolution. These high-resolution multiscale variables indeed indicate that the robustness of associations between genetic loci and environmental features depends on spatial parameters that are poorly documented. We argue that the scale issue is critical in landscape genomics and that multiscale ecological variables are key to improve our understanding of local adaptation in highly heterogeneous landscapes.

  20. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Christian J. Michel

    2017-12-01

    Full Text Available A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C 3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X , using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X , in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level, and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R . We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions. This property is true for all cardinalities of X motifs (from 4 to 20 and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non- X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together

  1. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

    2017-12-03

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae . Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae . We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae , but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first

  2. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

    Science.gov (United States)

    Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2017-07-27

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    Full Text Available BACKGROUND: Podocarpus lambertii (Podocarpaceae is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR. It contains 118 unique genes and one duplicated tRNA (trnN-GUU, which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi and Araucariaceae (Agathis dammara. Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparently loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  4. DELISHUS: an efficient and exact algorithm for genome-wide detection of deletion polymorphism in autism

    Science.gov (United States)

    Aguiar, Derek; Halldórsson, Bjarni V.; Morrow, Eric M.; Istrail, Sorin

    2012-01-01

    Motivation: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A considerable portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array or sequencing technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem partly due to the inability of algorithms to detect them. Results: In this article, we present an algorithmic framework, which we term DELISHUS, that implements three exact algorithms for inferring regions of hemizygosity containing genomic deletions of all sizes and frequencies in SNP genotype data. We implement an efficient backtracking algorithm—that processes a 1 billion entry genome-wide association study SNP matrix in a few minutes—to compute all inherited deletions in a dataset. We further extend our model to give an efficient algorithm for detecting de novo deletions. Finally, given a set of called deletions, we also give a polynomial time algorithm for computing the critical regions of recurrent deletions. DELISHUS achieves significantly lower false-positive rates and higher power than previously published algorithms partly because it considers all individuals in the sample simultaneously. DELISHUS may be applied to SNP array or sequencing data to identify the deletion spectrum for family-based association studies. Availability: DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/. Contact: Eric_Morrow@brown.edu and Sorin_Istrail@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689755

  5. Germline contamination and leakage in whole genome somatic single nucleotide variant detection.

    Science.gov (United States)

    Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C

    2018-01-31

    The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.

  6. Facilitating the indirect detection of genomic DNA in an electrochemical DNA biosensor using magnetic nanoparticles and DNA ligase

    Directory of Open Access Journals (Sweden)

    Roozbeh Hushiarian

    2015-12-01

    This technique was found to be reliably repeatable. The indirect detection of genomic DNA using this method is significantly improved and showed high efficiency in small amounts of samples with the detection limit of 5.37 × 10−14 M.

  7. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    Full Text Available We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much less than what is expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3'polypurine tracts (PPTs which remain double stranded during the HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome.

  8. Comparative analysis of evolutionarily conserved motifs of epidermal growth factor receptor 2 (HER2) predicts novel potential therapeutic epitopes

    DEFF Research Database (Denmark)

    Deng, Xiaohong; Zheng, Xuxu; Yang, Huanming

    2014-01-01

    druggable epitopes/targets. We employed the PROSITE Scan to detect structurally conserved motifs and PRINTS to search for linearly conserved motifs of ECD HER2. We found that the epitopes recognized by trastuzumab and pertuzumab are located in the predicted conserved motifs of ECD HER2, supporting our...

  9. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Assembly of viral genomes from metagenomes

    Directory of Open Access Journals (Sweden)

    Saskia L Smits

    2014-12-01

    Full Text Available Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered, but rather several distinct contigs derived from a single entity, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with a relatively large genome which were divergent to previously known viruses from the viral families Rhabdoviridae and Coronaviridae. Depending on specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near complete viral genomes.

  11. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Full Text Available Abstract Background A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94% appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1 a dimer interface motif found in voltage-gated chloride channels, (2 a proton transfer motif found in heme-copper oxidases, and (3 a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  12. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Full Text Available Abstract Background We consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results The proposed algorithm (1 improves search efficiency compared to existing algorithms, and (2 scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families. Conclusions Our algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.

  13. Finding a Leucine in a Haystack: Searching the Proteome for ambigous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  14. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high resolution view ofthe human genome and provides unique insights into genome structureat an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validatedalso include a number of

  15. Precise detection of de novo single nucleotide variants in human genomes.

    Science.gov (United States)

    Gómez-Romero, Laura; Palacios-Flores, Kim; Reyes, José; García, Delfino; Boege, Margareta; Dávila, Guillermo; Flores, Margarita; Schatz, Michael C; Palacios, Rafael

    2018-05-07

    The precise determination of de novo genetic variants has enormous implications across different fields of biology and medicine, particularly personalized medicine. Currently, de novo variations are identified by mapping sample reads from a parent-offspring trio to a reference genome, allowing for a certain degree of differences. While widely used, this approach often introduces false-positive (FP) results due to misaligned reads and mischaracterized sequencing errors. In a previous study, we developed an alternative approach to accurately identify single nucleotide variants (SNVs) using only perfect matches. However, this approach could be applied only to haploid regions of the genome and was computationally intensive. In this study, we present a unique approach, coverage-based single nucleotide variant identification (COBASI), which allows the exploration of the entire genome using second-generation short sequence reads without extensive computing requirements. COBASI identifies SNVs using changes in coverage of exactly matching unique substrings, and is particularly suited for pinpointing de novo SNVs. Unlike other approaches that require population frequencies across hundreds of samples to filter out any methodological biases, COBASI can be applied to detect de novo SNVs within isolated families. We demonstrate this capability through extensive simulation studies and by studying a parent-offspring trio we sequenced using short reads. Experimental validation of all 58 candidate de novo SNVs and a selection of non-de novo SNVs found in the trio confirmed zero FP calls. COBASI is available as open source at https://github.com/Laura-Gomez/COBASI for any researcher to use. Copyright © 2018 the Author(s). Published by PNAS.

  16. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

    Science.gov (United States)

    Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

    2015-01-01

    With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710

  17. Patterns of cross-contamination in a multispecies population genomic project: detection, quantification, impact, and solutions.

    Science.gov (United States)

    Ballenghien, Marion; Faivre, Nicolas; Galtier, Nicolas

    2017-03-29

    Contamination is a well-known but often neglected problem in molecular biology. Here, we investigated the prevalence of cross-contamination among 446 samples from 116 distinct species of animals, which were processed in the same laboratory and subjected to subcontracted transcriptome sequencing. Using cytochrome oxidase 1 as a barcode, we identified a minimum of 782 events of between-species contamination, with approximately 80% of our samples being affected. An analysis of laboratory metadata revealed a strong effect of the sequencing center: nearly all the detected events of between-species contamination involved species that were sent the same day to the same company. We introduce new methods to address the amount of within-species, between-individual contamination, and to correct for this problem when calling genotypes from base read counts. We report evidence for pervasive within-species contamination in this data set, and show that classical population genomic statistics, such as synonymous diversity, the ratio of non-synonymous to synonymous diversity, inbreeding coefficient F IT , and Tajima's D, are sensitive to this problem to various extents. Control analyses suggest that our published results are probably robust to the problem of contamination. Recommendations on how to prevent or avoid contamination in large-scale population genomics/molecular ecology are provided based on this analysis.

  18. Mi-DISCOVERER: A bioinformatics tool for the detection of mi-RNA in human genome.

    Science.gov (United States)

    Arshad, Saadia; Mumtaz, Asia; Ahmad, Freed; Liaquat, Sadia; Nadeem, Shahid; Mehboob, Shahid; Afzal, Muhammad

    2010-11-27

    MicroRNAs (miRNAs) are 22 nucleotides non-coding RNAs that play pivotal regulatory roles in diverse organisms including the humans and are difficult to be identified due to lack of either sequence features or robust algorithms to efficiently identify. Therefore, we made a tool that is Mi-Discoverer for the detection of miRNAs in human genome. The tools used for the development of software are Microsoft Office Access 2003, the JDK version 1.6.0, BioJava version 1.0, and the NetBeans IDE version 6.0. All already made miRNAs softwares were web based; so the advantage of our project was to make a desktop facility to the user for sequence alignment search with already identified miRNAs of human genome present in the database. The user can also insert and update the newly discovered human miRNA in the database. Mi-Discoverer, a bioinformatics tool successfully identifies human miRNAs based on multiple sequence alignment searches. It's a non redundant database containing a large collection of publicly available human miRNAs.

  19. Detecting Microsatellites in Genome Data: Variance in Definitions and Bioinformatic Approaches Cause Systematic Bias

    Directory of Open Access Journals (Sweden)

    Angelika Merkel

    2008-01-01

    Full Text Available Microsatellites are currently one of the most commonly used genetic markers. The application of bioinformatic tools has become common practice in the study of these short tandem repeats (STR. However, in silico studies can suffer from study bias. Using a meta-analysis on microsatellite distribution in yeast we show that estimates of numbers of repeats reported by different studies can differ in the order of several magnitudes, even within a single genome. These differences arise because varying definitions of microsatellites, spanning repeat size, array length and array composition, are used in different search paradigms, with minimum array length being the main influencing factor. Structural differences in the implemented search algorithm additionally contribute to variation in the number of repeats detected. We suggest that for future studies a consistent approach to STR searches is adopted in order to improve the power of intra- and interspecific comparisons

  20. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

    Science.gov (United States)

    Lun, Aaron T L; Smyth, Gordon K

    2015-08-19

    Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.

  1. Applications of flow cytometry in plant pathology for genome size determination, detection and physiological status.

    Science.gov (United States)

    D'Hondt, Liesbet; Höfte, Monica; Van Bockstaele, Erik; Leus, Leen

    2011-10-01

    Flow cytometers are probably the most multipurpose laboratory devices available. They can analyse a vast and very diverse range of cell parameters. This technique has left its mark on cancer, human immunodeficiency virus and immunology research, and is indispensable in routine clinical diagnostics. Flow cytometry (FCM) is also a well-known tool for the detection and physiological status assessment of microorganisms in drinking water, marine environments, food and fermentation processes. However, flow cytometers are seldom used in plant pathology, despite FCM's major advantages as both a detection method and a research tool. Potential uses of FCM include the characterization of genome sizes of fungal and oomycete populations, multiplexed pathogen detection and the monitoring of the viability, culturability and gene expression of plant pathogens, and many others. This review provides an overview of the history, advantages and disadvantages of FCM, and focuses on the current applications and future possibilities of FCM in plant pathology. © 2011 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2011 BSPP AND BLACKWELL PUBLISHING LTD.

  2. HyDe: a Python Package for Genome-Scale Hybridization Detection.

    Science.gov (United States)

    Blischak, Paul D; Chifman, Julia; Wolfe, Andrea D; Kubatko, Laura S

    2018-03-19

    The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybridization using these site pattern probabilities exploits the relationship between the coalescent process for gene trees within population trees and the process of mutation along the branches of the gene trees. For certain models, site patterns are predicted to occur in equal frequency (i.e., their difference is 0), producing a set of functions called phylogenetic invariants. In this paper we introduce HyDe, a software package for detecting hybridization using phylogenetic invariants arising under the coalescent model with hybridization. HyDe is written in Python, and can be used interactively or through the command line using pre-packaged scripts. We demonstrate the use of HyDe on simulated data, as well as on two empirical data sets from the literature. We focus in particular on identifying individual hybrids within population samples and on distinguishing between hybrid speciation and gene flow. HyDe is freely available as an open source Python package under the GNU GPL v3 on both GitHub (https://github.com/pblischak/HyDe) and the Python Package Index (PyPI: https://pypi.python.org/pypi/phyde).

  3. Detection of genomic signatures for pig hairlessness using high-density SNP data

    Directory of Open Access Journals (Sweden)

    Ying SU,Yi LONG,Xinjun LIAO,Huashui AI,Zhiyan ZHANG,Bin YANG,Shijun XIAO,Jianhong TANG,Wenshui XIN,Lusheng HUANG,Jun REN,Nengshui DING

    2014-12-01

    Full Text Available Hair provides thermal regulation for mammals and protects the skin from wounds, bites and ultraviolet (UV radiation, and is important in adaptation to volatile environments. Pigs in nature are divided into hairy and hairless, which provide a good model for deciphering the molecular mechanisms of hairlessness. We conducted a genomic scan for genetically differentiated regions between hairy and hairless pigs using 60K SNP data, with the aim to better understand the genetic basis for the hairless phenotype in pigs. A total of 38405 SNPs in 498 animals from 36 diverse breeds were used to detect genomic signatures for pig hairlessness by estimating between-population (FST values. Seven diversifying signatures between Yucatan hairless pig and hairy pigs were identified on pig chromosomes (SSC 1, 3, 7, 8, 10, 11 and 16, and the biological functions of two notable genes, RGS17 and RB1, were revealed. When Mexican hairless pigs were contrasted with hairypigs, strong signatures were detected on SSC1 and SSC10, which harbor two functionally plausible genes, REV3L and BAMBI. KEGG pathway analysis showed a subset of overrepresented genes involved in the T cell receptor signaling pathway, MAPK signaling pathway and the tight junction pathways. All of these pathways may be important in local adaptability of hairless pigs. The potential mechanisms underlying the hairless phenotype in pigs are reported for the first time. RB1 and BAMBI are interesting candidate genes for the hairless phenotype in Yucatan hairless and Mexico hairless pigs, respectively. RGS17, REV3L, ICOS and RASGRP1 as well as other genes involved in the MAPK and T cell receptor signaling pathways may be important in environmental adaption by improved tolerance to UV damage in hairless pigs. These findings improve our understanding of the genetic basis for inherited hairlessness in pigs.

  4. The most conserved genome segments for life detection on Earth and other planets.

    Science.gov (United States)

    Isenbarger, Thomas A; Carr, Christopher E; Johnson, Sarah Stewart; Finney, Michael; Church, George M; Gilbert, Walter; Zuber, Maria T; Ruvkun, Gary

    2008-12-01

    On Earth, very simple but powerful methods to detect and classify broad taxa of life by the polymerase chain reaction (PCR) are now standard practice. Using DNA primers corresponding to the 16S ribosomal RNA gene, one can survey a sample from any environment for its microbial inhabitants. Due to massive meteoritic exchange between Earth and Mars (as well as other planets), a reasonable case can be made for life on Mars or other planets to be related to life on Earth. In this case, the supremely sensitive technologies used to study life on Earth, including in extreme environments, can be applied to the search for life on other planets. Though the 16S gene has become the standard for life detection on Earth, no genome comparisons have established that the ribosomal genes are, in fact, the most conserved DNA segments across the kingdoms of life. We present here a computational comparison of full genomes from 13 diverse organisms from the Archaea, Bacteria, and Eucarya to identify genetic sequences conserved across the widest divisions of life. Our results identify the 16S and 23S ribosomal RNA genes as well as other universally conserved nucleotide sequences in genes encoding particular classes of transfer RNAs and within the nucleotide binding domains of ABC transporters as the most conserved DNA sequence segments across phylogeny. This set of sequences defines a core set of DNA regions that have changed the least over billions of years of evolution and provides a means to identify and classify divergent life, including ancestrally related life on other planets.

  5. Detecting single DNA copy number variations in complex genomes using one nanogram of starting DNA and BAC-array CGH.

    Science.gov (United States)

    Guillaud-Bataille, Marine; Valent, Alexander; Soularue, Pascal; Perot, Christine; Inda, Maria Mar; Receveur, Aline; Smaïli, Sadek; Roest Crollius, Hugues; Bénard, Jean; Bernheim, Alain; Gidrol, Xavier; Danglot, Gisèle

    2004-07-29

    Comparative genomic hybridization to bacterial artificial chromosome (BAC)-arrays (array-CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci, and the reliable detection of local one-copy-level variations. We report a genome-wide amplification method allowing the same measurement sensitivity, using 1 ng of starting genomic DNA, instead of the classical 1 microg usually necessary. Using a discrete series of DNA fragments, we defined the parameters adapted to the most faithful ligation-mediated PCR amplification and the limits of the technique. The optimized protocol allows a 3000-fold DNA amplification, retaining the quantitative characteristics of the initial genome. Validation of the amplification procedure, using DNA from 10 tumour cell lines hybridized to BAC-arrays of 1500 spots, showed almost perfectly superimposed ratios for the non-amplified and amplified DNAs. Correlation coefficients of 0.96 and 0.99 were observed for regions of low-copy-level variations and all regions, respectively (including in vivo amplified oncogenes). Finally, labelling DNA using two nucleotides bearing the same fluorophore led to a significant increase in reproducibility and to the correct detection of one-copy gain or loss in >90% of the analysed data, even for pseudotriploid tumour genomes.

  6. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2017-09-01

    Full Text Available High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  7. Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics.

    Science.gov (United States)

    Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B

    2017-01-01

    There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences.

  8. A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Tony Håndstad

    Full Text Available BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

  9. Detection of bacterial contaminants and hybrid sequences in the genome of the kelp Saccharina japonica using Taxoblast

    Directory of Open Access Journals (Sweden)

    Simon M. Dittami

    2017-11-01

    Full Text Available Modern genome sequencing strategies are highly sensitive to contamination making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searches with short sequence fragments. They revealed a number of probable bacterial contaminations as well as hybrid scaffolds that contain both bacterial and algal sequences. This or similar types of analysis, in combination with manual curation, may thus constitute a useful complement to standard bioinformatics analyses prior to submission of genomic data to public repositories. Our analysis pipeline is open-source and freely available at http://sdittami.altervista.org/taxoblast and via SourceForge (https://sourceforge.net/projects/taxoblast.

  10. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

    Science.gov (United States)

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.

  11. DETECTING SELECTION IN NATURAL POPULATIONS: MAKING SENSE OF GENOME SCANS AND TOWARDS ALTERNATIVE SOLUTIONS

    Science.gov (United States)

    Haasl, Ryan J.; Payseur, Bret A.

    2016-01-01

    Genomewide scans for natural selection (GWSS) have become increasingly common over the last 15 years due to increased availability of genome-scale genetic data. Here, we report a representative survey of GWSS from 1999 to present and find that (i) between 1999 and 2009, 35 of 49 (71%) GWSS focused on human, while from 2010 to present, only 38 of 83 (46%) of GWSS focused on human, indicating increased focus on nonmodel organisms; (ii) the large majority of GWSS incorporate interpopulation or interspecific comparisons using, for example FST, cross-population extended haplotype homozygosity or the ratio of nonsynonymous to synonymous substitutions; (iii) most GWSS focus on detection of directional selection rather than other modes such as balancing selection; and (iv) in human GWSS, there is a clear shift after 2004 from microsatellite markers to dense SNP data. A survey of GWSS meant to identify loci positively selected in response to severe hypoxic conditions support an approach to GWSS in which a list of a priori candidate genes based on potential selective pressures are used to filter the list of significant hits a posteriori. We also discuss four frequently ignored determinants of genomic heterogeneity that complicate GWSS: mutation, recombination, selection and the genetic architecture of adaptive traits. We recommend that GWSS methodology should better incorporate aspects of genomewide heterogeneity using empirical estimates of relevant parameters and/or realistic, whole-chromosome simulations to improve interpretation of GWSS results. Finally, we argue that knowledge of potential selective agents improves interpretation of GWSS results and that new methods focused on correlations between environmental variables and genetic variation can help automate this approach. PMID:26224644

  12. Efficient utilization of rare variants for detection of disease-related genomic regions.

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2010-12-01

    Full Text Available When testing association between rare variants and diseases, an efficient analytical approach involves considering a set of variants in a genomic region as the unit of analysis. One factor complicating this approach is that the vast majority of rare variants in practical applications are believed to represent background neutral variation. As a result, analyzing a single set with all variants may not represent a powerful approach. Here, we propose two alternative strategies. In the first, we analyze the subsets of rare variants exhaustively. In the second, we categorize variants selectively into two subsets: one in which variants are overrepresented in cases, and the other in which variants are overrepresented in controls. When the proportion of neutral variants is moderate to large we show, by simulations, that the both proposed strategies improve the statistical power over methods analyzing a single set with total variants. When applied to a real sequencing association study, the proposed methods consistently produce smaller p-values than their competitors. When applied to another real sequencing dataset to study the difference of rare allele distributions between ethnic populations, the proposed methods detect the overrepresentation of variants between the CHB (Chinese Han in Beijing and YRI (Yoruba people of Ibadan populations with small p-values. Additional analyses suggest that there is no difference between the CHB and CHD (Chinese Han in Denver datasets, as expected. Finally, when applied to the CHB and JPT (Japanese people in Tokyo populations, existing methods fail to detect any difference, while it is detected by the proposed methods in several regions.

  13. A sensitive, support-vector-machine method for the detection of horizontal gene transfers in viral, archaeal and bacterial genomes.

    Science.gov (United States)

    Tsirigos, Aristotelis; Rigoutsos, Isidore

    2005-01-01

    In earlier work, we introduced and discussed a generalized computational framework for identifying horizontal transfers. This framework relied on a gene's nucleotide composition, obviated the need for knowledge of codon boundaries and database searches, and was shown to perform very well across a wide range of archaeal and bacterial genomes when compared with previously published approaches, such as Codon Adaptation Index and C + G content. Nonetheless, two considerations remained outstanding: we wanted to further increase the sensitivity of detecting horizontal transfers and also to be able to apply the method to increasingly smaller genomes. In the discussion that follows, we present such a method, Wn-SVM, and show that it exhibits a very significant improvement in sensitivity compared with earlier approaches. Wn-SVM uses a one-class support-vector machine and can learn using rather small training sets. This property makes Wn-SVM particularly suitable for studying small-size genomes, similar to those of viruses, as well as the typically larger archaeal and bacterial genomes. We show experimentally that the new method results in a superior performance across a wide range of organisms and that it improves even upon our own earlier method by an average of 10% across all examined genomes. As a small-genome case study, we analyze the genome of the human cytomegalovirus and demonstrate that Wn-SVM correctly identifies regions that are known to be conserved and prototypical of all beta-herpesvirinae, regions that are known to have been acquired horizontally from the human host and, finally, regions that had not up to now been suspected to be horizontally transferred. Atypical region predictions for many eukaryotic viruses, including the alpha-, beta- and gamma-herpesvirinae, and 123 archaeal and bacterial genomes, have been made available online at http://cbcsrv.watson.ibm.com/HGT_SVM/.

  14. Detection of Alien Oryza punctata Kotschy Chromosomes in Rice, Oryza sativa L., by Genomic in situ Hybridization

    OpenAIRE

    Yasui, Hideshi; Nonomura, Ken-ichi; Iwata, Nobuo; 安井, 秀; 野々村, 賢一; 岩田, 伸夫

    1997-01-01

    Genomic in situ hybridization (GIS H) using total Oryza punctata Kotschy genomic DNA as a probe was applied to detect alien chromosomes transferred from O. punctata (W1514: 2n=2x=24: BB) to O. sativa Japonica cultivar, Nipponbare (2n=2x=24: AA). Only 12 chromosomes in the interspecific hybrids (2n=3x=36: AAB) between autotetraploid of O. sativa cultivar Nipponbare and a diploid strain of O. punctata (W1514) showed intense staining by FITC in mitotic metaphase spreads. Only one homologous pair...

  15. Kopi dan Kakao dalam Kreasi Motif Batik Khas Jember

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2015-06-01

    Full Text Available ABSTRAK Batik Jember selama ini identik dengan motif daun tembakau. Visualisasi daun tembakau dalam motif Batik Jember cukup lemah, yaitu kurang berkarakter karena motif yang muncul adalah seperti gambar daun pada umumnya. Oleh karena itu perlu diciptakan desain motif batik khas Jember yang sumber inspirasinya digali dari kekayaan alam lainnya dari Jember yang mempunyai bentuk spesifik dan karakteristik sehingga identitas motif bisa didapatkan dengan lebih kuat. Hasil alam khas Jember tersebut adalah kopi dan kakao. Tujuan penciptaan seni ini adalah untuk menghasilkan motif batik  baru yang mempunyai ciri khas Jember. Metode yang digunakan yaitu pengumpulan data, pengamatan mendalam terhadap objek penciptaan, pengkajian sumber inspirasi, pembuatan desain motif, dan perwujudan menjadi batik. Dari penciptaan seni ini berhasil dikreasikan 6 (enam motif batik yaitu: (1 Motif Uwoh Kopi; (2 Motif Godong Kopi;  (3 Motif Ceplok Kakao; (4 Motif Kakao Raja; (5 Motif Kakao Biru; dan (6 Motif Wiji Mukti. Berdasarkan hasil penilaian “Selera Estetika” diketahui bahwa motif yang paling banyak disukai adalah Motif Uwoh Kopi dan Motif Kakao Raja. Kata kunci: Motif Woh Kopi, Motif Godong Kopi, Motif Ceplok Kakao, Motif Kakao Raja, Motif Kakao Biru, Motif Wiji Mukti ABSTRACTBatik Jember is synonymous with tobacco leaf motif. Tobacco leaf shape is quite weak in the visual appearance characterized as that motif emerges like a picture of leaves in general. Therefore, it is necessary to create a distinctive design motif extracted from other natural resources of Jember that have specific shapes and characteristics that can be obtained as the stronger motif identity. The typical natural resources from Jember are coffee and cocoa. The purpose of the creation of this art is to produce the unique, creative and innovative batik and have specific characteristics of Jember. The method used are data collection, observation of the object, reviewing inspiration sources

  16. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  17. Detection and correction of false segmental duplications caused by genome mis-assembly

    Science.gov (United States)

    2010-01-01

    Diploid genomes with divergent chromosomes present special problems for assembly software as two copies of especially polymorphic regions may be mistakenly constructed, creating the appearance of a recent segmental duplication. We developed a method for identifying such false duplications and applied it to four vertebrate genomes. For each genome, we corrected mis-assemblies, improved estimates of the amount of duplicated sequence, and recovered polymorphisms between the sequenced chromosomes. PMID:20219098

  18. Genome-wide detection and characterization of positive selection in human populations.

    Science.gov (United States)

    Sabeti, Pardis C; Varilly, Patrick; Fry, Ben; Lohmueller, Jason; Hostetter, Elizabeth; Cotsapas, Chris; Xie, Xiaohui; Byrne, Elizabeth H; McCarroll, Steven A; Gaudet, Rachelle; Schaffner, Stephen F; Lander, Eric S; Frazer, Kelly A; Ballinger, Dennis G; Cox, David R; Hinds, David A; Stuve, Laura L; Gibbs, Richard A; Belmont, John W; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M; Pasternak, Shiran; Wheeler, David A; Willis, Thomas D; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan; Yao, Zhijian; Huang, Wei; Chu, Xun; He, Yungang; Jin, Li; Liu, Yangfan; Shen, Yayun; Sun, Weiwei; Wang, Haifeng; Wang, Yi; Wang, Ying; Xiong, Xiaoyan; Xu, Liang; Waye, Mary M Y; Tsui, Stephen K W; Xue, Hong; Wong, J Tze-Fei; Galver, Luana M; Fan, Jian-Bing; Gunderson, Kevin; Murray, Sarah S; Oliphant, Arnold R; Chee, Mark S; Montpetit, Alexandre; Chagnon, Fanny; Ferretti, Vincent; Leboeuf, Martin; Olivier, Jean-François; Phillips, Michael S; Roumy, Stéphanie; Sallée, Clémentine; Verner, Andrei; Hudson, Thomas J; Kwok, Pui-Yan; Cai, Dongmei; Koboldt, Daniel C; Miller, Raymond D; Pawlikowska, Ludmila; Taillon-Miller, Patricia; Xiao, Ming; Tsui, Lap-Chee; Mak, William; Song, You Qiang; Tam, Paul K H; Nakamura, Yusuke; Kawaguchi, Takahisa; Kitamoto, Takuya; Morizono, Takashi; Nagashima, Atsushi; Ohnishi, Yozo; Sekine, Akihiro; Tanaka, Toshihiro; Tsunoda, Tatsuhiko; Deloukas, Panos; Bird, Christine P; Delgado, Marcos; Dermitzakis, Emmanouil T; Gwilliam, Rhian; Hunt, Sarah; Morrison, Jonathan; Powell, Don; Stranger, Barbara E; Whittaker, Pamela; Bentley, David R; Daly, Mark J; de Bakker, Paul I W; Barrett, Jeff; Chretien, Yves R; Maller, Julian; McCarroll, Steve; Patterson, Nick; Pe'er, Itsik; Price, Alkes; Purcell, Shaun; Richter, Daniel J; Sabeti, Pardis; Saxena, Richa; Schaffner, Stephen F; Sham, Pak C; Varilly, Patrick; Altshuler, David; Stein, Lincoln D; Krishnan, Lalitha; Smith, Albert Vernon; Tello-Ruiz, Marcela K; Thorisson, Gudmundur A; Chakravarti, Aravinda; Chen, Peter E; Cutler, David J; Kashuk, Carl S; Lin, Shin; Abecasis, Gonçalo R; Guan, Weihua; Li, Yun; Munro, Heather M; Qin, Zhaohui Steve; Thomas, Daryl J; McVean, Gilean; Auton, Adam; Bottolo, Leonardo; Cardin, Niall; Eyheramendy, Susana; Freeman, Colin; Marchini, Jonathan; Myers, Simon; Spencer, Chris; Stephens, Matthew; Donnelly, Peter; Cardon, Lon R; Clarke, Geraldine; Evans, David M; Morris, Andrew P; Weir, Bruce S; Tsunoda, Tatsuhiko; Johnson, Todd A; Mullikin, James C; Sherry, Stephen T; Feolo, Michael; Skol, Andrew; Zhang, Houcan; Zeng, Changqing; Zhao, Hui; Matsuda, Ichiro; Fukushima, Yoshimitsu; Macer, Darryl R; Suda, Eiko; Rotimi, Charles N; Adebamowo, Clement A; Ajayi, Ike; Aniagwu, Toyin; Marshall, Patricia A; Nkwodimmah, Chibuzor; Royal, Charmaine D M; Leppert, Mark F; Dixon, Missy; Peiffer, Andy; Qiu, Renzong; Kent, Alastair; Kato, Kazuto; Niikawa, Norio; Adewole, Isaac F; Knoppers, Bartha M; Foster, Morris W; Clayton, Ellen Wright; Watkin, Jessica; Gibbs, Richard A; Belmont, John W; Muzny, Donna; Nazareth, Lynne; Sodergren, Erica; Weinstock, George M; Wheeler, David A; Yakub, Imtaz; Gabriel, Stacey B; Onofrio, Robert C; Richter, Daniel J; Ziaugra, Liuda; Birren, Bruce W; Daly, Mark J; Altshuler, David; Wilson, Richard K; Fulton, Lucinda L; Rogers, Jane; Burton, John; Carter, Nigel P; Clee, Christopher M; Griffiths, Mark; Jones, Matthew C; McLay, Kirsten; Plumb, Robert W; Ross, Mark T; Sims, Sarah K; Willey, David L; Chen, Zhu; Han, Hua; Kang, Le; Godbout, Martin; Wallenburg, John C; L'Archevêque, Paul; Bellemare, Guy; Saeki, Koji; Wang, Hongguang; An, Daochang; Fu, Hongbo; Li, Qing; Wang, Zhen; Wang, Renwu; Holden, Arthur L; Brooks, Lisa D; McEwen, Jean E; Guyer, Mark S; Wang, Vivian Ota; Peterson, Jane L; Shi, Michael; Spiegel, Jack; Sung, Lawrence M; Zacharia, Lynn F; Collins, Francis S; Kennedy, Karen; Jamieson, Ruth; Stewart, John

    2007-10-18

    With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population:LARGE and DMD, both related to infection by the Lassa virus, in West Africa;SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

  19. Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Behavior in Sapsaree Dog (

    Directory of Open Access Journals (Sweden)

    J. H. Ha

    2015-07-01

    Full Text Available The purpose of this study was to characterize genetic architecture of behavior patterns in Sapsaree dogs. The breed population (n = 8,256 has been constructed since 1990 over 12 generations and managed at the Sapsaree Breeding Research Institute, Gyeongsan, Korea. Seven behavioral traits were investigated for 882 individuals. The traits were classified as a quantitative or a categorical group, and heritabilities (h2 and variance components were estimated under the Animal model using ASREML 2.0 software program. In general, the h2 estimates of the traits ranged between 0.00 and 0.16. Strong genetic (rG and phenotypic (rP correlations were observed between nerve stability, affability and adaptability, i.e. 0.9 to 0.94 and 0.46 to 0.68, respectively. To detect significant single nucleotide polymorphism (SNP for the behavioral traits, a total of 134 and 60 samples were genotyped using the Illumina 22K CanineSNP20 and 170K CanineHD bead chips, respectively. Two datasets comprising 60 (Sap60 and 183 (Sap183 samples were analyzed, respectively, of which the latter was based on the SNPs that were embedded on both the 22K and 170K chips. To perform genome-wide association analysis, each SNP was considered with the residuals of each phenotype that were adjusted for sex and year of birth as fixed effects. A least squares based single marker regression analysis was followed by a stepwise regression procedure for the significant SNPs (p<0.01, to determine a best set of SNPs for each trait. A total of 41 SNPs were detected with the Sap183 samples for the behavior traits. The significant SNPs need to be verified using other samples, so as to be utilized to improve behavior traits via marker-assisted selection in the Sapsaree population.

  20. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    Energy Technology Data Exchange (ETDEWEB)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel; Jurkevitch, Edouard; Sorek, Rotem; Ben-Jacob, Eshel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genome of relative organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.

  1. Transnationalism as a motif in family stories.

    Science.gov (United States)

    Stone, Elizabeth; Gomez, Erica; Hotzoglou, Despina; Lipnitsky, Jane Y

    2005-12-01

    Family stories have long been recognized as a vehicle for assessing components of a family's emotional and social life, including the degree to which an immigrant family has been willing to assimilate. Transnationalism, defined as living in one or more cultures and maintaining connections to both, is now increasingly common. A qualitative study of family stories in the family of those who appear completely "American" suggests that an affiliation with one's home country is nevertheless detectable in the stories via motifs such as (1) positively connotated home remedies, (2) continuing denigration of home country "enemies," (3) extensive knowledge of the home country history and politics, (4) praise of endogamy and negative assessment of exogamy, (5) superiority of home country to America, and (6) beauty of home country. Furthermore, an awareness of which model--assimilationist or transnational--governs a family's experience may help clarify a clinician's understanding of a family's strengths, vulnerabilities, and mode of framing their cultural experiences.

  2. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Full Text Available Abstract Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet, which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i ubiquitous motifs, shared by several superfamilies and (ii superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  3. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae: structural comparative analysis, gene content and microsatellite detection

    Directory of Open Access Journals (Sweden)

    Andrew W. Gichira

    2017-01-01

    Full Text Available Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp, with a pair of Inverted Repeats (IR 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp and a small single copy (SSC, 18,696. H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  4. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    Science.gov (United States)

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696). H. abyssinica 's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene ( infA ) which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica . A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  5. Imaging analysis of nuclear antiviral factors through direct detection of incoming adenovirus genome complexes

    Energy Technology Data Exchange (ETDEWEB)

    Komatsu, Tetsuro [Microbiologie Fondamentale et Pathogénicité, MFP CNRS UMR 5234, Université de Bordeaux, Bordeaux 33076 (France); Department of Infection Biology, Faculty of Medicine, University of Tsukuba, Tsukuba 305-8575 (Japan); Will, Hans [Department of Tumor Biology, University Hospital Hamburg-Eppendorf, 20246 Hamburg (Germany); Nagata, Kyosuke [Department of Infection Biology, Faculty of Medicine, University of Tsukuba, Tsukuba 305-8575 (Japan); Wodrich, Harald, E-mail: harald.wodrich@u-bordeaux.fr [Microbiologie Fondamentale et Pathogénicité, MFP CNRS UMR 5234, Université de Bordeaux, Bordeaux 33076 (France)

    2016-04-22

    Recent studies involving several viral systems have highlighted the importance of cellular intrinsic defense mechanisms through nuclear antiviral proteins that restrict viral propagation. These factors include among others components of PML nuclear bodies, the nuclear DNA sensor IFI16, and a potential restriction factor PHF13/SPOC1. For several nuclear replicating DNA viruses, it was shown that these factors sense and target viral genomes immediately upon nuclear import. In contrast to the anticipated view, we recently found that incoming adenoviral genomes are not targeted by PML nuclear bodies. Here we further explored cellular responses against adenoviral infection by focusing on specific conditions as well as additional nuclear antiviral factors. In line with our previous findings, we show that neither interferon treatment nor the use of specific isoforms of PML nuclear body components results in co-localization between incoming adenoviral genomes and the subnuclear domains. Furthermore, our imaging analyses indicated that neither IFI16 nor PHF13/SPOC1 are likely to target incoming adenoviral genomes. Thus our findings suggest that incoming adenoviral genomes may be able to escape from a large repertoire of nuclear antiviral mechanisms, providing a rationale for the efficient initiation of lytic replication cycle. - Highlights: • Host nuclear antiviral factors were analyzed upon adenovirus genome delivery. • Interferon treatments fail to permit PML nuclear bodies to target adenoviral genomes. • Neither Sp100A nor B targets adenoviral genomes despite potentially opposite roles. • The nuclear DNA sensor IFI16 does not target incoming adenoviral genomes. • PHF13/SPOC1 targets neither incoming adenoviral genomes nor genome-bound protein VII.

  6. Imaging analysis of nuclear antiviral factors through direct detection of incoming adenovirus genome complexes

    International Nuclear Information System (INIS)

    Komatsu, Tetsuro; Will, Hans; Nagata, Kyosuke; Wodrich, Harald

    2016-01-01

    Recent studies involving several viral systems have highlighted the importance of cellular intrinsic defense mechanisms through nuclear antiviral proteins that restrict viral propagation. These factors include among others components of PML nuclear bodies, the nuclear DNA sensor IFI16, and a potential restriction factor PHF13/SPOC1. For several nuclear replicating DNA viruses, it was shown that these factors sense and target viral genomes immediately upon nuclear import. In contrast to the anticipated view, we recently found that incoming adenoviral genomes are not targeted by PML nuclear bodies. Here we further explored cellular responses against adenoviral infection by focusing on specific conditions as well as additional nuclear antiviral factors. In line with our previous findings, we show that neither interferon treatment nor the use of specific isoforms of PML nuclear body components results in co-localization between incoming adenoviral genomes and the subnuclear domains. Furthermore, our imaging analyses indicated that neither IFI16 nor PHF13/SPOC1 are likely to target incoming adenoviral genomes. Thus our findings suggest that incoming adenoviral genomes may be able to escape from a large repertoire of nuclear antiviral mechanisms, providing a rationale for the efficient initiation of lytic replication cycle. - Highlights: • Host nuclear antiviral factors were analyzed upon adenovirus genome delivery. • Interferon treatments fail to permit PML nuclear bodies to target adenoviral genomes. • Neither Sp100A nor B targets adenoviral genomes despite potentially opposite roles. • The nuclear DNA sensor IFI16 does not target incoming adenoviral genomes. • PHF13/SPOC1 targets neither incoming adenoviral genomes nor genome-bound protein VII.

  7. Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis

    Science.gov (United States)

    2009-01-01

    In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold we detected the presence of modest effects genome-wide. We also detected modest effects using 90th percentile of the empirical null distribution as a threshold; however, there is no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects. PMID:20018032

  8. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX

    DEFF Research Database (Denmark)

    Schubert, Mikkel; Ermini, Luca; Der Sarkissian, Clio

    2014-01-01

    a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. Starting with next...

  9. Evolutionary gradient of predicted nuclear localization signals (NLS)-bearing proteins in genomes of family Planctomycetaceae.

    Science.gov (United States)

    Guo, Min; Yang, Ruifu; Huang, Chen; Liao, Qiwen; Fan, Guangyi; Sun, Chenghang; Lee, Simon Ming-Yuen

    2017-04-04

    The nuclear envelope is considered a key classification marker that distinguishes prokaryotes from eukaryotes. However, this marker does not apply to the family Planctomycetaceae, which has intracellular spaces divided by lipidic intracytoplasmic membranes (ICMs). Nuclear localization signal (NLS), a short stretch of amino acid sequence, destines to transport proteins from cytoplasm into nucleus, and is also associated with the development of nuclear envelope. We attempted to investigate the NLS motifs in Planctomycetaceae genomes to demonstrate the potential molecular transition in the development of intracellular membrane system. In this study, we identified NLS-like motifs that have the same amino acid compositions as experimentally identified NLSs in genomes of 11 representative species of family Planctomycetaceae. A total of 15 NLS types and 170 NLS-bearing proteins were detected in the 11 strains. To determine the molecular transformation, we compared NLS-bearing protein abundances in the 11 representative Planctomycetaceae genomes with them in genomes of 16 taxonomically varied microorganisms: nine bacteria, two archaea and five fungi. In the 27 strains, 29 NLS types and 1101 NLS-bearing proteins were identified, principal component analysis showed a significant transitional gradient from bacteria to Planctomycetaceae to fungi on their NLS-bearing protein abundance profiles. Then, we clustered the 993 non-redundant NLS-bearing proteins into 181 families and annotated their involved metabolic pathways. Afterwards, we aligned the ten types of NLS motifs from the 13 families containing NLS-bearing proteins among bacteria, Planctomycetaceae or fungi, considering their diversity, length and origin. A transition towards increased complexity from non-planctomycete bacteria to Planctomycetaceae to archaea and fungi was detected based on the complexity of the 10 types of NLS-like motifs in the 13 NLS-bearing proteins families. The results of this study reveal that

  10. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs

  11. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Full Text Available Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP gene promoters from three plant families, namely Brassicaceae (mustards, Fabaceae (legumes and Poaceae (grasses using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L. Heynh., soybean (Glycine max (L. Merr. and rice (Oryza sativa L. respectively. We have identified three conserved motifs (two RY-like and one ACGT-like in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  12. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios A.; Ashoor, Haitham; Zarokanellos, Nikolaos; Bajic, Vladimir B.

    2017-01-01

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in spatiotemporal manner. One of the most challenging questions in biology today is how the genomic

  13. Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks

    OpenAIRE

    Zhao, Huiying; Nyholt, Dale R.; Yang, Yuanhao; Wang, Jihua; Yang, Yuedong

    2017-01-01

    Genome-wide association studies (GWAS) have successfully identified single variants associated with diseases. To increase the power of GWAS, gene-based and pathway-based tests are commonly employed to detect more risk factors. However, the gene- and pathway-based association tests may be biased towards genes or pathways containing a large number of single-nucleotide polymorphisms (SNPs) with small P-values caused by high linkage disequilibrium (LD) correlations. To address such bias, numerous...

  14. Detecting exact breakpoints of deletions with diversity in hepatitis B viral genomic DNA from next-generation sequencing data.

    Science.gov (United States)

    Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S

    2017-10-01

    Many studies have suggested that deletions of Hepatitis B Viral (HBV) are associated with the development of progressive liver diseases, even ultimately resulting in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few methods considered the characteristics of virus, such as high evolution rates and high divergence among the different HBV genomes. Sequencing high divergence HBV genome sequences using the NGS technology outputs millions of reads. Thus, detecting exact breakpoints of deletions from these big and complex data incurs very high computational cost. We proposed a novel analytical method named VirDelect (Virus Deletion Detect), which uses split read alignment base to detect exact breakpoint and diversity variable to consider high divergence in single-end reads data, such that the computational cost can be reduced without losing accuracy. We use four simulated reads datasets and two real pair-end reads datasets of HBV genome sequence to verify VirDelect accuracy by score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in terms of accuracy score for all simulated datasets and VirDelect had only two base errors even in real datasets. VirDelect is also shown to deliver high accuracy in analyzing the single-end read data as well as pair-end data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists with high accuracy and efficient performance and applicable to further analysis with characteristics similar to HBV on genome length and high divergence. The software program of VirDelect can be downloaded at https://sourceforge.net/projects/virdelect/. Copyright © 2017. Published by Elsevier Inc.

  15. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf Sommer; Nielsen, Eva M.; Aarestrup, Frank Møller

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  16. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  17. Fitness for synchronization of network motifs

    DEFF Research Database (Denmark)

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.

    2004-01-01

    We study the synchronization of Kuramoto's oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi...... that the fitness for synchronization correlates well with motifs interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed....

  18. Short sequence motifs, overrepresented in mammalian conservednon-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNAsequences of multicellular eukaryotes is under selective constraint. Inparticular, ~;5 percent of the human genome consists of conservednon-coding sequences (CNSs). CNSs differ from other genomic sequences intheir nucleotide composition and must play important functional roles,which mostly remain obscure.Results: We investigated relative abundancesof short sequence motifs in all human CNSs present in the human/mousewhole-genome alignments vs. three background sets of sequences: (i)weakly conserved or unconserved non-coding sequences (non-CNSs); (ii)near-promoter sequences (located between nucleotides -500 and -1500,relative to a start of transcription); and (iii) random sequences withthe same nucleotide composition as that of CNSs. When compared tonon-CNSs and near-promoter sequences, CNSs possess an excess of AT-richmotifs, often containing runs of identical nucleotides. In contrast, whencompared to random sequences, CNSs contain an excess of GC-rich motifswhich, however, lack CpG dinucleotides. Thus, abundance of short sequencemotifs in human CNSs, taken as a whole, is mostly determined by theiroverall compositional properties and not by overrepresentation of anyspecific short motifs. These properties are: (i) high AT-content of CNSs,(ii) a tendency, probably due to context-dependent mutation, of A's andT's to clump, (iii) presence of short GC-rich regions, and (iv) avoidanceof CpG contexts, due to their hypermutability. Only a small number ofshort motifs, overrepresented in all human CNSs are similar to bindingsites of transcription factors from the FOX family.Conclusion: Human CNSsas a whole appear to be too broad a class of sequences to possess strongfootprints of any short sequence-specific functions. Such footprintsshould be studied at the level of functional subclasses of CNSs, such asthose which flank genes with a particular pattern of expression. Overallproperties of CNSs are affected by

  19. Genome mining and motif truncation of glycoside hydrolase family 5 endo-β-1,4-mannanase encoded by Aspergillus oryzae RIB40 for potential konjac flour hydrolysis or feed additive.

    Science.gov (United States)

    Tang, Cun-Duo; Shi, Hong-Ling; Tang, Qing-Hai; Zhou, Jun-Shi; Yao, Lun-Guang; Jiao, Zhu-Jin; Kan, Yun-Chao

    2016-11-01

    Two novel glycosyl hydrolase family 5 (GH5) β-mannanases (AoMan5A and AoMan5B) were identified from Aspergillus oryzae RIB40 by genome mining. The AoMan5A contains a predicted family 1 carbohydrate binding module (CBM-1), located at its N-terminal. The AoMan5A, AoMan5B and truncated mutant AoMan5AΔCL (truncating the N-terminal CBM and linker of AoMan5A) were expressed retaining the N-terminus of the native protein in Pichia pastoris GS115 by pPIC9K M . The specific enzyme activity of the purified reAoMan5A, reAoMan5B and reAoMan5AΔCL towards locust bean gum at pH 3.6 and 40°C for 10min, was 8.3, 104.2 and 15.8U/mg, respectively. The temperature properties of the reAoMan5AΔCL were improved by truncating CBM. They can degrade the pretreated konjac flour and produce prebiotics. In addition, they had excellent stability under simulative gastric fluid and simulative prilling process. All these properties make these recombinant β-mannanases potential additives for use in the food and feed industries. Copyright © 2016. Published by Elsevier Inc.

  20. Global MYCN transcription factor binding analysis in neuroblastoma reveals association with distinct E-box motifs and regions of DNA hypermethylation.

    LENUS (Irish Health Repository)

    Murphy, Derek M

    2009-01-01

    BACKGROUND: Neuroblastoma, a cancer derived from precursor cells of the sympathetic nervous system, is a major cause of childhood cancer related deaths. The single most important prognostic indicator of poor clinical outcome in this disease is genomic amplification of MYCN, a member of a family of oncogenic transcription factors. METHODOLOGY: We applied MYCN chromatin immunoprecipitation to microarrays (ChIP-chip) using MYCN amplified\\/non-amplified cell lines as well as a conditional knockdown cell line to determine the distribution of MYCN binding sites within all annotated promoter regions. CONCLUSION: Assessment of E-box usage within consistently positive MYCN binding sites revealed a predominance for the CATGTG motif (p<0.0016), with significant enrichment of additional motifs CATTTG, CATCTG, CAACTG in the MYCN amplified state. For cell lines over-expressing MYCN, gene ontology analysis revealed enrichment for the binding of MYCN at promoter regions of numerous molecular functional groups including DNA helicases and mRNA transcriptional regulation. In order to evaluate MYCN binding with respect to other genomic features, we determined the methylation status of all annotated CpG islands and promoter sequences using methylated DNA immunoprecipitation (MeDIP). The integration of MYCN ChIP-chip and MeDIP data revealed a highly significant positive correlation between MYCN binding and DNA hypermethylation. This association was also detected in regions of hemizygous loss, indicating that the observed association occurs on the same homologue. In summary, these findings suggest that MYCN binding occurs more commonly at CATGTG as opposed to the classic CACGTG E-box motif, and that disease associated over expression of MYCN leads to aberrant binding to additional weaker affinity E-box motifs in neuroblastoma. The co-localization of MYCN binding and DNA hypermethylation further supports the dual role of MYCN, namely that of a classical transcription factor affecting the

  1. DistAMo: A web-based tool to characterize DNA-motif distribution on bacterial chromosomes

    Directory of Open Access Journals (Sweden)

    Patrick eSobetzko

    2016-03-01

    Full Text Available Short DNA motifs are involved in a multitude of functions such as for example chromosome segregation, DNA replication or mismatch repair. Distribution of such motifs is often not random and the specific chromosomal pattern relates to the respective motif function. Computational approaches which quantitatively assess such chromosomal motif patterns are necessary. Here we present a new computer tool DistAMo (Distribution Analysis of DNA Motifs. The algorithm uses codon redundancy to calculate the relative abundance of short DNA motifs from single genes to entire chromosomes. Comparative genomics analyses of the GATC-motif distribution in γ-proteobacterial genomes using DistAMo revealed that (i genes beside the replication origin are enriched in GATCs, (ii genome-wide GATC distribution follows a distinct pattern and (iii genes involved in DNA replication and repair are enriched in GATCs. These features are specific for bacterial chromosomes encoding a Dam methyltransferase. The new software is available as a stand-alone or as an easy-to-use web-based server version at http://www.computational.bio.uni-giessen.de/distamo.

  2. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming

    2012-06-28

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.

  3. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming; Wong, Ka-Chun; Ryu, Tae Woo; Ravasi, Timothy; Gao, Xin

    2012-01-01

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.

  4. The Verrucomicrobia LexA-binding Motif: Insights into the Evolutionary Dynamics of the SOS Response

    Directory of Open Access Journals (Sweden)

    Ivan Erill

    2016-07-01

    Full Text Available The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.

  5. The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

    Science.gov (United States)

    Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

    2016-01-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.

  6. Translational Control of Host Gene Expression by a Cys-Motif Protein Encoded in a Bracovirus.

    Directory of Open Access Journals (Sweden)

    Eunseong Kim

    Full Text Available Translational control is a strategy that various viruses use to manipulate their hosts to suppress acute antiviral response. Polydnaviruses, a group of insect double-stranded DNA viruses symbiotic to some endoparasitoid wasps, are divided into two genera: ichnovirus (IV and bracovirus (BV. In IV, some Cys-motif genes are known as host translation-inhibitory factors (HTIF. The genome of endoparasitoid wasp Cotesia plutellae contains a Cys-motif gene (Cp-TSP13 homologous to an HTIF known as teratocyte-secretory protein 14 (TSP14 of Microplitis croceipes. Cp-TSP13 consists of 129 amino acid residues with a predicted molecular weight of 13.987 kDa and pI value of 7.928. Genomic DNA region encoding its open reading frame has three introns. Cp-TSP13 possesses six conserved cysteine residues as other Cys-motif genes functioning as HTIF. Cp-TSP13 was expressed in Plutella xylostella larvae parasitized by C. plutellae. C. plutellae bracovirus (CpBV was purified and injected into non-parasitized P. xylostella that expressed Cp-TSP13. Cp-TSP13 was cloned into a eukaryotic expression vector and used to infect Sf9 cells to transiently express Cp-TSP13. The synthesized Cp-TSP13 protein was detected in culture broth. An overlaying experiment showed that the purified Cp-TSP13 entered hemocytes. It was localized in the cytosol. Recombinant Cp-TSP13 significantly inhibited protein synthesis of secretory proteins when it was added to in vitro cultured fat body. In addition, the recombinant Cp-TSP13 directly inhibited the translation of fat body mRNAs in in vitro translation assay using rabbit reticulocyte lysate. Moreover, the recombinant Cp-TSP13 significantly suppressed cellular immune responses by inhibiting hemocyte-spreading behavior. It also exhibited significant insecticidal activities by both injection and feeding routes. These results indicate that Cp-TSP13 is a viral HTIF.

  7. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Full Text Available Abstract Background Protein structures have conserved features – motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB is the source of this information however present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the distributed Annotation System (DAS protocol. An additional entry point facilitates XML requests with XML responses. Conclusion MSDmotif is unique by combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  8. Detection of short repeated genomic sequences on metaphase chromosomes using padlock probes and target primed rolling circle DNA synthesis

    Directory of Open Access Journals (Sweden)

    Stougaard Magnus

    2007-11-01

    Full Text Available Abstract Background In situ detection of short sequence elements in genomic DNA requires short probes with high molecular resolution and powerful specific signal amplification. Padlock probes can differentiate single base variations. Ligated padlock probes can be amplified in situ by rolling circle DNA synthesis and detected by fluorescence microscopy, thus enhancing PRINS type reactions, where localized DNA synthesis reports on the position of hybridization targets, to potentially reveal the binding of single oligonucleotide-size probe molecules. Such a system has been presented for the detection of mitochondrial DNA in fixed cells, whereas attempts to apply rolling circle detection to metaphase chromosomes have previously failed, according to the literature. Methods Synchronized cultured cells were fixed with methanol/acetic acid to prepare chromosome spreads in teflon-coated diagnostic well-slides. Apart from the slide format and the chromosome spreading everything was done essentially according to standard protocols. Hybridization targets were detected in situ with padlock probes, which were ligated and amplified using target primed rolling circle DNA synthesis, and detected by fluorescence labeling. Results An optimized protocol for the spreading of condensed metaphase chromosomes in teflon-coated diagnostic well-slides was developed. Applying this protocol we generated specimens for target primed rolling circle DNA synthesis of padlock probes recognizing a 40 nucleotide sequence in the male specific repetitive satellite I sequence (DYZ1 on the Y-chromosome and a 32 nucleotide sequence in the repetitive kringle IV domain in the apolipoprotein(a gene positioned on the long arm of chromosome 6. These targets were detected with good efficiency, but the efficiency on other target sites was unsatisfactory. Conclusion Our aim was to test the applicability of the method used on mitochondrial DNA to the analysis of nuclear genomes, in particular as

  9. ANALYSIS OF STABILITY OF TRINUCLEOTIDE TTC MOTIFS IN COMMON FLAX PLANTED IN THE CHERNOBYL AREA

    Directory of Open Access Journals (Sweden)

    Veronika Lancíková

    2015-02-01

    Full Text Available Flax (Linum usitatissimum L. is one of the oldest domesticated plants — it was cultivated as early as in ancient Egypt and Samaria 10,000 years ago to serve as a source of fiber and oil, whence it later spread around the world. Compared with other plants, the flax genome consists of a high number of repetitive sequences, middle repetitive sequences and small repetitive sequences of nucleotides. The aim of the study was to analyze the stability of the existing trinucleotides motifs of microsatellite DNA of the flax genome (genotype Kyivskyi, growing in the Chernobyl conditions. The Chernobyl area is the most extensive “natural” laboratory suitable for the study of radiation effects. Over the last 20 years, the researches collected important knowledge about the effects of low and high radiation doses on the DNA isolated from the plant material growing on the remediated fields near Chernobyl and the plant material from fields contaminated by radioactive cesium 137Cs and strontium 90Sr. Using eight pairs of microsatellite primers, we successfully amplified the samples from the remediated fields. For each primer in the control samples and remediated samples, we detected 1 to 3 fragments per locus, each in size up to 120 to 250 base pairs. The applied microsatellite primers confirmed the monomorphic condition of microsatellite loci.

  10. Detection of G-Quadruplex Structures Formed by G-Rich Sequences from Rice Genome and Transcriptome Using Combined Probes.

    Science.gov (United States)

    Chang, Tianjun; Li, Weiguo; Ding, Zhan; Cheng, Shaofei; Liang, Kun; Liu, Xiangjun; Bing, Tao; Shangguan, Dihua

    2017-08-01

    Putative G-quadruplex (G4) forming sequences (PQS) are highly prevalent in the genome and transcriptome of various organisms and are considered as potential regulation elements in many biological processes by forming G4 structures. The formation of G4 structures highly depends on the sequences and the environment. In most cases, it is difficult to predict G4 formation by PQS, especially PQS containing G2 tracts. Therefore, the experimental identification of G4 formation is essential in the study of G4-related biological functions. Herein, we report a rapid and simple method for the detection of G4 structures by using a pair of complementary reporters, hemin and BMSP. This method was applied to detect G4 structures formed by PQS (DNA and RNA) searched in the genome and transcriptome of Oryza sativa. Unlike most of the reported G4 probes that only recognize part of G4 structures, the proposed method based on combined probes positively responded to almost all G4 conformations, including parallel, antiparallel, and mixed/hybrid G4, but did not respond to non-G4 sequences. This method shows potential for high-throughput identification of G4 structures in genome and transcriptome. Furthermore, BMSP was observed to drive some PQS to form more stable G4 structures or induce the G4 formation of some PQS that cannot form G4 in normal physiological conditions, which may provide a powerful molecular tool for gene regulation.

  11. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array

    Directory of Open Access Journals (Sweden)

    Sugnet Charles

    2006-12-01

    Full Text Available Abstract Background Alternative splicing is a mechanism for increasing protein diversity by excluding or including exons during post-transcriptional processing. Alternatively spliced proteins are particularly relevant in oncology since they may contribute to the etiology of cancer, provide selective drug targets, or serve as a marker set for cancer diagnosis. While conventional identification of splice variants generally targets individual genes, we present here a new exon-centric array (GeneChip Human Exon 1.0 ST that allows genome-wide identification of differential splice variation, and concurrently provides a flexible and inclusive analysis of gene expression. Results We analyzed 20 paired tumor-normal colon cancer samples using a microarray designed to detect over one million putative exons that can be virtually assembled into potential gene-level transcripts according to various levels of prior supporting evidence. Analysis of high confidence (empirically supported transcripts identified 160 differentially expressed genes, with 42 genes occupying a network impacting cell proliferation and another twenty nine genes with unknown functions. A more speculative analysis, including transcripts based solely on computational prediction, produced another 160 differentially expressed genes, three-fourths of which have no previous annotation. We also present a comparison of gene signal estimations from the Exon 1.0 ST and the U133 Plus 2.0 arrays. Novel splicing events were predicted by experimental algorithms that compare the relative contribution of each exon to the cognate transcript intensity in each tissue. The resulting candidate splice variants were validated with RT-PCR. We found nine genes that were differentially spliced between colon tumors and normal colon tissues, several of which have not been previously implicated in cancer. Top scoring candidates from our analysis were also found to substantially overlap with EST-based bioinformatic

  12. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network

  13. Detecting signatures of a sponge-associated lifestyle in bacterial genomes.

    Science.gov (United States)

    Díez-Vives, Cristina; Esteves, Ana I S; Costa, Rodrigo; Nielsen, Shaun; Thomas, Torsten

    2018-04-30

    Sponges interact with diverse and rich communities of bacteria that are phylogenetically often distinct from their free-living counterparts. Recent genomics and metagenomic studies have indicated that bacterial sponge symbionts also have distinct functional features from free-living bacteria, however it is unclear, if such genome-derived functional signatures are common and present in different symbiont taxa. We therefore compared here a large set of genomes from cultured (Pseudovibrio, Ruegeria, Aquimarina) and yet-uncultivated (Synechococcus) bacteria found either in sponge-associated or free-living sources. Our analysis revealed only very few genera-specific functions that could be correlated with a sponge-associated lifestyle. Using different sets of sponge-associated and free-living bacteria for each genus, we could however show that the functions identified as "sponge-associated" are dependent on the reference comparison being made. Using simulation approaches we show how this influences the robustness of identifying functional signatures and how evolutionary divergence and genomic adaptation can be distinguished. Our results highlight the future need for robust comparative analyses to define genomic signatures of symbiotic lifestyles, whether it is for symbionts of sponges or other host organisms. This article is protected by copyright. All rights reserved. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.

  14. Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

    Science.gov (United States)

    Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

    2004-01-05

    The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.

  15. Chromosomal imbalances detected in primary bone tumors by comparative genomic hybridization and interphase fluorescence in situ hybridization

    Directory of Open Access Journals (Sweden)

    Marcelo Razera Baruffi

    2003-01-01

    Full Text Available We applied a combination of comparative genomic hybridization (CGH and fluorescence in situ hybridization (FISH, to characterize the genetic aberrations in three osteosarcomas (OS and one Ewing's sarcoma. CGH identified recurrent chromosomal losses at 10p14-pter and gains at 8q22.3-24.1 in OS. Interphase FISH allowed to confirm 8q gain in two cases. A high amplification level of 11q12-qter was detected in one OS. The Ewing's sarcoma showed gain at 1p32-36.1 as the sole chromosome alteration. These studies demonstrate the value of molecular cytogenetic methods in the characterization of recurrent genomic alterations in bone tumor tissue.

  16. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large food borne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promises for using as a routine epidemiological typing....... Enteritidis and 5 S. Derby were also sequenced and used for comparison. A number of different bioinformatics approaches were applied on the data; including pan-genome tree, k-mer tree, nucleotide difference tree and SNP tree. The outcome of each approach was evaluated in relation to the association...... of the isolates to specific outbreaks. The pan-genome tree clustered 65% of the S. Typhimurium isolates according to the pre-defined epidemiology, the k-mer tree 88%, the nucleotide difference tree 100% and the SNP tree 100% of the strains within S. Typhimurium. The resulting outcome of the four phylogenetic...

  17. Detection of genomic instability in normal human bronchial epithelial cells exposed to 238Pu

    International Nuclear Information System (INIS)

    Kennedy, C.H.; Fukushima, N.H.; Neft, R.E.; Lechner, J.F.

    1994-01-01

    Alpha particle-emitting radon daughters constitute a risk for development of lung cancer in humans. The development of this disease involves multiple genetic alterations. These changes and the time course they follow are not yet defined despite numerous in vitro endeavors to transform human lung cells with various physical or chemical agents. However, genomic instability, characterized both by structural and numerical chromosomal aberrations and by elevated rates of point mutations, is a common feature of tumor cells. Further, both types of genomic instability have been reported in the noncancerous progeny of normal murine hemopoietic cells exposed in vitro to α-particles. The purpose of this investigation was to determine if genomic instability is also a prominent feature of normal human bronchial epithelial cells exposed to α-particle irradiation from the decay of inhaled radon daughters

  18. MotifNet: a web-server for network motif analysis.

    Science.gov (United States)

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  19. Detection of the Epstein-Barr virus genome in the radiation-associated malignant lymphomas

    International Nuclear Information System (INIS)

    Butenko, Z.A.; Kindzel's'kij, L.P.; Butenko, A.K.

    1993-01-01

    The genomic and ultrastructural peculiarities of malignant lymphoma cells in the patients living in the radiation unfavourable regions have been examined. The specific viral genomic sequences of EBV are revealed in the lymphoma cells in 3 of 4 patients. The described EBV-associated lymphomas are correlated with a significant decrease of natural anti tumour immunity in all the patients. The obtained data can serve as an evidence of the important role of the Epstein-Barr herpes virus in the mechanism of lymphoid cells malignant transformation

  20. Clinical Utility of Array Comparative Genomic Hybridization for Detection of Chromosomal Abnormalities in Pediatric Acute Lymphoblastic Leukemia

    Science.gov (United States)

    Rabin, Karen R.; Man, Tsz-Kwong; Yu, Alexander; Folsom, Matthew R.; Zhao, Yi-Jue; Rao, Pulivarthi H.; Plon, Sharon E.; Naeem, Rizwan C.

    2014-01-01

    Background Accurate detection of recurrent chromosomal abnormalities is critical to assign patients to risk-based therapeutic regimens for pediatric acute lymphoblastic leukemia (ALL). Procedure We investigated the utility of array comparative genomic hybridization (aCGH) for detection of chromosomal abnormalities compared to standard clinical evaluation with karyotype and fluorescent in-situ hybridization (FISH). Fifty pediatric ALL diagnostic bone marrows were analyzed by bacterial artificial chromosome (BAC) array, and findings compared to standard clinical evaluation. Results Sensitivity of aCGH was 79% to detect karyotypic findings other than balanced translocations, which cannot be detected by aCGH because they involve no copy number change. aCGH also missed abnormalities occurring in subclones constituting less than 25% of cells. aCGH detected 44 additional abnormalities undetected or misidentified by karyotype, 21 subsequently validated by FISH, including abnormalities in 4 of 10 cases with uninformative cytogenetics. aCGH detected concurrent terminal deletions of both 9p and 20q in three cases, in two of which the 20q deletion was undetected by karyotype. A narrow region of loss at 7p21 was detected in two cases. Conclusions An array with increased BAC density over regions important in ALL, combined with PCR for fusion products of balanced translocations, could minimize labor- and time-intensive cytogenetic assays and provide key prognostic information in the approximately 35% of cases with uninformative cytogenetics. PMID:18253961

  1. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest.

    Science.gov (United States)

    Wang, Xin; Lin, Peijie; Ho, Joshua W K

    2018-01-19

    It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed in, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner, is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model. We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TF (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.

  2. Chemical and radiation mutagenesis: Induction and detection by whole genome sequencing

    Science.gov (United States)

    Brachypodium distachyon has emerged as an effective model system to address fundamental questions in grass biology. With its small sequenced genome, short generation time and rapidly expanding array of genetic tools B. distachyon is an ideal system to elucidate the molecular basis of important trai...

  3. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    NARCIS (Netherlands)

    Chagné, D.; Crowhurst, R.N.; Troggio, M.; Davey, M.W.; Gilmore, B.; Lawley, C.; Vanderzande, S.; Hellens, R.P.; Kumar, S.; Cestaro, A.; Velasco, R.; Main, D.; Rees, J.D.; Iezzoni, A.F.; Mockler, T.; Wilhelm, L.; Weg, van de W.E.; Gardiner, S.E.; Bassil, N.; Peace, C.

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide

  4. Detection of gene-environment interaction in pedigree data using genome-wide genotypes

    NARCIS (Netherlands)

    Nivard, Michel G.; Middeldorp, Christel M.; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I.; Dolan, Conor V.

    2016-01-01

    Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include

  5. Comparative genomic hybridization detects novel amplifications in fibroadenomas of the breast

    DEFF Research Database (Denmark)

    Ojopi, E P; Rogatto, S R; Caldeira, J R

    2001-01-01

    Comparative genomic hybridization analysis was performed for identification of chromosomal imbalances in 23 samples of fibroadenomas of the breast. Chromosomal gains rather than losses were a feature of these lesions. Only two cases with a familial and/or previous history of breast lesions had gain...

  6. Depauperate genetic variability detected in the American and European bison using genomic techniques

    DEFF Research Database (Denmark)

    Pertoldi, Cino; Tokarska, Magorzata; Wójcik, Jan M

    2009-01-01

    , likely reflecting drift overwhelming selection. We suggest that utilization of genome-wide screening technologies, followed by utilization of less expensive techniques (e.g. VeraCode and Fluidigm EP1), holds large potential for genetic monitoring of populations. Additionally, these techniques will allow...

  7. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...

  8. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454-reads of an Escheria coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  9. Detection and validation of single feature polymorphisms in cowpea (Vigna unguiculata L. Walp using a soybean genome array

    Directory of Open Access Journals (Sweden)

    Wanamaker Steve

    2008-02-01

    Full Text Available Abstract Background Cowpea (Vigna unguiculata L. Walp is an important food and fodder legume of the semiarid tropics and subtropics worldwide, especially in sub-Saharan Africa. High density genetic linkage maps are needed for marker assisted breeding but are not available for cowpea. A single feature polymorphism (SFP is a microarray-based marker which can be used for high throughput genotyping and high density mapping. Results Here we report detection and validation of SFPs in cowpea using a readily available soybean (Glycine max genome array. Robustified projection pursuit (RPP was used for statistical analysis using RNA as a surrogate for DNA. Using a 15% outlying score cut-off, 1058 potential SFPs were enumerated between two parents of a recombinant inbred line (RIL population segregating for several important traits including drought tolerance, Fusarium and brown blotch resistance, grain size and photoperiod sensitivity. Sequencing of 25 putative polymorphism-containing amplicons yielded a SFP probe set validation rate of 68%. Conclusion We conclude that the Affymetrix soybean genome array is a satisfactory platform for identification of some 1000's of SFPs for cowpea. This study provides an example of extension of genomic resources from a well supported species to an orphan crop. Presumably, other legume systems are similarly tractable to SFP marker development using existing legume array resources.

  10. Familial Case of Pelizaeus-Merzbacher Disorder Detected by Oligoarray Comparative Genomic Hybridization: Genotype-to-Phenotype Diagnosis

    Directory of Open Access Journals (Sweden)

    Kimia Najafi

    2017-01-01

    Full Text Available Introduction. Pelizaeus-Merzbacher disease (PMD is an X-linked recessive hypomyelinating leukodystrophy characterized by nystagmus, spastic quadriplegia, ataxia, and developmental delay. It is caused by mutation in the PLP1 gene. Case Description. We report a 9-year-old boy referred for oligoarray comparative genomic hybridization (OA-CGH because of intellectual delay, seizures, microcephaly, nystagmus, and spastic paraplegia. Similar clinical findings were reported in his older brother and maternal uncle. Both parents had normal phenotypes. OA-CGH was performed and a 436 Kb duplication was detected and the diagnosis of PMD was made. The mother was carrier of this 436 Kb duplication. Conclusion. Clinical presentation has been accepted as being the mainstay of diagnosis for most conditions. However, recent developments in genetic diagnosis have shown that, in many congenital and sporadic disorders lacking specific phenotypic manifestations, a genotype-to-phenotype approach can be conclusive. In this case, a diagnosis was reached by universal genomic testing, namely, whole genomic array.

  11. Rapid and sensitive approach to simultaneous detection of genomes of hepatitis A, B, C, D and E viruses.

    Science.gov (United States)

    Kodani, Maja; Mixson-Hayden, Tonya; Drobeniuc, Jan; Kamili, Saleem

    2014-10-01

    Five viruses have been etiologically associated with viral hepatitis. Nucleic acid testing (NAT) remains the gold standard for diagnosis of viremic stages of infection. NAT methodologies have been developed for all hepatitis viruses; however, a NAT-based assay that can simultaneously detect all five viruses is not available. We designed TaqMan card-based assays for detection of HAV RNA, HBV DNA, HCV RNA, HDV RNA and HEV RNA. The performances of individual assays were evaluated on TaqMan Array Cards (TAC) for detecting five viral genomes simultaneously. Sensitivity and specificity were determined by testing 329 NAT-tested clinical specimens. All NAT-positive samples for HCV (n = 32), HDV (n = 28) and HEV (n = 14) were also found positive in TAC (sensitivity, 100%). Forty-three of 46 HAV-NAT positive samples were also positive in TAC (sensitivity, 94%), while 36 of 39 HBV-NAT positive samples were positive (sensitivity, 92%). No false-positives were detected for HBV (n = 32), HCV (n = 36), HDV (n = 30), and HEV (n = 31) NAT-negative samples (specificity 100%), while 38 of 41 HAV-NAT negative samples were negative by TAC (specificity 93%). TAC assay was concordant with corresponding individual NATs for hepatitis A-E viral genomes and can be used for their detection simultaneously. The TAC assay has potential for use in hepatitis surveillance, for screening of donor specimens and in outbreak situations. Wider availability of TAC-ready assays may allow for customized assays, for improving acute jaundice surveillance and for other purposes for which there is need to identify multiple pathogens rapidly. Published by Elsevier B.V.

  12. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  13. Hunting Motifs in Situla Art

    Directory of Open Access Journals (Sweden)

    Andrej Preložnik

    2013-07-01

    Full Text Available Situla art developed as an echo of the toreutic style which had spread from the Near East through the Phoenicians, Greeks and Etruscans as far as the Veneti, Raeti, Histri, and their eastern neighbours in the region of Dolenjska (Lower Carniola. An Early Iron Age phenomenon (c. 600—300 BC, it rep- resents the major and most arresting form of the contemporary visual arts in an area stretching from the foot of the Apennines in the south to the Drava and Sava rivers in the east. Indeed, individual pieces have found their way across the Alpine passes and all the way north to the Danube. In the world and art of the situlae, a prominent role is accorded to ani- mals. They are displayed in numerous representations of human activities on artefacts crafted in the classic situla style – that is, between the late 6th  and early 5th centuries BC – as passive participants (e.g. in pageants or in harness or as an active element of the situla narrative. The most typical example of the latter is the hunting scene. Today we know at least four objects decorat- ed exclusively with hunting themes, and a number of situlae and other larger vessels where hunting scenes are embedded in composite narratives. All this suggests a popularity unparallelled by any other genre. Clearly recognisable are various hunting techniques and weapons, each associated with a particu- lar type of game (Fig. 1. The chase of a stag with javelin, horse and hound is depicted on the long- familiar and repeatedly published fibula of Zagorje (Fig. 2. It displays a hound mauling the stag’s back and a hunter on horseback pursuing a hind, her neck already pierced by the javelin. To judge by the (so far unnoticed shaft end un- der the stag’s muzzle, the hunter would have been brandishing a second jave- lin as well, like the warrior of the Vače fibula or the rider of the Nesactium situla, presumably himself a hunter. Many parallels to his motif are known from Greece, Etruria, and

  14. GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi

    2017-05-24

    As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. The generally accepted criteria of CGI rely on: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content and the expected CpG content is ≥ 0.6, and (c) the general length of CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG island are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of Human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets.

  15. Genome-wide detection of copy number variations among diverse horse breeds by array CGH.

    Directory of Open Access Journals (Sweden)

    Wei Wang

    Full Text Available Recent studies have found that copy number variations (CNVs are widespread in human and animal genomes. CNVs are a significant source of genetic variation, and have been shown to be associated with phenotypic diversity. However, the effect of CNVs on genetic variation in horses is not well understood. In the present study, CNVs in 6 different breeds of mare horses, Mongolia horse, Abaga horse, Hequ horse and Kazakh horse (all plateau breeds and Debao pony and Thoroughbred, were determined using aCGH. In total, seven hundred CNVs were identified ranging in size from 6.1 Kb to 0.57 Mb across all autosomes, with an average size of 43.08 Kb and a median size of 15.11 Kb. By merging overlapping CNVs, we found a total of three hundred and fifty-three CNV regions (CNVRs. The length of the CNVRs ranged from 6.1 Kb to 1.45 Mb with average and median sizes of 38.49 Kb and 13.1 Kb. Collectively, 13.59 Mb of copy number variation was identified among the horses investigated and accounted for approximately 0.61% of the horse genome sequence. Five hundred and eighteen annotated genes were affected by CNVs, which corresponded to about 2.26% of all horse genes. Through the gene ontology (GO, genetic pathway analysis and comparison of CNV genes among different breeds, we found evidence that CNVs involving 7 genes may be related to the adaptation to severe environment of these plateau horses. This study is the first report of copy number variations in Chinese horses, which indicates that CNVs are ubiquitous in the horse genome and influence many biological processes of the horse. These results will be helpful not only in mapping the horse whole-genome CNVs, but also to further research for the adaption to the high altitude severe environment for plateau horses.

  16. Whole genome detection of signature of positive selection in African cattle reveals selection for thermotolerance.

    Science.gov (United States)

    Taye, Mengistie; Lee, Wonseok; Caetano-Anolles, Kelsey; Dessie, Tadelle; Hanotte, Olivier; Mwai, Okeyo Ally; Kemp, Stephen; Cho, Seoae; Oh, Sung Jong; Lee, Hak-Kyo; Kim, Heebal

    2017-12-01

    As African indigenous cattle evolved in a hot tropical climate, they have developed an inherent thermotolerance; survival mechanisms include a light-colored and shiny coat, increased sweating, and cellular and molecular mechanisms to cope with high environmental temperature. Here, we report the positive selection signature of genes in African cattle breeds which contribute for their heat tolerance mechanisms. We compared the genomes of five indigenous African cattle breeds with the genomes of four commercial cattle breeds using cross-population composite likelihood ratio (XP-CLR) and cross-population extended haplotype homozygosity (XP-EHH) statistical methods. We identified 296 (XP-EHH) and 327 (XP-CLR) positively selected genes. Gene ontology analysis resulted in 41 biological process terms and six Kyoto Encyclopedia of Genes and Genomes pathways. Several genes and pathways were found to be involved in oxidative stress response, osmotic stress response, heat shock response, hair and skin properties, sweat gland development and sweating, feed intake and metabolism, and reproduction functions. The genes and pathways identified directly or indirectly contribute to the superior heat tolerance mechanisms in African cattle populations. The result will improve our understanding of the biological mechanisms of heat tolerance in African cattle breeds and opens an avenue for further study. © 2017 Japanese Society of Animal Science.

  17. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.

  18. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby repsonsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  19. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  20. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    Science.gov (United States)

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.

  1. Detection and precise mapping of germline rearrangements in BRCA1, BRCA2, MSH2, and MLH1 using zoom-in array comparative genomic hybridization (aCGH)

    DEFF Research Database (Denmark)

    Staaf, Johan; Törngren, Therese; Rambech, Eva

    2008-01-01

    Disease-predisposing germline mutations in cancer susceptibility genes may consist of large genomic rearrangements that are challenging to detect and characterize using standard PCR-based mutation screening methods. Here, we describe a custom-made zoom-in microarray comparative genomic hybridizat......Disease-predisposing germline mutations in cancer susceptibility genes may consist of large genomic rearrangements that are challenging to detect and characterize using standard PCR-based mutation screening methods. Here, we describe a custom-made zoom-in microarray comparative genomic...... deletions or duplications occurring in BRCA1 (n=11), BRCA2 (n=2), MSH2 (n=7), or MLH1 (n=9). Additionally, we demonstrate its applicability for uncovering complex somatic rearrangements, exemplified by zoom-in analysis of the PTEN and CDKN2A loci in breast cancer cells. The sizes of rearrangements ranged...... from several 100 kb, including large flanking regions, to rearrangements, allowing convenient design...

  2. Genome-first approach diagnosed Cabezas syndrome via novel CUL4B mutation detection.

    Science.gov (United States)

    Okamoto, Nobuhiko; Watanabe, Miki; Naruto, Takuya; Matsuda, Keiko; Kohmoto, Tomohiro; Saito, Masako; Masuda, Kiyoshi; Imoto, Issei

    2017-01-01

    Cabezas syndrome is a syndromic form of X-linked intellectual disability primarily characterized by a short stature, hypogonadism and abnormal gait, with other variable features resulting from mutations in the CUL4B gene. Here, we report a clinically undiagnosed 5-year-old male with severe intellectual disability. A genome-first approach using targeted exome sequencing identified a novel nonsense mutation [NM_003588.3:c.2698G>T, p.(Glu900*)] in the last coding exon of CUL4B , thus diagnosing this patient with Cabezas syndrome.

  3. Validation of a Pan-Orthopox Real-Time PCR Assay for the Detection and Quantification of Viral Genomes from Nonhuman Primate Blood

    Science.gov (United States)

    2017-06-19

    Validation of a pan-orthopox real-time PCR assay for the detection and quantification of viral genomes from nonhuman primate blood. Eric...medical countermeasures by the U.S. FDA, the assay was designed to quantitate poxvirus genomic DNA in a nonhuman primate (cynomolgus macaque) blood...monkeypox virus into nonhuman primate blood, we chose to use the HA standard after considering the potential biological safety and logistical issues with

  4. Detection of alien chromatin introgression from Thinopyrum into wheat using S genomic DNA as a probe--a landmark approach for Thinopyrum genome research.

    Science.gov (United States)

    Chen, Q

    2005-01-01

    The introduction of alien genetic variation from the genus Thinopyrum through chromosome engineering into wheat is a valuable and proven technique for wheat improvement. A number of economically important traits have been transferred into wheat as single genes, chromosome arms or entire chromosomes. Successful transfers can be greatly assisted by the precise identification of alien chromatin in the recipient progenies. Chromosome identification and characterization are useful for genetic manipulation and transfer in wheat breeding following chromosome engineering. Genomic in situ hybridization (GISH) using an S genomic DNA probe from the diploid species Pseudoroegneria has proven to be a powerful diagnostic cytogenetic tool for monitoring the transfer of many promising agronomic traits from Thinopyrum. This specific S genomic probe not only allows the direct determination of the chromosome composition in wheat-Thinopyrum hybrids, but also can separate the Th. intermedium chromosomes into the J, J(S) and S genomes. The J(S) genome, which consists of a modified J genome chromosome distinguished by S genomic sequences of Pseudoroegneria near the centromere and telomere, carries many disease and mite resistance genes. Utilization of this S genomic probe leads to a better understanding of genomic affinities between Thinopyrum and wheat, and provides a molecular cytogenetic marker for monitoring the transfer of alien Thinopyrum agronomic traits into wheat recipient lines. Copyright 2005 S. Karger AG, Basel.

  5. Whole Genome Scan to Detect Chromosomal Regions Affecting Multiple Traits in Dairy Cattle

    NARCIS (Netherlands)

    Schrooten, C.; Bink, M.C.A.M.; Bovenhuis, H.

    2004-01-01

    Chromosomal regions affecting multiple traits ( multiple trait quantitative trait regions or MQR) in dairy cattle were detected using a method based on results from single trait analyses to detect quantitative trait loci (QTL). The covariance between contrasts for different traits in single trait

  6. Discovery of a Regulatory Motif for Human Satellite DNA Transcription in Response to BATF2 Overexpression.

    Science.gov (United States)

    Bai, Xuejia; Huang, Wenqiu; Zhang, Chenguang; Niu, Jing; Ding, Wei

    2016-03-01

    One of the basic leucine zipper transcription factors, BATF2, has been found to suppress cancer growth and migration. However, little is known about the genes downstream of BATF2. HeLa cells were stably transfected with BATF2, then chromatin immunoprecipitation-sequencing was employed to identify the DNA motifs responsive to BATF2. Comprehensive bioinformatics analyses indicated that the most significant motif discovered as TTCCATT[CT]GATTCCATTC[AG]AT was primarily distributed among the chromosome centromere regions and mostly within human type II satellite DNA. Such motifs were able to prime the transcription of type II satellite DNA in a directional and asymmetrical manner. Consistently, satellite II transcription was up-regulated in BATF2-overexpressing cells. The present study provides insight into understanding the role of BATF2 in tumours and the importance of satellite DNA in the maintenance of genomic stability. Copyright© 2016 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  7. Short Arginine Motifs Drive Protein Stickiness in the Escherichia coli Cytoplasm.

    Science.gov (United States)

    Kyne, Ciara; Crowley, Peter B

    2017-09-19

    Although essential to numerous biotech applications, knowledge of molecular recognition by arginine-rich motifs in live cells remains limited. 1 H, 15 N HSQC and 19 F NMR spectroscopies were used to investigate the effects of C-terminal -GR n (n = 1-5) motifs on GB1 interactions in Escherichia coli cells and cell extracts. While the "biologically inert" GB1 yields high-quality in-cell spectra, the -GR n fusions with n = 4 or 5 were undetectable. This result suggests that a tetra-arginine motif is sufficient to drive interactions between a test protein and macromolecules in the E. coli cytoplasm. The inclusion of a 12 residue flexible linker between GB1 and the -GR 5 motif did not improve detection of the "inert" domain. In contrast, all of the constructs were detectable in cell lysates and extracts, suggesting that the arginine-mediated complexes were weak. Together these data reveal the significance of weak interactions between short arginine-rich motifs and the E. coli cytoplasm and demonstrate the potential of such motifs to modify protein interactions in living cells. These interactions must be considered in the design of (in vivo) nanoscale assemblies that rely on arginine-rich sequences.

  8. bNEAT: a Bayesian network method for detecting epistatic interactions in genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Chen Xue-wen

    2011-07-01

    Full Text Available Abstract Background Detecting epistatic interactions plays a significant role in improving pathogenesis, prevention, diagnosis and treatment of complex human diseases. A recent study in automatic detection of epistatic interactions shows that Markov Blanket-based methods are capable of finding genetic variants strongly associated with common diseases and reducing false positives when the number of instances is large. Unfortunately, a typical dataset from genome-wide association studies consists of very limited number of examples, where current methods including Markov Blanket-based method may perform poorly. Results To address small sample problems, we propose a Bayesian network-based approach (bNEAT to detect epistatic interactions. The proposed method also employs a Branch-and-Bound technique for learning. We apply the proposed method to simulated datasets based on four disease models and a real dataset. Experimental results show that our method outperforms Markov Blanket-based methods and other commonly-used methods, especially when the number of samples is small. Conclusions Our results show bNEAT can obtain a strong power regardless of the number of samples and is especially suitable for detecting epistatic interactions with slight or no marginal effects. The merits of the proposed approach lie in two aspects: a suitable score for Bayesian network structure learning that can reflect higher-order epistatic interactions and a heuristic Bayesian network structure learning method.

  9. Genomic and oncoproteomic advances in detection and treatment of colorectal cancer.

    LENUS (Irish Health Repository)

    McHugh, Seamus M

    2012-02-01

    AIMS: We will examine the latest advances in genomic and proteomic laboratory technology. Through an extensive literature review we aim to critically appraise those studies which have utilized these latest technologies and ascertain their potential to identify clinically useful biomarkers. METHODS: An extensive review of the literature was carried out in both online medical journals and through the Royal College of Surgeons in Ireland library. RESULTS: Laboratory technology has advanced in the fields of genomics and oncoproteomics. Gene expression profiling with DNA microarray technology has allowed us to begin genetic profiling of colorectal cancer tissue. The response to chemotherapy can differ amongst individual tumors. For the first time researchers have begun to isolate and identify the genes responsible. New laboratory techniques allow us to isolate proteins preferentially expressed in colorectal cancer tissue. This could potentially lead to identification of a clinically useful protein biomarker in colorectal cancer screening and treatment. CONCLUSION: If a set of discriminating genes could be used for characterization and prediction of chemotherapeutic response, an individualized tailored therapeutic regime could become the standard of care for those undergoing systemic treatment for colorectal cancer. New laboratory techniques of protein identification may eventually allow identification of a clinically useful biomarker that could be used for screening and treatment. At present however, both expression of different gene signatures and isolation of various protein peaks has been limited by study size. Independent multi-centre correlation of results with larger sample sizes is needed to allow translation into clinical practice.

  10. Fluorescent In Situ Hybridization to Detect Transgene Integration into Plant Genomes

    Science.gov (United States)

    Schwarzacher, Trude

    Fluorescent chromosome analysis technologies have advanced our understanding of genome organization during the last 30 years and have enabled the investigation of DNA organization and structure as well as the evolution of chromosomes. Fluorescent chromosome staining allows even small chromosomes to be visualized, characterized by their composition and morphology, and counted. Aneuploidies and polyploidies can be established for species, breeding lines, and individuals, including changes occurring during hybridization or tissue culture and transformation protocols. Fluorescent in situ hybridization correlates molecular information of a DNA sequence with its physical location on chromosomes and genomes. It thus allows determination of the physical position of sequences and often is the only means to determine the abundance and distribution of DNA sequences that are difficult to map with any other molecular method or would require segregation analysis, in particular multicopy or repetitive DNA. Equally, it is often the best way to establish the incorporation of transgenes, their numbers, and physical organization along chromosomes. This chapter presents protocols for probe and chromosome preparation, fluorescent in situ hybridization, chromosome staining, and the analysis of results.

  11. Genomic and oncoproteomic advances in detection and treatment of colorectal cancer.

    LENUS (Irish Health Repository)

    McHugh, Seamus M

    2009-01-01

    AIMS: We will examine the latest advances in genomic and proteomic laboratory technology. Through an extensive literature review we aim to critically appraise those studies which have utilized these latest technologies and ascertain their potential to identify clinically useful biomarkers. METHODS: An extensive review of the literature was carried out in both online medical journals and through the Royal College of Surgeons in Ireland library. RESULTS: Laboratory technology has advanced in the fields of genomics and oncoproteomics. Gene expression profiling with DNA microarray technology has allowed us to begin genetic profiling of colorectal cancer tissue. The response to chemotherapy can differ amongst individual tumors. For the first time researchers have begun to isolate and identify the genes responsible. New laboratory techniques allow us to isolate proteins preferentially expressed in colorectal cancer tissue. This could potentially lead to identification of a clinically useful protein biomarker in colorectal cancer screening and treatment. CONCLUSION: If a set of discriminating genes could be used for characterization and prediction of chemotherapeutic response, an individualized tailored therapeutic regime could become the standard of care for those undergoing systemic treatment for colorectal cancer. New laboratory techniques of protein identification may eventually allow identification of a clinically useful biomarker that could be used for screening and treatment. At present however, both expression of different gene signatures and isolation of various protein peaks has been limited by study size. Independent multi-centre correlation of results with larger sample sizes is needed to allow translation into clinical practice.

  12. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.

  13. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.

  14. Erythropoietin abuse and erythropoietin gene doping: detection strategies in the genomic era.

    Science.gov (United States)

    Diamanti-Kandarakis, Evanthia; Konstantinopoulos, Panagiotis A; Papailiou, Joanna; Kandarakis, Stylianos A; Andreopoulos, Anastasios; Sykiotis, Gerasimos P

    2005-01-01

    The administration of recombinant human erythropoietin (rhEPO) increases the maximum oxygen consumption capacity, and is therefore abused as a doping method in endurance sports. The detection of erythropoietin (EPO) abuse is based on direct pharmacological and indirect haematological approaches, both of which have several limitations. In addition, current detection methods cannot cope with the emerging doping strategies of EPO mimicry, analogues and gene doping, and thus novel detection strategies are urgently needed. Direct detection methods for EPO misuse can be either pharmacological approaches that identify exogenous substances based on their physicochemical properties, or molecular methods that recognise EPO transgenes or gene transfer vectors. Since direct detection with molecular methods requires invasive procedures, it is not appropriate for routine screening of large numbers of athletes. In contrast, novel indirect methods based on haematological and/or molecular profiling could be better suited as screening tools, and athletes who are suspect of doping would then be submitted to direct pharmacological and molecular tests. This article reviews the current state of the EPO doping field, discusses available detection methods and their shortcomings, outlines emerging pharmaceutical and genetic technologies in EPO misuse, and proposes potential directions for the development of novel detection strategies.

  15. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  16. Analisis Unsur Matematika pada Motif Sulam Usus

    Directory of Open Access Journals (Sweden)

    Fredi Ganda Putra

    2017-12-01

    Full Text Available Based on interviews with researchers sources said that the beginning of the intestine embroidery is an art of genuine crafts. Called the intestine embroidery because this technique is a technique of combining a strand of cloth resembling the intestine formed according to the pattern by means of embroidered using a thread. Intestinal embroidery techniques were originally used to create a cover of the women's customary wardrobe of Lampung or often referred to as bebe. But not many people in Lampung, especially people who live in Lampung are still many who do not know and recognize the intestine embroidery because most only know tapis only characteristic of Lampung, besides that there are other cultural results that is embroidered intestine. There are still many who do not know that the intestine motif there is a knowledge of mathematics. The researcher's problem formulation is whether there are mathematical elements contained in the intestine embroidery motif based on the concept of geometry. The purpose of this study is to determine whether there are elements of mathematics contained in the intestine motif based on the concept of geometry. Subjects in this study consisted of 4 people obtained by purposive sampling technique. From the results of data analysis conducted by using descriptive analysis and discussion as follows: (1 Intestinal embroidery motif contains the meaning of mathematics and culture or often called Etnomatematika. On the meaning of culture there is a link between the embroidery intestine with a culture that has been there before as the existence of cultural linkage between Hindu belief Buddhism and there are similarities of motifs and decorative patterns contained in the motif embroidery intestine with ornamental variety in Indonesia. (2 The relationship between the intestine with mathematical motifs there are elements of mathematics such as geometry elements in the form of geometry of dimension one and dimension two, and the

  17. Molecular markers detect stable genomic regions underlying tomato fruit shelf life and weight

    Directory of Open Access Journals (Sweden)

    Guillermo Raúl Pratta

    2011-01-01

    Full Text Available Incorporating wild germplasm such as S. pimpinellifolium is an alternative strategy to prolong tomato fruit shelf life(SL without reducing fruit quality. A set of recombinant inbred lines with discrepant values of SL and weight (FW were derived byantagonistic-divergent selection from an interspecific cross. The general objective of this research was to evaluate Genotype x Year(GY and Marker x Year (MY interaction in these new genetic materials for both traits. Genotype and year principal effects and GYinteraction were statistically significant for SL. Genotype and year principal effects were significant for FW but GY interaction wasnot. The marker principal effect was significant for SL and FW but both year principal effect and MY interaction were not significant.Though SL was highly influenced by year conditions, some genome regions appeared to maintain a stable effect across years ofevaluation. Fruit weight, instead, was more independent of year effect.

  18. Detection of yellow fever virus genomes from four imported cases in China.

    Science.gov (United States)

    Cui, Shujuan; Pan, Yang; Lyu, Yanning; Liang, Zhichao; Li, Jie; Sun, Yulan; Dou, Xiangfeng; Tian, Lili; Huo, Da; Chen, Lijuan; Li, Xinyu; Wang, Quanyi

    2017-07-01

    Yellow fever virus (YFV), as the first proven human-pathogenic virus, is still a major public health problem with a dramatic upsurge in recent years. This is a report on four imported cases of yellow fever virus into China identified by whole genome sequencing. Phylogenetic analysis was performed and the results showed that these four viruses were highly homologous with Angola 71 strains (AY968064). In addition, effective mutations of amino acids were not observed in the E protein domain of four viruses, thus confirming the effectiveness of the YFV-17D vaccine (X03700). Although there is low risk of local transmission in most part of China, the increasing public health risk of YF caused by international exchange should not be ignored. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

    Science.gov (United States)

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

    2012-01-01

    Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235

  20. Genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts.

    Science.gov (United States)

    van der Lee, Robin; Wiel, Laurens; van Dam, Teunis J P; Huynen, Martijn A

    2017-10-13

    Hotspots of rapid genome evolution hold clues about human adaptation. We present a comparative analysis of nine whole-genome sequenced primates to identify high-confidence targets of positive selection. We find strong statistical evidence for positive selection in 331 protein-coding genes (3%), pinpointing 934 adaptively evolving codons (0.014%). Our new procedure is stringent and reveals substantial artefacts (20% of initial predictions) that have inflated previous estimates. The final 331 positively selected genes (PSG) are strongly enriched for innate and adaptive immunity, secreted and cell membrane proteins (e.g. pattern recognition, complement, cytokines, immune receptors, MHC, Siglecs). We also find evidence for positive selection in reproduction and chromosome segregation (e.g. centromere-associated CENPO, CENPT), apolipoproteins, smell/taste receptors and mitochondrial proteins. Focusing on the virus-host interaction, we retrieve most evolutionary conflicts known to influence antiviral activity (e.g. TRIM5, MAVS, SAMHD1, tetherin) and predict 70 novel cases through integration with virus-human interaction data. Protein structure analysis further identifies positive selection in the interaction interfaces between viruses and their cellular receptors (CD4-HIV; CD46-measles, adenoviruses; CD55-picornaviruses). Finally, primate PSG consistently show high sequence variation in human exomes, suggesting ongoing evolution. Our curated dataset of positive selection is a rich source for studying the genetics underlying human (antiviral) phenotypes. Procedures and data are available at https://github.com/robinvanderlee/positive-selection. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Whole genome analysis of porcine astroviruses detected in Japanese pigs reveals genetic diversity and possible intra-genotypic recombination.

    Science.gov (United States)

    Ito, Mika; Kuroda, Moegi; Masuda, Tsuneyuki; Akagami, Masataka; Haga, Kei; Tsuchiaka, Shinobu; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Omatsu, Tsutomu; Katayama, Yukie; Oba, Mami; Aoki, Hiroshi; Ichimaru, Toru; Mukono, Itsuro; Ouchi, Yoshinao; Yamasato, Hiroshi; Shirai, Junsuke; Katayama, Kazuhiko; Mizutani, Tetsuya; Nagai, Makoto

    2017-06-01

    Porcine astroviruses (PoAstVs) are ubiquitous enteric virus of pigs that are distributed in several countries throughout the world. Since PoAstVs are detected in apparent healthy pigs, the clinical significance of infection is unknown. However, AstVs have recently been associated with a severe neurological disorder in animals, including humans, and zoonotic potential has been suggested. To date, little is known about the epidemiology of PoAstVs among the pig population in Japan. In this report, we present an analysis of nearly complete genomes of 36 PoAstVs detected by a metagenomics approach in the feces of Japanese pigs. Based on a phylogenetic analysis and pairwise sequence comparison, 10, 5, 15, and 6 sequences were classified as PoAstV2, PoAstV3, PoAstV4, and PoAstV5, respectively. Co-infection with two or three strains was found in individual fecal samples from eight pigs. The phylogenetic trees of ORF1a, ORF1b, and ORF2 of PoAstV2 and PoAstV4 showed differences in their topologies. The PoAstV3 and PoAstV5 strains shared high sequence identities within each genotype in all ORFs; however, one PoAstV3 strain and one PoAstV5 strain showed considerable sequence divergence from the other PoAstV3 and PoAstV5 strains, respectively, in ORF2. Recombination analysis using whole genomes revealed evidence of multiple possible intra-genotype recombination events in PoAstV2 and PoAstV4, suggesting that recombination might have contributed to the genetic diversity and played an important role in the evolution of Japanese PoAstVs. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Detection of Sleeping Beauty transposition in the genome of host cells by non-radioactive Southern blot analysis

    Energy Technology Data Exchange (ETDEWEB)

    Aravalli, Rajagopal N., E-mail: aravalli@umn.edu [Department of Radiology, University of Minnesota Medical School, MMC 292, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Park, Chang W. [Department of Medicine, University of Minnesota Medical School, MMC 36, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Steer, Clifford J., E-mail: steer001@umn.edu [Department of Medicine, University of Minnesota Medical School, MMC 36, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455 (United States)

    2016-08-26

    The Sleeping Beauty transposon (SB-Tn) system is being used widely as a DNA vector for the delivery of therapeutic transgenes, as well as a tool for the insertional mutagenesis in animal models. In order to accurately assess the insertional potential and properties related to the integration of SB it is essential to determine the copy number of SB-Tn in the host genome. Recently developed SB100X transposase has demonstrated an integration rate that was much higher than the original SB10 and that of other versions of hyperactive SB transposases, such as HSB3 or HSB17. In this study, we have constructed a series of SB vectors carrying either a DsRed or a human β-globin transgene that was encompassed by cHS4 insulator elements, and containing the SB100X transposase gene outside the SB-Tn unit within the same vector in cis configuration. These SB-Tn constructs were introduced into the K-562 erythroid cell line, and their presence in the genomes of host cells was analyzed by Southern blot analysis using non-radioactive probes. Many copies of SB-Tn insertions were detected in host cells regardless of transgene sequences or the presence of cHS4 insulator elements. Interestingly, the size difference of 2.4 kb between insulated SB and non-insulated controls did not reflect the proportional difference in copy numbers of inserted SB-Tns. We then attempted methylation-sensitive Southern blots to assess the potential influence of cHS4 insulator elements on the epigenetic modification of SB-Tn. Our results indicated that SB100X was able to integrate at multiple sites with the number of SB-Tn copies larger than 6 kb in size. In addition, the non-radioactive Southern blot protocols developed here will be useful to detect integrated SB-Tn copies in any mammalian cell type.

  3. Detection of Sleeping Beauty transposition in the genome of host cells by non-radioactive Southern blot analysis

    International Nuclear Information System (INIS)

    Aravalli, Rajagopal N.; Park, Chang W.; Steer, Clifford J.

    2016-01-01

    The Sleeping Beauty transposon (SB-Tn) system is being used widely as a DNA vector for the delivery of therapeutic transgenes, as well as a tool for the insertional mutagenesis in animal models. In order to accurately assess the insertional potential and properties related to the integration of SB it is essential to determine the copy number of SB-Tn in the host genome. Recently developed SB100X transposase has demonstrated an integration rate that was much higher than the original SB10 and that of other versions of hyperactive SB transposases, such as HSB3 or HSB17. In this study, we have constructed a series of SB vectors carrying either a DsRed or a human β-globin transgene that was encompassed by cHS4 insulator elements, and containing the SB100X transposase gene outside the SB-Tn unit within the same vector in cis configuration. These SB-Tn constructs were introduced into the K-562 erythroid cell line, and their presence in the genomes of host cells was analyzed by Southern blot analysis using non-radioactive probes. Many copies of SB-Tn insertions were detected in host cells regardless of transgene sequences or the presence of cHS4 insulator elements. Interestingly, the size difference of 2.4 kb between insulated SB and non-insulated controls did not reflect the proportional difference in copy numbers of inserted SB-Tns. We then attempted methylation-sensitive Southern blots to assess the potential influence of cHS4 insulator elements on the epigenetic modification of SB-Tn. Our results indicated that SB100X was able to integrate at multiple sites with the number of SB-Tn copies larger than 6 kb in size. In addition, the non-radioactive Southern blot protocols developed here will be useful to detect integrated SB-Tn copies in any mammalian cell type.

  4. Detecting loci under recent positive selection in dairy and beef cattle by combining different genome-wide scan methods.

    Directory of Open Access Journals (Sweden)

    Yuri Tani Utsunomiya

    Full Text Available As the methodologies available for the detection of positive selection from genomic data vary in terms of assumptions and execution, weak correlations are expected among them. However, if there is any given signal that is consistently supported across different methodologies, it is strong evidence that the locus has been under past selection. In this paper, a straightforward frequentist approach based on the Stouffer Method to combine P-values across different tests for evidence of recent positive selection in common variations, as well as strategies for extracting biological information from the detected signals, were described and applied to high density single nucleotide polymorphism (SNP data generated from dairy and beef cattle (taurine and indicine. The ancestral Bovinae allele state of over 440,000 SNP is also reported. Using this combination of methods, highly significant (P<3.17×10(-7 population-specific sweeps pointing out to candidate genes and pathways that may be involved in beef and dairy production were identified. The most significant signal was found in the Cornichon homolog 3 gene (CNIH3 in Brown Swiss (P = 3.82×10(-12, and may be involved in the regulation of pre-ovulatory luteinizing hormone surge. Other putative pathways under selection are the glucolysis/gluconeogenesis, transcription machinery and chemokine/cytokine activity in Angus; calpain-calpastatin system and ribosome biogenesis in Brown Swiss; and gangliosides deposition in milk fat globules in Gyr. The composite method, combined with the strategies applied to retrieve functional information, may be a useful tool for surveying genome-wide selective sweeps and providing insights in to the source of selection.

  5. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Full Text Available Armadillo (ARM repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  6. Development of a real-time PCR for detection of Staphylococcus pseudintermedius using a novel automated comparison of whole-genome sequences.

    Directory of Open Access Journals (Sweden)

    Koen M Verstappen

    Full Text Available Staphylococcus pseudintermedius is an opportunistic pathogen in dogs and cats and occasionally causes infections in humans. S. pseudintermedius is often resistant to multiple classes of antimicrobials. It requires a reliable detection so that it is not misidentified as S. aureus. Phenotypic and currently-used molecular-based diagnostic assays lack specificity or are labour-intensive using multiplex PCR or nucleic acid sequencing. The aim of this study was to identify a specific target for real-time PCR by comparing whole genome sequences of S. pseudintermedius and non-pseudintermedius.Genome sequences were downloaded from public repositories and supplemented by isolates that were sequenced in this study. A Perl-script was written that analysed 300-nt fragments from a reference genome sequence of S. pseudintermedius and checked if this sequence was present in other S. pseudintermedius genomes (n = 74 and non-pseudintermedius genomes (n = 138. Six sequences specific for S. pseudintermedius were identified (sequence length between 300-500 nt. One sequence, which was located in the spsJ gene, was used to develop primers and a probe. The real-time PCR showed 100% specificity when testing for S. pseudintermedius isolates (n = 54, and eight other staphylococcal species (n = 43. In conclusion, a novel approach by comparing whole genome sequences identified a sequence that is specific for S. pseudintermedius and provided a real-time PCR target for rapid and reliable detection of S. pseudintermedius.

  7. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the huge amount of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have been proven to be promising for harnessing the discovery power of accumulated huge amount of high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may also be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequences specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . dshuang@tongji.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  8. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Angela Stokes

    Full Text Available The requirement for large amounts of good quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM, and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE. Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched paired amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis for copy number, LOH, and both combined, showed that copy number changes were reduced in amplified DNA, but were 99.5% concordant when detected, amplifications were the changes most likely to be 'missed', only 30% of non-amplified LOH changes were identified in amplified pairs, and when copy number and LOH are combined ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA and within these changes, 86.5% were concordant for both copy number and LOH status. However, there are also changes introduced as ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used

  9. Genomics-enabled sensor platform for rapid detection of viruses related to disease outbreak.

    Energy Technology Data Exchange (ETDEWEB)

    Brozik, Susan M; Manginell, Ronald P; Moorman, Matthew W; Xiao, Xiaoyin; Edwards, Thayne L.; Anderson, John Moses; Pfeifer, Kent Bryant; Branch, Darren W.; Wheeler, David Roger; Polsky, Ronen; Lopez, DeAnna M.; Ebel, Gregory D.; Prasad, Abhishek N.; Brozik, James A.; Rudolph, Angela R.; Wong, Lillian P.

    2013-09-01

    Bioweapons and emerging infectious diseases pose growing threats to our national security. Both natural disease outbreak and outbreaks due to a bioterrorist attack are a challenge to detect, taking days after the outbreak to identify since most outbreaks are only recognized through reportable diseases by health departments and reports of unusual diseases by clinicians. In recent decades, arthropod-borne viruses (arboviruses) have emerged as some of the most significant threats to human health. They emerge, often unexpectedly, from cryptic transmission foci causing localized outbreaks that can rapidly spread to multiple continents due to increased human travel and trade. Currently, diagnosis of acute infections requires amplification of viral nucleic acids, which can be costly, highly specific, technically challenging and time consuming. No diagnostic devices suitable for use at the bedside or in an outbreak setting currently exist. The original goals of this project were to 1) develop two highly sensitive and specific diagnostic assays for detecting RNA from a wide range of arboviruses; one based on an electrochemical approach and the other a fluorescent based assay and 2) develop prototype microfluidic diagnostic platforms for preclinical and field testing that utilize the assays developed in goal 1. We generated and characterized suitable primers for West Nile Virus RNA detection. Both optical and electrochemical transduction technologies were developed for DNA-RNA hybridization detection and were implemented in microfluidic diagnostic sensing platforms that were developed in this project.

  10. Zepto-molar electrochemical detection of Brucella genome based on gold nanoribbons covered by gold nanoblooms

    Science.gov (United States)

    Rahi, Amid; Sattarahmady, Naghmeh; Heli, Hossein

    2015-12-01

    Gold nanoribbons covered by gold nanoblooms were sonoelectrodeposited on a polycrystalline gold surface at -1800 mV (vs. AgCl) with the assistance of ultrasound and co-occurrence of the hydrogen evolution reaction. The nanostructure, as a transducer, was utilized to immobilize a Brucella-specific probe and fabrication of a genosensor, and the process of immobilization and hybridization was detected by electrochemical methods, using methylene blue as a redox marker. The proposed method for detection of the complementary sequence, sequences with base-mismatched (one-, two- and three-base mismatches), and the sequence of non-complementary sequence was assayed. The fabricated genosensor was evaluated for the assay of the bacteria in the cultured and human samples without polymerase chain reactions (PCR). The genosensor could detect the complementary sequence with a calibration sensitivity of 0.40 μA dm3 mol-1, a linear concentration range of 10 zmol dm-3 to 10 pmol dm-3, and a detection limit of 1.71 zmol dm-3.

  11. An Analysis of Multi-type Relational Interactions in FMA Using Graph Motifs with Disjointness Constraints

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, whereby bringing in rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amendable for programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation. PMID:23304382

  12. An analysis of multi-type relational interactions in FMA using graph motifs with disjointness constraints.

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, whereby bringing in rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amendable for programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation.

  13. Technical improvement and development of automatic detection method for genomic mutation

    International Nuclear Information System (INIS)

    Yamada, Kiyomi; Takai, Setsuo; Togashi, Chikako; Itami, Jun

    1999-01-01

    Fluorescent in situ hybridization (FISH) method was improved to estimate the dose of radiation exposure. Cleavage of DNA molecules in lymphocyte was used as the detection parameter and the procedures for preparation of samples suitable for atomic force microscopy and laser microscopy were developed. When ordinal primers on the market were used for PCR, the products were generally too small (about 200 b.p.) for detection by FISH method. Therefore, dATP and biotin-labeled dUTP were linked to its 3'-end by treatment with TdT for 2 hours, resulting that the mean length of PCR products was ca. 1.5 Kb. After hybridization using this prove, signal amplification was carried out according to biotin-avidin detection method. Thus, fluorescent signals on chromosomes could be easily detected. When three primers, D6S105, D6S291 and D6S282 were used as primer, fluorescent signal was detectable at 3 sites on chromatin fiber. These results indicate that this method is available for the analysis of cleavage of chromosomes. However, the backgrounds were much varied depending on the way to wash the preparation after incubation with fluorescent particle to form biotin-avidin binding. Therefore, further improvement of this method was necessary to apply in practice. When chromosomes 13, 14 and 15 from lymphocytes exposed to X-ray were used as test samples, it was demonstrated that radio-sensitivity was variable depending on the contents of R band in each chromosome. (M.N.)

  14. Detection of gene–environment interaction in pedigree data using genome-wide genotypes

    Science.gov (United States)

    Nivard, Michel G; Middeldorp, Christel M; Lubke, Gitta; Hottenga, Jouke-Jan; Abdellaoui, Abdel; Boomsma, Dorret I; Dolan, Conor V

    2016-01-01

    Heritability may be estimated using phenotypic data collected in relatives or in distantly related individuals using genome-wide single nucleotide polymorphism (SNP) data. We combined these approaches by re-parameterizing the model proposed by Zaitlen et al and extended this model to include moderation of (total and SNP-based) genetic and environmental variance components by a measured moderator. By means of data simulation, we demonstrated that the type 1 error rates of the proposed test are correct and parameter estimates are accurate. As an application, we considered the moderation by age or year of birth of variance components associated with body mass index (BMI), height, attention problems (AP), and symptoms of anxiety and depression. The genetic variance of BMI was found to increase with age, but the environmental variance displayed a greater increase with age, resulting in a proportional decrease of the heritability of BMI. Environmental variance of height increased with year of birth. The environmental variance of AP increased with age. These results illustrate the assessment of moderation of environmental and genetic effects, when estimating heritability from combined SNP and family data. The assessment of moderation of genetic and environmental variance will enhance our understanding of the genetic architecture of complex traits. PMID:27436263

  15. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA.

    Science.gov (United States)

    Gibbons, Brian; Datta, Parikkhit; Wu, Ying; Chan, Alan; Al Armour, John

    2006-06-30

    Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH) we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  16. Rapid detection of microbial DNA by a novel isothermal genome exponential amplification reaction (GEAR) assay.

    Science.gov (United States)

    Prithiviraj, Jothikumar; Hill, Vincent; Jothikumar, Narayanan

    2012-04-20

    In this study we report the development of a simple target-specific isothermal nucleic acid amplification technique, termed genome exponential amplification reaction (GEAR). Escherichia coli was selected as the microbial target to demonstrate the GEAR technique as a proof of concept. The GEAR technique uses a set of four primers; in the present study these primers targeted 5 regions on the 16S rRNA gene of E. coli. The outer forward and reverse Tab primer sequences are complementary to each other at their 5' end, whereas their 3' end sequences are complementary to their respective target nucleic acid sequences. The GEAR assay was performed at a constant temperature 60 °C and monitored continuously in a real-time PCR instrument in the presence of an intercalating dye (SYTO 9). The GEAR assay enabled amplification of as few as one colony forming units of E. coli per reaction within 30 min. We also evaluated the GEAR assay for rapid identification of bacterial colonies cultured on agar media directly in the reaction without DNA extraction. Cells from E. coli colonies were picked and added directly to GEAR assay mastermix without prior DNA extraction. DNA in the cells could be amplified, yielding positive results within 15 min. Published by Elsevier Inc.

  17. Identification of high-efficiency 3′GG gRNA motifs in indexed FASTA files with ngg2

    Directory of Open Access Journals (Sweden)

    Elisha D. Roberson

    2015-11-01

    Full Text Available CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3′GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa and African elephant (Loxodonta africana to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3′GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.

  18. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    Science.gov (United States)

    Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens. PMID:24574290

  19. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    Science.gov (United States)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S; Nielsen, Eva M; Aarestrup, Frank M

    2014-05-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens.

  20. qPMS7: a fast algorithm for finding (ℓ, d-motifs in DNA and protein sequences.

    Directory of Open Access Journals (Sweden)

    Hieu Dinh

    Full Text Available Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d-motif search (or Planted Motif Search (PMS. A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS, is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.

  1. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Full Text Available Abstract Background Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. Results In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A. Conclusion Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  2. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    Science.gov (United States)

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  3. Faster exact Markovian probability functions for motif occurrences: a DFA-only approach.

    Science.gov (United States)

    Ribeca, Paolo; Raineri, Emanuele

    2008-12-15

    The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.

  4. Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks.

    Science.gov (United States)

    Zhao, Huiying; Nyholt, Dale R; Yang, Yuanhao; Wang, Jihua; Yang, Yuedong

    2017-06-14

    Genome-wide association studies (GWAS) have successfully identified single variants associated with diseases. To increase the power of GWAS, gene-based and pathway-based tests are commonly employed to detect more risk factors. However, the gene- and pathway-based association tests may be biased towards genes or pathways containing a large number of single-nucleotide polymorphisms (SNPs) with small P-values caused by high linkage disequilibrium (LD) correlations. To address such bias, numerous pathway-based methods have been developed. Here we propose a novel method, DGAT-path, to divide all SNPs assigned to genes in each pathway into LD blocks, and to sum the chi-square statistics of LD blocks for assessing the significance of the pathway by permutation tests. The method was proven robust with the type I error rate >1.6 times lower than other methods. Meanwhile, the method displays a higher power and is not biased by the pathway size. The applications to the GWAS summary statistics for schizophrenia and breast cancer indicate that the detected top pathways contain more genes close to associated SNPs than other methods. As a result, the method identified 17 and 12 significant pathways containing 20 and 21 novel associated genes, respectively for two diseases. The method is available online by http://sparks-lab.org/server/DGAT-path .

  5. FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm.

    Directory of Open Access Journals (Sweden)

    Shouheng Tuo

    Full Text Available Two-locus model is a typical significant disease model to be identified in genome-wide association study (GWAS. Due to intensive computational burden and diversity of disease models, existing methods have drawbacks on low detection power, high computation cost, and preference for some types of disease models.In this study, two scoring functions (Bayesian network based K2-score and Gini-score are used for characterizing two SNP locus as a candidate model, the two criteria are adopted simultaneously for improving identification power and tackling the preference problem to disease models. Harmony search algorithm (HSA is improved for quickly finding the most likely candidate models among all two-locus models, in which a local search algorithm with two-dimensional tabu table is presented to avoid repeatedly evaluating some disease models that have strong marginal effect. Finally G-test statistic is used to further test the candidate models.We investigate our method named FHSA-SED on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE which have been developed recently based on swarm intelligent search algorithm. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR, specificity (SPC, positive predictive value (PPV and accuracy (ACC. Our method has identified two SNPs (rs3775652 and rs10511467 that may be also associated with disease in AMD dataset.

  6. Airborne Detection of H5N8 Highly Pathogenic Avian Influenza Virus Genome in Poultry Farms, France.

    Science.gov (United States)

    Scoizec, Axelle; Niqueux, Eric; Thomas, Rodolphe; Daniel, Patrick; Schmitz, Audrey; Le Bouquin, Sophie

    2018-01-01

    In southwestern France, during the winter of 2016-2017, the rapid spread of highly pathogenic avian influenza H5N8 outbreaks despite the implementation of routine control measures, raised the question about the potential role of airborne transmission in viral spread. As a first step to investigate the plausibility of that transmission, air samples were collected inside, outside and downwind from infected duck and chicken facilities. H5 avian influenza virus RNA was detected in all samples collected inside poultry houses, at external exhaust fans and at 5 m distance from poultry houses. For three of the five flocks studied, in the sample collected at 50-110 m distance, viral genomic RNA was detected. The measured viral air concentrations ranged between 4.3 and 6.4 log 10 RNA copies per m 3 , and their geometric mean decreased from external exhaust fans to the downwind measurement point. These findings are in accordance with the possibility of airborne transmission and question the procedures for outbreak depopulation.

  7. Airborne Detection of H5N8 Highly Pathogenic Avian Influenza Virus Genome in Poultry Farms, France

    Directory of Open Access Journals (Sweden)

    Axelle Scoizec

    2018-02-01

    Full Text Available In southwestern France, during the winter of 2016–2017, the rapid spread of highly pathogenic avian influenza H5N8 outbreaks despite the implementation of routine control measures, raised the question about the potential role of airborne transmission in viral spread. As a first step to investigate the plausibility of that transmission, air samples were collected inside, outside and downwind from infected duck and chicken facilities. H5 avian influenza virus RNA was detected in all samples collected inside poultry houses, at external exhaust fans and at 5 m distance from poultry houses. For three of the five flocks studied, in the sample collected at 50–110 m distance, viral genomic RNA was detected. The measured viral air concentrations ranged between 4.3 and 6.4 log10 RNA copies per m3, and their geometric mean decreased from external exhaust fans to the downwind measurement point. These findings are in accordance with the possibility of airborne transmission and question the procedures for outbreak depopulation.

  8. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    DEFF Research Database (Denmark)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang

    2016-01-01

    , including 33 CNVs samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis...

  9. ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

    NARCIS (Netherlands)

    Muino, J.M.; Kaufmann, K.; Ham, van R.C.H.J.; Angenent, G.C.; Krajewski, P.

    2011-01-01

    Background In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally

  10. Genome Scan Detects Quantitative Trait Loci Affecting Female Fertility Traits in Danish and Swedish Holstein Cattle

    DEFF Research Database (Denmark)

    Höglund, Johanna Karolina; Guldbrandtsen, B; Su, G

    2009-01-01

    Data from the joint Nordic breeding value prediction for Danish and Swedish Holstein grandsire families were used to locate quantitative trait loci (QTL) for female fertility traits in Danish and Swedish Holstein cattle. Up to 36 Holstein grandsires with over 2,000 sons were genotyped for 416 mic...... for QTL segregating on Bos taurus chromosome (BTA)1, BTA7, BTA10, and BTA26. On each of these chromosomes, several QTL were detected affecting more than one of the fertility traits investigated in this study. Evidence for segregation of additional QTL on BTA2, BTA9, and BTA24 was found...

  11. A FRAMEWORK FOR ATTRIBUTE-BASED COMMUNITY DETECTION WITH APPLICATIONS TO INTEGRATED FUNCTIONAL GENOMICS.

    Science.gov (United States)

    Yu, Han; Hageman Blair, Rachael

    2016-01-01

    Understanding community structure in networks has received considerable attention in recent years. Detecting and leveraging community structure holds promise for understanding and potentially intervening with the spread of influence. Network features of this type have important implications in a number of research areas, including, marketing, social networks, and biology. However, an overwhelming majority of traditional approaches to community detection cannot readily incorporate information of node attributes. Integrating structural and attribute information is a major challenge. We propose a exible iterative method; inverse regularized Markov Clustering (irMCL), to network clustering via the manipulation of the transition probability matrix (aka stochastic flow) corresponding to a graph. Similar to traditional Markov Clustering, irMCL iterates between "expand" and "inflate" operations, which aim to strengthen the intra-cluster flow, while weakening the inter-cluster flow. Attribute information is directly incorporated into the iterative method through a sigmoid (logistic function) that naturally dampens attribute influence that is contradictory to the stochastic flow through the network. We demonstrate advantages and the exibility of our approach using simulations and real data. We highlight an application that integrates breast cancer gene expression data set and a functional network defined via KEGG pathways reveal significant modules for survival.

  12. Evaluation of Global Genomic DNA Methylation in Human Whole Blood by Capillary Electrophoresis UV Detection

    Directory of Open Access Journals (Sweden)

    Angelo Zinellu

    2017-01-01

    Full Text Available Alterations in global DNA methylation are implicated in various pathophysiological processes. The development of simple and quick, yet robust, methods to assess DNA methylation is required to facilitate its measurement and interpretation in clinical practice. We describe a highly sensitive and reproducible capillary electrophoresis method with UV detection for the separation and detection of cytosine and methylcytosine, after formic acid hydrolysis of DNA extracted from human whole blood. Hydrolysed samples were dried and resuspended with water and directly injected into the capillary without sample derivatization procedures. The use of a run buffer containing 50 mmol/L BIS-TRIS propane (BTP phosphate buffer at pH 3.25 and 60 mmol/L sodium acetate buffer at pH 3.60 (4 : 1, v/v allowed full analyte identification within 11 min. Precision tests indicated an elevated reproducibility with an interassay CV of 1.98% when starting from 2 μg of the extracted DNA. The method was successfully tested by measuring the DNA methylation degree both in healthy volunteers and in reference calf thymus DNA.

  13. Application of DETECTER, an evolutionary genomic tool to analyze genetic variation, to the cystic fibrosis gene family

    Directory of Open Access Journals (Sweden)

    De Kee Danny W

    2006-03-01

    Full Text Available Abstract Background The medical community requires computational tools that distinguish missense genetic differences having phenotypic impact within the vast number of sense mutations that do not. Tools that do this will become increasingly important for those seeking to use human genome sequence data to predict disease, make prognoses, and customize therapy to individual patients. Results An approach, termed DETECTER, is proposed to identify sites in a protein sequence where amino acid replacements are likely to have a significant effect on phenotype, including causing genetic disease. This approach uses a model-dependent tool to estimate the normalized replacement rate at individual sites in a protein sequence, based on a history of those sites extracted from an evolutionary analysis of the corresponding protein family. This tool identifies sites that have higher-than-average, average, or lower-than-average rates of change in the lineage leading to the sequence in the population of interest. The rates are then combined with sequence data to determine the likelihoods that particular amino acids were present at individual sites in the evolutionary history of the gene family. These likelihoods are used to predict whether any specific amino acid replacements, if introduced at the site in a modern human population, would have a significant impact on fitness. The DETECTER tool is used to analyze the cystic fibrosis transmembrane conductance regulator (CFTR gene family. Conclusion In this system, DETECTER retrodicts amino acid replacements associated with the cystic fibrosis disease with greater accuracy than alternative approaches. While this result validates this approach for this particular family of proteins only, the approach may be applicable to the analysis of polymorphisms generally, including SNPs in a human population.

  14. Rhabdovirus-like endogenous viral elements in the genome of Spodoptera frugiperda insect cells are actively transcribed: Implications for adventitious virus detection.

    Science.gov (United States)

    Geisler, Christoph; Jarvis, Donald L

    2016-07-01

    Spodoptera frugiperda (Sf) cell lines are used to produce several biologicals for human and veterinary use. Recently, it was discovered that all tested Sf cell lines are persistently infected with Sf-rhabdovirus, a novel rhabdovirus. As part of an effort to search for other adventitious viruses, we searched the Sf cell genome and transcriptome for sequences related to Sf-rhabdovirus. To our surprise, we found intact Sf-rhabdovirus N- and P-like ORFs, and partial Sf-rhabdovirus G- and L-like ORFs. The transcribed and genomic sequences matched, indicating the transcripts were derived from the genomic sequences. These appear to be endogenous viral elements (EVEs), which result from the integration of partial viral genetic material into the host cell genome. It is theoretically impossible for the Sf-rhabdovirus-like EVEs to produce infectious virus particles as 1) they are disseminated across 4 genomic loci, 2) the G and L ORFs are incomplete, and 3) the M ORF is missing. Our finding of transcribed virus-like sequences in Sf cells underscores that MPS-based searches for adventitious viruses in cell substrates used to manufacture biologics should take into account both genomic and transcribed sequences to facilitate the identification of transcribed EVE's, and to avoid false positive detection of replication-competent adventitious viruses. Copyright © 2016 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.

  15. Accurate quantification of microRNA via single strand displacement reaction on DNA origami motif.

    Directory of Open Access Journals (Sweden)

    Jie Zhu

    Full Text Available DNA origami is an emerging technology that assembles hundreds of staple strands and one single-strand DNA into certain nanopattern. It has been widely used in various fields including detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs play important roles in post-transcriptional gene repression as well as many other biological processes such as cell growth and differentiation. Alterations of miRNAs' expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on streptavidin and quantum dots binding complex (STV-QDs labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA as miRNA-133 and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both symmetrical rectangular motif and asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA Origami motif with arbitrary shape can be utilized in this method. Since this DNA origami-based method we developed owns the unique advantages of simple, time-and-material-saving, potentially multi-targets testing in one motif and relatively accurate for certain impurity samples as counted directly by atomic force microscopy rather than fluorescence signal detection, it may be widely used in quantification of miRNAs.

  16. Accurate Quantification of microRNA via Single Strand Displacement Reaction on DNA Origami Motif

    Science.gov (United States)

    Lou, Jingyu; Li, Weidong; Li, Sheng; Zhu, Hongxin; Yang, Lun; Zhang, Aiping; He, Lin; Li, Can

    2013-01-01

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-strand DNA into certain nanopattern. It has been widely used in various fields including detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as many other biological processes such as cell growth and differentiation. Alterations of miRNAs' expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on streptavidin and quantum dots binding complex (STV-QDs) labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA as miRNA-133 and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both symmetrical rectangular motif and asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA Origami motif with arbitrary shape can be utilized in this method. Since this DNA origami-based method we developed owns the unique advantages of simple, time-and-material-saving, potentially multi-targets testing in one motif and relatively accurate for certain impurity samples as counted directly by atomic force microscopy rather than fluorescence signal detection, it may be widely used in quantification of miRNAs. PMID:23990889

  17. Accurate quantification of microRNA via single strand displacement reaction on DNA origami motif.

    Science.gov (United States)

    Zhu, Jie; Feng, Xiaolu; Lou, Jingyu; Li, Weidong; Li, Sheng; Zhu, Hongxin; Yang, Lun; Zhang, Aiping; He, Lin; Li, Can

    2013-01-01

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-strand DNA into certain nanopattern. It has been widely used in various fields including detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as many other biological processes such as cell growth and differentiation. Alterations of miRNAs' expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on streptavidin and quantum dots binding complex (STV-QDs) labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA as miRNA-133 and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both symmetrical rectangular motif and asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA Origami motif with arbitrary shape can be utilized in this method. Since this DNA origami-based method we developed owns the unique advantages of simple, time-and-material-saving, potentially multi-targets testing in one motif and relatively accurate for certain impurity samples as counted directly by atomic force microscopy rather than fluorescence signal detection, it may be widely used in quantification of miRNAs.

  18. Aeromonas salmonicida - Epidemiology, whole genome sequencing, detection and in vivo imaging

    DEFF Research Database (Denmark)

    Bartkova, Simona

    causes problems in sea reared rainbow trout (Oncorhynchus mykiss) production. Outbreaks occur repeatedly during stressful conditions such as elevated temperatures, in spite of commercial vaccines being applied. Besides seemingly lacking adequate protection, the vaccines also produce undesirable side...... of the concerns regarding A. salmonicida. First, we focused on investigation of the route of entry and initial dissemination of A. salmonicida in fish. This was done by tracing the bacterium using in vivo bioluminescence imaging. A Danish strain was transformed with a plasmid vector containing a green...... was subsequently turned to finding a sensitive method for detecting A. salmonicida in infected and possible carrier fish. For this, a previously developed quantitative real-time polymerase chain reaction (real-time PCR) targeting the aopP gene located on A. salmonicida plasmid pAsal1 was assessed. The real...

  19. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery.

    Directory of Open Access Journals (Sweden)

    Arnoldo J Müller-Molina

    Full Text Available To know the map between transcription factors (TFs and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits never used TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs confirming that our system can reliably predict ab initio motifs with an accuracy of 81%-far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that to control in an independent manner smaller gene sets, supplementary regulatory mechanisms are required. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus, are likely to lock/unlock cellular functional clusters.

  20. Optimizing a method for detection of hepatitis A virus in shellfish and study the effect of gamma radiation on the viral genome

    International Nuclear Information System (INIS)

    Amri, Islem

    2008-01-01

    Our work was aimed at detecting the hepatitis A virus (HAV) in bivalve mollusc collected from five shellfish harvesting areas and from a coastal region in Tunisia using RT-Nested-PCR and studying the effect of gamma radiation on HAV genome. Two methods used to recover HAV from mollusc flesh and two methods of extraction of virus RNA were compared in order to determine the most sensitive method. Glycine extraction and extraction of virus RNA using proteinase K were more convenient and then used in this study for detection of HAV in shellfish. The results of molecular analyses: RT-Nested-PCR using primers targeted at the P1 region revealed that 28 % of the samples were positive for HAV. Doses of gamma irradiation ranging between 5 to 30 kGy were used to study the effect of this radiation on HAV genome after the contamination of mollusc flesh with suspension of HAV (derived from stool specimens). HAV specific genomic band was observed for doses between 5 to 20 kGy. We didn't detect HAV genome with doses 25 and 30 kGy. (Author)

  1. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    Science.gov (United States)

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter

  2. The discovery of putative urine markers for the specific detection of prostate tumor by integrative mining of public genomic profiles.

    Directory of Open Access Journals (Sweden)

    Min Chen

    Full Text Available Urine has emerged as an attractive biofluid for the noninvasive detection of prostate cancer (PCa. There is a strong imperative to discover candidate urinary markers for the clinical diagnosis and prognosis of PCa. The rising flood of various omics profiles presents immense opportunities for the identification of prospective biomarkers. Here we present a simple and efficient strategy to derive candidate urine markers for prostate tumor by mining cancer genomic profiles from public databases. Prostate, bladder and kidney are three major tissues from which cellular matters could be released into urine. To identify urinary markers specific for PCa, upregulated entities that might be shed in exosomes of bladder cancer and kidney cancer are first excluded. Through the ontology-based filtering and further assessment, a reduced list of 19 entities encoding urinary proteins was derived as putative PCa markers. Among them, we have found 10 entities closely associated with the process of tumor cell growth and development by pathway enrichment analysis. Further, using the 10 entities as seeds, we have constructed a protein-protein interaction (PPI subnetwork and suggested a few urine markers as preferred prognostic markers to monitor the invasion and progression of PCa. Our approach is amenable to discover and prioritize potential markers present in a variety of body fluids for a spectrum of human diseases.

  3. Detection of genomic instability in α-irradiated and bystander human fibroblasts

    International Nuclear Information System (INIS)

    Ponnaiya, B.; Jenkins-Baker, G.; Bigelow, A.; Marino, S.; Geard, C.R.

    2003-01-01

    well. Currently experiments are ongoing to examine irradiated and bystander cells at immediate and delayed times for the presence of chromosomal aberrations that are not detectable by standard Giemsa staining (e.g. translocations) using mFISH analyses

  4. Mutational analysis of the RecJ exonuclease of Escherichia coli: identification of phosphoesterase motifs.

    Science.gov (United States)

    Sutera, V A; Han, E S; Rajman, L A; Lovett, S T

    1999-10-01

    The recJ gene, identified in Escherichia coli, encodes a Mg(+2)-dependent 5'-to-3' exonuclease with high specificity for single-strand DNA. Genetic and biochemical experiments implicate RecJ exonuclease in homologous recombination, base excision, and methyl-directed mismatch repair. Genes encoding proteins with strong similarities to RecJ have been found in every eubacterial genome sequenced to date, with the exception of Mycoplasma and Mycobacterium tuberculosis. Multiple genes encoding proteins similar to RecJ are found in some eubacteria, including Bacillus and Helicobacter, and in the archaea. Among this divergent set of sequences, seven conserved motifs emerge. We demonstrate here that amino acids within six of these motifs are essential for both the biochemical and genetic functions of E. coli RecJ. These motifs may define interactions with Mg(2+) ions or substrate DNA. A large family of proteins more distantly related to RecJ is present in archaea, eubacteria, and eukaryotes, including a hypothetical protein in the MgPa adhesin operon of Mycoplasma, a domain of putative polyA polymerases in Synechocystis and Aquifex, PRUNE of Drosophila, and an exopolyphosphatase (PPX1) of Saccharomyces cereviseae. Because these six RecJ motifs are shared between exonucleases and exopolyphosphatases, they may constitute an ancient phosphoesterase domain now found in all kingdoms of life.

  5. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Dynamic motifs in socio-economic networks

    Science.gov (United States)

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bi-partite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently only has business relationship with one of these two companies, will become customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  7. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase.

    Directory of Open Access Journals (Sweden)

    Hao Ding

    Full Text Available Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI. The Q motif, consisting of nine amino acids (GFXXPXPIQ with an invariant glutamine (Q residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11 gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase.

  8. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo; Jankovic, Boris R.; Bajic, Vladimir B.; Song, Le; Gao, Xin

    2013-01-01

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA.Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance.We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ?30% fewer error predictions relative to the other

  9. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA.Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance.We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ?30% fewer error predictions relative to the other

  10. A novel human AP endonuclease with conserved zinc-finger-like motifs involved in DNA strand break responses

    OpenAIRE

    Kanno, Shin-ichiro; Kuzuoka, Hiroyuki; Sasao, Shigeru; Hong, Zehui; Lan, Li; Nakajima, Satoshi; Yasui, Akira

    2007-01-01

    DNA damage causes genome instability and cell death, but many of the cellular responses to DNA damage still remain elusive. We here report a human protein, PALF (PNK and APTX-like FHA protein), with an FHA (forkhead-associated) domain and novel zinc-finger-like CYR (cysteine–tyrosine–arginine) motifs that are involved in responses to DNA damage. We found that the CYR motif is widely distributed among DNA repair proteins of higher eukaryotes, and that PALF, as well as a Drosophila protein with...

  11. Next-generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus Austrolebias (Cyprinodontiformes, Rivulidae).

    Science.gov (United States)

    García, G; Ríos, N; Gutiérrez, V

    2015-06-01

    Among Neotropical fish fauna, the South American killifish genus Austrolebias (Cyprinodontiformes: Rivulidae) constitutes an excellent model to study the genomic evolutionary processes underlying speciation events. Recently, unusually large genome size has been described in 16 species of this genus, with an average DNA content of about 5.95 ± 0.45 pg per diploid cell (mean C-value of about 2.98 pg). In the present paper we explore the possible origin of this unparallel genomic increase by means of comparative analysis of the repetitive components using NGS (454-Roche) technology in the lowest and highest Rivulidae genomes. Here, we provide the first annotated Rivulidae-repeated sequences composition and their relative repetitive fraction in both genomes. Remarkably, the genomic proportion of the moderately repetitive DNA in Austrolebias charrua genome represents approximately twice (45%) of the repetitive components of the highly related rivulinae taxon Cynopoecilus melanotaenia (25%). Present work provides evidence about the impact of the repeat families that could be distinctly proliferated among sublineages within Rivulidae fish group, explaining the great genome size differences encompassing the differentiation and speciation events in this family.

  12. CONTEMPORARY USAGE OF TRADITIONAL TURKISH MOTIFS IN PRODUCT DESIGNS

    Directory of Open Access Journals (Sweden)

    Tulay Gumuser

    2012-12-01

    Full Text Available The aim of this study is to identify the traditional Turkish motifs and its relations among present industrial designs. Traditional Turkish motifs played a very important role in 16th century onwards. The arts of the Ottoman Empire were used because of their symbolic meanings and unique styles. When we examine these motifs we encounter; Tiger Stripe, Three Spot (Çintemani, Rumi, Hatayi, Penç, Cloud, Crescent, Star, Crown, Hyacinth, Tulip and Carnation motifs. Nowadays, Turkish designers have begun to use these traditional Turkish motifs in their designs so as to create differences and awareness in the world design. The examples of these industrial designs, using the Turkish motifs, have survived and have Ottoman heritage and historical value. In this study, the Turkish motifs will be examined along with their focus on contemporary Turkish industrial designs used today.

  13. Aplikasi Ornamen Khas Maluku untuk Pengembangan Desain Motif Batik

    Directory of Open Access Journals (Sweden)

    Masiswo Masiswo

    2016-04-01

    Full Text Available ABSTRAKMaluku memiliki banyak ragam hias budaya warisan nilai leluhur berupa ornamen etnis yang merupakan kesenian dan keterampilan kerajinan. Hasil warisan tersebut sampai saat ini masih lestari hidup serta dapat dinikmati sebagai konsumsi rohani yang memuaskan manusia. Berkaitan dengan keberlangsungan nilai-nilai tradisi etnis yang berwujud pada ornamen-ornamen daerah Maluku, maka dikembangkan untuk kebutuhan manusia berupa motif batik pada kain. Pengembangan ornamen ini lebih menekankan pada representasi akan bentuk-bentuk ornamen yang diterapkan pada kerajinan batik berupa motif khas Maluku. Pengembangan alternatif desain motif batik dibuat tiga variasi yang bersumber dari ornamen khas Maluku dibuat prototipe produknya dan diuji ketahanan luntur warnanya. Hasil uji ketahanan luntur warna terhadap gosokan basah dari tiga prototipe produk berpredikat baik sekali terdapat pada “Motif Siwa” dan predikat baik pada motif “Siwa Talang” dan motif “Matahari Siwa Talang”.Kata kunci: desain, Maluku, motif batik, ornamenABSTRACTMaluku has much decorative ancestral cultural heritage value in the form of ornament ethnic arts and crafts skills. The result of the legacy is still sustainable living can be enjoyed as well as satisfying spiritual human consumption.Related to the sustainability of traditional values in the form of ethnic ornaments Maluku, it was developed for human needs in the form of batik cloth . The development of these ornaments will be more emphasis on the representation forms of ornamentation that is applied to a batik motif Maluku. Development of alternative design motif made three variations. The development of three alternative design motifs derived from the Maluku ornaments made and tested a prototype product color fastness. The test results of color fastness to wet rubbing of the three prototypes are excellent products predicated on the "Motif Siwa" and a good rating on the motif "Siwa Talang" and motif "Matahari Siwa

  14. Selection for Unequal Densities of Sigma70 Promoter-like Signalsin Different Regions of Large Bacterial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that are recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently-transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to investigate the generality of this pattern, we have used position weight matrices describing the -35 and -10 promoter boxes of E. coli to search for these motifs in 43 additional genomes belonging to most established bacterial phyla, after specific calibration of the matrices according to the base composition of the noncoding regions of each genome. We have found that all bacterial species analyzed contain similar promoter-like motifs, and that, in most cases, these motifs follow the same genomic distribution observed in E. coli. Differential densities between regulatory and nonregulatory regions are detectable in most bacterial genomes, with the exception of those that have experienced evolutionary extreme genome reduction. Thus, the phylogenetic distribution of this pattern mirrors that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is the outcome of a process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential

  15. Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection.

    Science.gov (United States)

    Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz

    2016-02-24

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis are not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlights the power of pair-end designs in such studies.

  16. Mitochondrial and Y chromosome haplotype motifs as diagnostic markers of Jewish ancestry: a reconsideration.

    Directory of Open Access Journals (Sweden)

    Sergio eTofanelli

    2014-11-01

    Full Text Available Several authors have proposed haplotype motifs based on site variants at the mitochondrial genome (mtDNA and the non-recombining portion of the Y chromosome (NRY to trace the genealogies of Jewish people. Here, we analyzed their main approaches and test the feasibility of adopting motifs as ancestry markers through construction of a large database of mtDNA and NRY haplotypes from public genetic genealogical repositories. We verified the reliability of Jewish ancestry prediction based on the Cohen and Levite Modal Haplotypes in their classical 6 STR marker format or in the extended 12 STR format, as well as four founder mtDNA lineages (HVS-I segments accounting for about 40% of the current population of Ashkenazi Jews. For this purpose we compared haplotype composition in individuals of self-reported Jewish ancestry with the rest of European, African or Middle Eastern samples, to test for non-random association of ethno-geographic groups and haplotypes. Overall, NRY and mtDNA based motifs, previously reported to differentiate between groups, were found to be more represented in Jewish compared to non-Jewish groups. However, this seems to stem from common ancestors of Jewish lineages being rather recent respect to ancestors of non-Jewish lineages with the same haplotype signatures. Moreover, the polyphyly of haplotypes which contain the proposed motifs and the misuse of constant mutation rates heavily affected previous attempts to correctly dating the origin of common ancestries. Accordingly, our results stress the limitations of using the above haplotype motifs as reliable Jewish ancestry predictors and show its inadequacy for forensic or genealogical purposes.

  17. An efficient identification strategy of clonal tea cultivars using long-core motif SSR markers.

    Science.gov (United States)

    Wang, Rang Jian; Gao, Xiang Feng; Kong, Xiang Rui; Yang, Jun

    2016-01-01

    Microsatellites, or simple sequence repeats (SSRs), especially those with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide) represent an excellent tool for DNA fingerprinting. SSRs with long-core motifs are preferred since neighbor alleles are more easily separated and identified from each other, which render the interpretation of electropherograms and the true alleles more reliable. In the present work, with the purpose of characterizing a set of core SSR markers with long-core motifs for well fingerprinting clonal cultivars of tea (Camellia sinensis), we analyzed 66 elite clonal tea cultivars in China with 33 initially-chosen long-core motif SSR markers covering all the 15 linkage groups of tea plant genome. A set of 6 SSR markers were conclusively selected as core SSR markers after further selection. The polymorphic information content (PIC) of the core SSR markers was >0.5, with ≤5 alleles in each marker containing 10 or fewer genotypes. Phylogenetic analysis revealed that the core SSR markers were not strongly correlated with the trait 'cultivar processing-property'. The combined probability of identity (PID) between two random cultivars for the whole set of 6 SSR markers was estimated to be 2.22 × 10(-5), which was quite low, confirmed the usefulness of the proposed SSR markers for fingerprinting analyses in Camellia sinensis. Moreover, for the sake of quickly discriminating the clonal tea cultivars, a cultivar identification diagram (CID) was subsequently established using these core markers, which fully reflected the identification process and provided the immediate information about which SSR markers were needed to identify a cultivar chosen among the tested ones. The results suggested that long-core motif SSR markers used in the investigation contributed to the accurate and efficient identification of the clonal tea cultivars and enabled the protection of intellectual property.

  18. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  19. UKIRAN KERAWANG ACEH GAYO SEBAGAI INSPIRASI PENCIPTAAN MOTIF BATIK KHAS GAYO

    Directory of Open Access Journals (Sweden)

    Irfa ina Rohana Salma

    2016-12-01

    Full Text Available ABSTRAK Industri batik mulai berkembang di Gayo, tetapi belum memiliki motif batik khas daerah. Oleh karena itu perlu diciptakan motif batik khas Gayo, dengan mengambil inspirasi dari ukiran yang terdapat pada rumah tradisional yang biasa disebut ukiran kerawang Gayo. Tujuan penciptaan seni ini adalah untuk menciptakan motif batik yang memiliki ciri khas Gayo. Metode yang digunakan yaitu eksplorasi ide, perancangan, dan perwujudan menjadi motif batik. Dalam kegiatan ini telah diciptakan enam motif batik khas Gayo yaitu: (1 Motif Ceplok Gayo; (2 Motif Gayo Tegak; (3 Motif Gayo Lurus; (4 Motif Parang Gayo; (5 Motif Gayo Lembut; dan (6 Motif Geometris Gayo. Hasil uji kesukaan terhadap motif kepada lima puluh responden menunjukkan bahwa Motif Ceplok Gayo paling banyak dipilih oleh responden yaitu sebesar 19%, sedangkan Motif Parang Gayo 18%, Motif Gayo Lembut 17%, Motif Geometris Gayo 17%, Motif Gayo Lurus 15% dan Motif Gayo Tegak 14%. Rata-rata motif yang dihasilkan mendapatkan apresiasi yang baik dari responden, sehingga semua motif layak diproduksi sebagai batik khas Gayo.Kata kunci: batik Gayo, Motif Ceplok Gayo, Motif Parang Gayo.ABSTRACTBatik industry began to develop in Gayo, but have not had a typical batik motif itself. Therefore, it is necessary to create batik motifs of Gayo, by taking inspiration from the carvings found in traditional houses commonly called kerawang Gayo. The purpose of this art is to create motifs those have a Gayo characteristic. The method used are the idea exploration, design, and motifs embodiment. In this activity has created six Gayo batik motifs, namely: (1 Motif Ceplok Gayo; (2 Motif Gayo Tegak; (3 Motif GayoLurus; (4 Motif Parang Gayo; (5 Motif Gayo Lembut; dan (6 Motif Geometris Gayo. The test results fondness of the motives to fifty respondents indicated that the Motif Ceplok Gayo most preferred by respondents ie 19%, while Motif Parang Gayo 18%, Motif Gayo Lembut 17%, Motif Geometris Gayo 17%, Motif Gayo

  20. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    Science.gov (United States)

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.

  1. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    Science.gov (United States)

    2016-04-26

    Systems, Statistics & Management Science, University of Alabama, USA. 1 DISTRIBUTION A: Distribution approved for public release. Contents 1 Summary 5...13 5 Application to Real Networks 18 5.1 2012 FBS Football Schedule Network... football schedule network. . . . . . . . . . . . . . . . . . . . . . 21 14 Stem plot of degree-ordered vertices versus the degree for college football

  2. Development and validation of cross-transferable and polymorphic DNA markers for detecting alien genome introgression in Oryza sativa from Oryza brachyantha.

    Science.gov (United States)

    Ray, Soham; Bose, Lotan K; Ray, Joshitha; Ngangkham, Umakanta; Katara, Jawahar L; Samantaray, Sanghamitra; Behera, Lambodar; Anumalla, Mahender; Singh, Onkar N; Chen, Meingsheng; Wing, Rod A; Mohapatra, Trilochan

    2016-08-01

    African wild rice Oryza brachyantha (FF), a distant relative of cultivated rice Oryza sativa (AA), carries genes for pests and disease resistance. Molecular marker assisted alien gene introgression from this wild species to its domesticated counterpart is largely impeded due to the scarce availability of cross-transferable and polymorphic molecular markers that can clearly distinguish these two species. Availability of the whole genome sequence (WGS) of both the species provides a unique opportunity to develop markers, which are cross-transferable. We observed poor cross-transferability (~0.75 %) of O. sativa specific sequence tagged microsatellite (STMS) markers to O. brachyantha. By utilizing the genome sequence information, we developed a set of 45 low cost PCR based co-dominant polymorphic markers (STS and CAPS). These markers were found cross-transferrable (84.78 %) between the two species and could distinguish them from each other and thus allowed tracing alien genome introgression. Finally, we validated a Monosomic Alien Addition Line (MAAL) carrying chromosome 1 of O. brachyantha in O. sativa background using these markers, as a proof of concept. Hence, in this study, we have identified a set molecular marker (comprising of STMS, STS and CAPS) that are capable of detecting alien genome introgression from O. brachyantha to O. sativa.

  3. Structural motifs of pre-nucleation clusters.

    Science.gov (United States)

    Zhang, Y; Türkmen, I R; Wassermann, B; Erko, A; Rühl, E

    2013-10-07

    Structural motifs of pre-nucleation clusters prepared in single, optically levitated supersaturated aqueous aerosol microparticles containing CaBr2 as a model system are reported. Cluster formation is identified by means of X-ray absorption in the Br K-edge regime. The salt concentration beyond the saturation point is varied by controlling the humidity in the ambient atmosphere surrounding the 15-30 μm microdroplets. This leads to the formation of metastable supersaturated liquid particles. Distinct spectral shifts in near-edge spectra as a function of salt concentration are observed, in which the energy position of the Br K-edge is red-shifted by up to 7.1 ± 0.4 eV if the dilute solution is compared to the solid. The K-edge positions of supersaturated solutions are found between these limits. The changes in electronic structure are rationalized in terms of the formation of pre-nucleation clusters. This assumption is verified by spectral simulations using first-principle density functional theory and molecular dynamics calculations, in which structural motifs are considered, explaining the experimental results. These consist of solvated CaBr2 moieties, rather than building blocks forming calcium bromide hexahydrates, the crystal system that is formed by drying aqueous CaBr2 solutions.

  4. POWRS: position-sensitive motif discovery.

    Directory of Open Access Journals (Sweden)

    Ian W Davis

    Full Text Available Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm "POWRS" (POsition-sensitive WoRd Set for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties.BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.

  5. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Science.gov (United States)

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discussed possible promoters’ evolutionary scenarios and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could be evolved gradually by nucleotides’ gain and loss and point mutations. PMID:28117683

  6. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Directory of Open Access Journals (Sweden)

    Graziele Pereira Oliveira

    2017-01-01

    Full Text Available For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV, raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discussed possible promoters’ evolutionary scenarios and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’ that could be evolved gradually by nucleotides’ gain and loss and point mutations.

  7. GNG Motifs Can Replace a GGG Stretch during G-Quadruplex Formation in a Context Dependent Manner.

    Directory of Open Access Journals (Sweden)

    Kohal Das

    Full Text Available G-quadruplexes are one of the most commonly studied non-B DNA structures. Generally, these structures are formed using a minimum of 4, three guanine tracts, with connecting loops ranging from one to seven. Recent studies have reported deviation from this general convention. One such deviation is the involvement of bulges in the guanine tracts. In this study, guanines along with bulges, also referred to as GNG motifs have been extensively studied using recently reported HOX11 breakpoint fragile region I as a model template. By strategic mutagenesis approach we show that the contribution from continuous G-tracts may be dispensible during G-quadruplex formation when such motifs are flanked by GNGs. Importantly, the positioning and number of GNG/GNGNG can also influence the formation of G-quadruplexes. Further, we assessed three genomic regions from HIF1 alpha, VEGF and SHOX gene for G-quadruplex formation using GNG motifs. We show that HIF1 alpha sequence harbouring GNG motifs can fold into intramolecular G-quadruplex. In contrast, GNG motifs in mutant VEGF sequence could not participate in structure formation, suggesting that the usage of GNG is context dependent. Importantly, we show that when two continuous stretches of guanines are flanked by two independent GNG motifs in a naturally occurring sequence (SHOX, it can fold into an intramolecular G-quadruplex. Finally, we show the specific binding of G-quadruplex binding protein, Nucleolin and G-quadruplex antibody, BG4 to SHOX G-quadruplex. Overall, our study provides novel insights into the role of GNG motifs in G-quadruplex structure formation which may have both physiological and pathological implications.

  8. Parole, Sintagmatik, dan Paradigmatik Motif Batik Mega Mendung

    Directory of Open Access Journals (Sweden)

    Rudi - Nababan

    2012-04-01

    Full Text Available ABSTRACT   Discussing traditional batik is related a lot to the organization system of fine arts element ac- companying it, either the pattern of the motif or the technique of the making. In this case, the motif of Mega Mendung Cirebon certainly has patterns and rules which are traditionally different from the other motifs in other areas. Through  semiotics analysis especially with Saussure and Pierce concept, it can be traced that batik with Cirebon motif, in this case Mega Mendung motif, has parole and langue system, as unique fine arts language in batik, and structure of visual syntagmatic and paradigmatic. In the context of batik motif as fine arts language, it is surely related to sign system as symbol and icon.       Keywords: visual semiotic, Cirebon’s batik.

  9. Novel Structural and Functional Motifs in cellulose synthase (CesA Genes of Bread Wheat (Triticum aestivum, L..

    Directory of Open Access Journals (Sweden)

    Simerjeet Kaur

    Full Text Available Cellulose is the primary determinant of mechanical strength in plant tissues. Late-season lodging is inversely related to the amount of cellulose in a unit length of the stem. Wheat is the most widely grown of all the crops globally, yet information on its CesA gene family is limited. We have identified 22 CesA genes from bread wheat, which include homoeologs from each of the three genomes, and named them as TaCesAXA, TaCesAXB or TaCesAXD, where X denotes the gene number and the last suffix stands for the respective genome. Sequence analyses of the CESA proteins from wheat and their orthologs from barley, maize, rice, and several dicot species (Arabidopsis, beet, cotton, poplar, potato, rose gum and soybean revealed motifs unique to monocots (Poales or dicots. Novel structural motifs CQIC and SVICEXWFA were identified, which distinguished the CESAs involved in the formation of primary and secondary cell wall (PCW and SCW in all the species. We also identified several new motifs specific to monocots or dicots. The conserved motifs identified in this study possibly play functional roles specific to PCW or SCW formation. The new insights from this study advance our knowledge about the structure, function and evolution of the CesA family in plants in general and wheat in particular. This information will be useful in improving culm strength to reduce lodging or alter wall composition to improve biofuel production.

  10. Detection of selection signatures of population-specific genomic regions selected during domestication process in Jinhua pigs.

    Science.gov (United States)

    Li, Zhengcao; Chen, Jiucheng; Wang, Zhen; Pan, Yuchun; Wang, Qishan; Xu, Ningying; Wang, Zhengguang

    2016-12-01

    Chinese pigs have been undergoing both natural and artificial selection for thousands of years. Jinhua pigs are of great importance, as they can be a valuable model for exploring the genetic mechanisms linked to meat quality and other traits such as disease resistance, reproduction and production. The purpose of this study was to identify distinctive footprints of selection between Jinhua pigs and other breeds utilizing genome-wide SNP data. Genotyping by genome reducing and sequencing was implemented in order to perform cross-population extended haplotype homozygosity to reveal strong signatures of selection for those economically important traits. This work was performed at a 2% genome level, which comprised 152 006 SNPs genotyped in a total of 517 individuals. Population-specific footprints of selective sweeps were searched for in the genome of Jinhua pigs using six native breeds and three European breeds as reference groups. Several candidate genes associated with meat quality, health and reproduction, such as GH1, CRHR2, TRAF4 and CCK, were found to be overlapping with the significantly positive outliers. Additionally, the results revealed that some genomic regions associated with meat quality, immune response and reproduction in Jinhua pigs have evolved directionally under domestication and subsequent selections. The identified genes and biological pathways in Jinhua pigs showed different selection patterns in comparison with the Chinese and European breeds. © 2016 Stichting International Foundation for Animal Genetics.

  11. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    Science.gov (United States)

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  12. Genome sequence of a cluster A13 mycobacteriophage detected in Mycobacterium phlei over a half century ago.

    Science.gov (United States)

    Marton, Szilvia; Fehér, Enikő; Horváth, Balázs; Háber, Katalin; Somogyi, Pál; Minárovits, János; Bányai, Krisztián

    2016-01-01

    A phage infecting Mycobacterium phlei was isolated in 1958 from a soil sample in Hungary. Some physicochemical and biological properties of the virus were described in independent studies over the years. Here, we report the genome sequence of this early mycobacteriophage isolate. The Phlei phage genome measured 50,418 bp, had a GC content of 60.1 % and was predicted to encode 81 proteins and three tRNAs. Phylogeny of the tape measure protein revealed genetic relatedness to other early isolates of mycobacteriophages within subcluster A2. The genomic organization and genetic relationships to other strains showed that the Phlei phage belongs to a novel genetic cluster, designated A13.

  13. Genome variability in European and American bison detected using the BovineSNP50 BeadChip

    DEFF Research Database (Denmark)

    Pertoldi, C.; Wójcik, Jan M; Tokarska, Małgorzata

    2010-01-01

     The remaining wild populations of bison have all been through severe bottlenecks. The genomic consequences of these bottlenecks present an interesting area to study. Using a very large panel of SNPs developed in Bos taurus we have carried out a genome-wide screening on the European bison (Bison...... bonasus; EB) and on two subspecies of American bison: the plains bison (B. bison bison; PB) and the wood bison (B. bison athabascae; WB). One hundred bison samples were genotyped for 52,978 SNPs along with seven breeds of domestic bovine Bos taurus. Only 2,209 of the SNPs were polymorphic in the bison...

  14. Rekayasa Pengembangan Desain Motif Batik Khas Melayu

    Directory of Open Access Journals (Sweden)

    Eustasia Sri Murwati

    2016-04-01

    Full Text Available ABSTRAKPengembangan desain batik melalui rancang bangun perekayasaan desain menurut ragam hias Melayu meliputi pengembangan motif dan proses, termasuk pemilihan komposisi warna. Proses yang sering dilakukan yaitu proses celup, penghilangan lilin dan celup warna tumpangan atau proses colet, celup, penghilangan lilin atau celup kemudian penghilangan lilin yang disebut Batik Kelengan. Setiap pulau di Indonesia mempunyai ciri khas budaya dan kesenian yang dikenal dengan corak/ragam hias khas daerah, juga ornamen yang diminati oleh masyarakat dari daerah tersebut atau dari daerah lain. Kondisi demikian mendorong pertumbuhan industri kerajinan yang memanfaatkan unsur–unsur seni. Adapun motif yang diperoleh adalah: Ayam Berlaga, Bungo Matahari, Kuntum Bersanding, Lancang Kuning, Encong Kerinci, Durian Pecah, Bungo Bintang, Bungo Pauh Kecil, Riang-riang, Bungo Nagaro. Pengembangan desain tersebut dipilih 3 produk terbaik yang dinilai oleh 5 penilai yang ahli di bidang desain batik, yaitu motif Durian Pecah, Ayam Berlaga, dan Bungo Matahari. Rancang bangun diversifikasi desain dengan memanfaatkan unsur–unsur seni dan ketrampilan etnis Melayu yaitu pemilihan ragam hias dan motif batik Melayu untuk diterapkan ke bahan sandang dengan komposisi warna yang menarik, sehingga produk memenuhi selera konsumen. Memperbaiki keberagaman batik dengan meningkatkan desain produk antara lain menuangkan ragam hias Melayu ke dalam proses batik yang menggunakan berbagai macam warna sehingga komposisi warna memadai. Diperoleh hasil produk batik dengan ragam hias Melayu yang berkualitas dan komposisi warna yang sesuai dengan karakter ragam hias Melayu. Rancang bangun desain produk untuk mendapatkan formulasi desain serta kelayakan prosesnya dengan penekanan pada teknologi akrab lingkungan dilaksanakan dengan alternatif pendekatan yaitu penciptaan desain bentuk baru.Kata kunci: desain, batik, rancang bangun, ragam hias, MelayuABSTRACTDevelopment of batik design through

  15. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.

    2008-01-01

    set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr......(P)-specific binding partners for peptides corresponding to the extracted motifs. We confirmed numerous previously known interaction motifs and found 15 new interactions mediated by phosphosites not previously known to bind SH2 or PTB. Remarkably, a novel hydrophobic N-terminal motif ((L/V/I)(L/V/I)pY) was identified...

  16. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    Full Text Available A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery-searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of

  17. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. A whole genome association study to detect additive and dominant single nucleotide polymorphisms for growth and carcass traits in Korean native cattle, Hanwoo

    Directory of Open Access Journals (Sweden)

    Yi Li

    2017-01-01

    Full Text Available Objective A whole genome association study was conducted to identify single nucleotide polymorphisms (SNPs with additive and dominant effects for growth and carcass traits in Korean native cattle, Hanwoo. Methods The data set comprised 61 sires and their 486 Hanwoo steers that were born between spring of 2005 and fall of 2007. The steers were genotyped with the 35,968 SNPs that were embedded in the Illumina bovine SNP 50K beadchip and six growth and carcass quality traits were measured for the steers. A series of lack-of-fit tests between the models was applied to classify gene expression pattern as additive or dominant. Results A total of 18 (0, 15 (3, 12 (8, 15 (18, 11 (7, and 21 (1 SNPs were detected at the 5% chromosome (genome - wise level for weaning weight (WWT, yearling weight (YWT, carcass weight (CWT, backfat thickness (BFT, longissimus dorsi muscle area (LMA and marbling score, respectively. Among the significant 129 SNPs, 56 SNPs had additive effects, 20 SNPs dominance effects, and 53 SNPs both additive and dominance effects, suggesting that dominance inheritance mode be considered in genetic improvement for growth and carcass quality in Hanwoo. The significant SNPs were located at 33 quantitative trait locus (QTL regions on 18 Bos Taurus chromosomes (i.e. BTA 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 16, 17, 18, 20, 23, 26, 28, and 29 were detected. There is strong evidence that BTA14 is the key chromosome affecting CWT. Also, BTA20 is the key chromosome for almost all traits measured (WWT, YWT, LMA. Conclusion The application of various additive and dominance SNP models enabled better characterization of SNP inheritance mode for growth and carcass quality traits in Hanwoo, and many of the detected SNPs or QTL had dominance effects, suggesting that dominance be considered for the whole-genome SNPs data and implementation of successive molecular breeding schemes in Hanwoo.

  19. Genomics using the Assembly of the Mink Genome

    DEFF Research Database (Denmark)

    Guldbrandtsen, Bernt; Cai, Zexi; Sahana, Goutam

    2018-01-01

    The American Mink’s (Neovison vison) genome has recently been sequenced. This opens numerous avenues of research both for studying the basic genetics and physiology of the mink as well as genetic improvement in mink. Using genotyping-by-sequencing (GBS) generated marker data for 2,352 Danish farm...... mink runs of homozygosity (ROH) were detect in mink genomes. Detectable ROH made up on average 1.7% of the genome indicating the presence of at most a moderate level of genomic inbreeding. The fraction of genome regions found in ROH varied. Ten percent of the included regions were never found in ROH....... The ability to detect ROH in the mink genome also demonstrates the general reliability of the new mink genome assembly. Keywords: american mink, run of homozygosity, genome, selection, genomic inbreeding...

  20. Assessing local structure motifs using order parameters for motif recognition, interstitial identification, and diffusion path characterization

    Science.gov (United States)

    Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav; Haranczyk, Maciej

    2017-11-01

    Structure-property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body and face-centered cubic as well as hexagonal closed packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metals, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.

  1. Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

    Directory of Open Access Journals (Sweden)

    Nils E. R. Zimmermann

    2017-11-01

    Full Text Available Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body and face-centered cubic as well as hexagonal close packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP database (61,422 compounds for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metals, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.

  2. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    Science.gov (United States)

    2012-01-01

    Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  3. Genome-wide detection of CNVs in Chinese indigenous sheep with different types of tails using ovine high-density 600K SNP arrays

    OpenAIRE

    Zhu, Caiye; Fan, Hongying; Yuan, Zehu; Hu, Shijin; Ma, Xiaomeng; Xuan, Junli; Wang, Hongwei; Zhang, Li; Wei, Caihong; Zhang, Qin; Zhao, Fuping; Du, Lixin

    2016-01-01

    Chinese indigenous sheep can be classified into three types based on tail morphology: fat-tailed, fat-rumped, and thin-tailed sheep, of which the typical breeds are large-tailed Han sheep, Altay sheep, and Tibetan sheep, respectively. To unravel the genetic mechanisms underlying the phenotypic differences among Chinese indigenous sheep with tails of three different types, we used ovine high-density 600K SNP arrays to detect genome-wide copy number variation (CNV). In large-tailed Han sheep, A...

  4. Comparative genomic hybridization analysis detects frequent over-representation of DNA sequences at 3q, 7p, 8q and 18q in head and neck carcinomas

    DEFF Research Database (Denmark)

    Bergamo, N A; Rogatto, S R; Poli-Frederico, R C

    2000-01-01

    Comparative genomic hybridization (CGH) was used to identify chromosomal imbalances in 19 samples of squamous cell carcinoma of the head and neck (HNSCC). The chromosome arms most often over-represented were 3q (48%), 8q (42%), and 7p (32%); in many cases, these changes were observed at high copy...... and 2q material were detected in patients exhibiting a clinical history of recurrence and/or metastasis followed by terminal disease. This association suggests that gain of 1q and 2q may be a new marker of head and neck tumors with a refractory clinical response....

  5. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon; Patil, Sachin; Fhayli, Karim; Alsaiari, Shahad K.; Khashab, Niveen M.

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change. This journal is

  6. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    Full Text Available The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy to adopt false discovery rate (FDR and estimate motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. A SAS code for the simulation and analysis gene expression data is available from the first author upon request.

  7. RNA recognition motif (RRM)-containing proteins in Bombyx mori

    African Journals Online (AJOL)

    STORAGESEVER

    2009-03-20

    Mar 20, 2009 ... Recognition Motif (RRM), sometimes referred to as. RNP1, is one of the first identified domains for RNA interaction. RRM is very common ..... Apart from the RRM motif, eIF3-S9 has a Trp-Asp. (WD) repeat domain, Poly (A) ...

  8. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  9. Genomics-based non-invasive prenatal testing for detection of fetal chromosomal aneuploidy in pregnant women.

    Science.gov (United States)

    Badeau, Mylène; Lindsay, Carmen; Blais, Jonatan; Nshimyumukiza, Leon; Takwoingi, Yemisi; Langlois, Sylvie; Légaré, France; Giguère, Yves; Turgeon, Alexis F; Witteman, William; Rousseau, François

    2017-11-10

    pooled analyses (246 T21 cases, 112 T18 cases, 20 T13 cases and 4282 unaffected pregnancies), the clinical sensitivity (95% CI) of TMPS was 99.2% (96.8% to 99.8%), 98.2% (93.1% to 99.6%), 100% (83.9% to 100%) and 92.4% (84.1% to 96.5%) for T21, T18, T13 and 45,X respectively. The clinical specificities were above 100% for T21, T18 and T13 and 99.8% (98.3% to 100%) for 45,X. Indirect comparisons of MPSS and TMPS for T21, T18 and 45,X showed no statistical difference in clinical sensitivity, clinical specificity or both. Due to limited data, comparative meta-analysis of MPSS and TMPS was not possible for T13.We were unable to perform meta-analyses of gNIPT for 47,XXX, 47,XXY and 47,XYY because there were very few or no studies in one or more risk groups. These results show that MPSS and TMPS perform similarly in terms of clinical sensitivity and specificity for the detection of fetal T31, T18, T13 and sex chromosome aneuploidy (SCA). However, no study compared the two approaches head-to-head in the same cohort of patients. The accuracy of gNIPT as a prenatal screening test has been mainly evaluated as a second-tier screening test to identify pregnancies at very low risk of fetal aneuploidies (T21, T18 and T13), thus avoiding invasive procedures. Genomics-based non-invasive prenatal testing methods appear to be sensitive and highly specific for detection of fetal trisomies 21, 18 and 13 in high-risk populations. There is paucity of data on the accuracy of gNIPT as a first-tier aneuploidy screening test in a population of unselected pregnant women. With respect to the replacement of invasive tests, the performance of gNIPT observed in this review is not sufficient to replace current invasive diagnostic tests.We conclude that given the current data on the performance of gNIPT, invasive fetal karyotyping is still the required diagnostic approach to confirm the presence of a chromosomal abnormality prior to making irreversible decisions relative to the pregnancy outcome

  10. Third International E. coli genome meeting

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    Proceedings of the Third E. Coli Genome Meeting are provided. Presentations were divided into sessions entitled (1) Large Scale Sequencing, Sequence Analysis; (2) Databases; (3) Sequence Analysis; (4) Sequence Divergence in E. coli Strains; (5) Repeated Sequences and Regulatory Motifs; (6) Mutations, Rearrangements and Stress Responses; and (7) Origins of New Genes. The document provides a collection of abstracts of oral and poster presentations.

  11. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Full Text Available Abstract Background Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. Results We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequences antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  12. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers

    Energy Technology Data Exchange (ETDEWEB)

    Labbe, Jessy L [ORNL; Murat, Claude [INRA, Nancy, France; Morin, Emmanuelle [INRA, Nancy, France; Le Tacon, F [UMR, France; Martin, Francis [INRA, Nancy, France

    2011-01-01

    It is becoming clear that simple sequence repeats (SSRs) play a significant role in fungal genome organization, and they are a large source of genetic markers for population genetics and meiotic maps. We identified SSRs in the Laccaria bicolor genome by in silico survey and analyzed their distribution in the different genomic regions. We also compared the abundance and distribution of SSRs in L. bicolor with those of the following fungal genomes: Phanerochaete chrysosporium, Coprinopsis cinerea, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Magnaporthe grisea, Neurospora crassa and Saccharomyces cerevisiae. Using the MISA computer program, we detected 277,062 SSRs in the L. bicolor genome representing 8% of the assembled genomic sequence. Among the analyzed basidiomycetes, L. bicolor exhibited the highest SSR density although no correlation between relative abundance and the genome sizes was observed. In most genomes the short motifs (mono- to trinucleotides) were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. In the L. bicolor genome, most of the SSRs were located in intergenic regions (73.3%) and the highest SSR density was observed in transposable elements (TEs; 6,706 SSRs/Mb). However, 81% of the protein-coding genes contained SSRs in their exons, suggesting that SSR polymorphism may alter gene phenotypes. Within a L. bicolor offspring, sequence polymorphism of 78 SSRs was mainly detected in non-TE intergenic regions. Unlike previously developed microsatellite markers, these new ones are spread throughout the genome; these markers could have immediate applications in population genetics.

  13. IQCJ-SCHIP1, a novel fusion transcript encoding a calmodulin-binding IQ motif protein

    International Nuclear Information System (INIS)

    Kwasnicka-Crawford, Dorota A.; Carson, Andrew R.; Scherer, Stephen W.

    2006-01-01

    The existence of transcripts that span two adjacent, independent genes is considered rare in the human genome. This study characterizes a novel human fusion gene named IQCJ-SCHIP1. IQCJ-SCHIP1 is the longest isoform of a complex transcriptional unit that bridges two separate genes that encode distinct proteins, IQCJ, a novel IQ motif containing protein and SCHIP1, a schwannomin interacting protein that has been previously shown to interact with the Neurofibromatosis type 2 (NF2) protein. IQCJ-SCHIP1 is located on the chromosome 3q25 and comprises a 1692-bp transcript encompassing 11 exons spanning 828 kb of the genomic DNA. We show that IQCJ-SCHIP1 mRNA is highly expressed in the brain. Protein encoded by the IQCJ-SCHIP1 gene was localized to cytoplasm and actin-rich regions and in differentiated PC12 cells was also seen in neurite extensions

  14. Novel Strategy for Discrimination of Transcription Factor Binding Motifs Employing Mathematical Neural Network

    Science.gov (United States)

    Sugimoto, Asuka; Sumi, Takuya; Kang, Jiyoung; Tateno, Masaru

    2017-07-01

    Recognition in biological macromolecular systems, such as DNA-protein recognition, is one of the most crucial problems to solve toward understanding the fundamental mechanisms of various biological processes. Since specific base sequences of genome DNA are discriminated by proteins, such as transcription factors (TFs), finding TF binding motifs (TFBMs) in whole genome DNA sequences is currently a central issue in interdisciplinary biophysical and information sciences. In the present study, a novel strategy to create a discriminant function for discrimination of TFBMs by constituting mathematical neural networks (NNs) is proposed, together with a method to determine the boundary of signals (TFBMs) and noise in the NN-score (output) space. This analysis also leads to the mathematical limitation of discrimination in the recognition of features representing TFBMs, in an information geometrical manifold. Thus, the present strategy enables the identification of the whole space of TFBMs, right up to the noise boundary.

  15. Genome-wide analysis of tandem repeats in plants and green algae

    Science.gov (United States)

    Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang

    2014-01-01

    Tandem repeats (TRs) extensively exist in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs in a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...

  16. Detection of Hereditary 1,25-Hydroxyvitamin D-Resistant Rickets Caused by Uniparental Disomy of Chromosome 12 Using Genome-Wide Single Nucleotide Polymorphism Array.

    Directory of Open Access Journals (Sweden)

    Mayuko Tamura

    Full Text Available Hereditary 1,25-dihydroxyvitamin D-resistant rickets (HVDRR is an autosomal recessive disease caused by biallelic mutations in the vitamin D receptor (VDR gene. No patients have been reported with uniparental disomy (UPD.Using genome-wide single nucleotide polymorphism (SNP array to confirm whether HVDRR was caused by UPD of chromosome 12.A 2-year-old girl with alopecia and short stature and without any family history of consanguinity was diagnosed with HVDRR by typical laboratory data findings and clinical features of rickets. Sequence analysis of VDR was performed, and the origin of the homozygous mutation was investigated by target SNP sequencing, short tandem repeat analysis, and genome-wide SNP array.The patient had a homozygous p.Arg73Ter nonsense mutation. Her mother was heterozygous for the mutation, but her father was negative. We excluded gross deletion of the father's allele or paternal discordance. Genome-wide SNP array of the family (the patient and her parents showed complete maternal isodisomy of chromosome 12. She was successfully treated with high-dose oral calcium.This is the first report of HVDRR caused by UPD, and the third case of complete UPD of chromosome 12, in the published literature. Genome-wide SNP array was useful for detecting isodisomy and the parental origin of the allele. Comprehensive examination of the homozygous state is essential for accurate genetic counseling of recurrence risk and appropriate monitoring for other chromosome 12 related disorders. Furthermore, oral calcium therapy was effective as an initial treatment for rickets in this instance.

  17. Detection and genome analysis of a lineage III peste des petits ruminants virus in Kenya in 2011

    International Nuclear Information System (INIS)

    Dundon, W.G.; Kihu, S.M.; Gitao, G.C.; Bebora, L.C.; John, N.M.; Ogugi, J.O.; Loitsch, A.; Diallo, A.

    2016-01-01

    Full text: In May 2011 in Turkana County, north-western Kenya, tissue samples were collected from goats suspected of having died of peste des petits ruminant (PPR) disease, an acute viral disease of small ruminants. The samples were processed and tested by reverse transcriptase PCR for the presence of PPR viral RNA. The positive samples were sequenced and identified as belonging to peste des petits ruminants virus (PPRV) lineage III. Full-genome analysis of one of the positive samples revealed that the virus causing disease in Kenya in 2011 was 95.7% identical to the full genome of a virus isolated in Uganda in 2012 and that a segment of the viral fusion gene was 100% identical to that of a virus circulating in Tanzania in 2013. These data strongly indicate transboundary movement of lineage III viruses between Eastern Africa countries and have significant implications for surveillance and control of this important disease as it moves southwards in Africa. (author)

  18. Detection of Multiple Parallel Transmission Outbreak of Streptococcus suis Human Infection by Use of Genome Epidemiology, China, 2005.

    Science.gov (United States)

    Du, Pengcheng; Zheng, Han; Zhou, Jieping; Lan, Ruiting; Ye, Changyun; Jing, Huaiqi; Jin, Dong; Cui, Zhigang; Bai, Xuemei; Liang, Jianming; Liu, Jiantao; Xu, Lei; Zhang, Wen; Chen, Chen; Xu, Jianguo

    2017-02-01

    Streptococcus suis sequence type 7 emerged and caused 2 of the largest human infection outbreaks in China in 1998 and 2005. To determine the major risk factors and source of the infections, we analyzed whole genomes of 95 outbreak-associated isolates, identified 160 single nucleotide polymorphisms, and classified them into 6 clades. Molecular clock analysis revealed that clade 1 (responsible for the 1998 outbreak) emerged in October 1997. Clades 2-6 (responsible for the 2005 outbreak) emerged separately during February 2002-August 2004. A total of 41 lineages of S. suis emerged by the end of 2004 and rapidly expanded to 68 genome types through single base mutations when the outbreak occurred in June 2005. We identified 32 identical isolates and classified them into 8 groups, which were distributed in a large geographic area with no transmission link. These findings suggest that persons were infected in parallel in respective geographic sites.

  19. Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.

    Science.gov (United States)

    Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song

    2012-11-16

    Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. All cyanobacterial PRX proteins

  20. Powdery mildew fungal effector candidates share N-terminal Y/F/WxC-motif

    Directory of Open Access Journals (Sweden)

    Emmersen Jeppe

    2010-05-01

    Full Text Available Abstract Background Powdery mildew and rust fungi are widespread, serious pathogens that depend on developing haustoria in the living plant cells. Haustoria are separated from the host cytoplasm by a plant cell-derived extrahaustorial membrane. They secrete effector proteins, some of which are subsequently transferred across this membrane to the plant cell to suppress defense. Results In a cDNA library from barley epidermis containing powdery mildew haustoria, two-thirds of the sequenced ESTs were fungal and represented ~3,000 genes. Many of the most highly expressed genes encoded small proteins with N-terminal signal peptides. While these proteins are novel and poorly related, they do share a three-amino acid motif, which we named "Y/F/WxC", in the N-terminal of the mature proteins. The first amino acid of this motif is aromatic: tyrosine, phenylalanine or tryptophan, and the last is always cysteine. In total, we identified 107 such proteins, for which the ESTs represent 19% of the fungal clones in our library, suggesting fundamental roles in haustoria function. While overall sequence similarity between the powdery mildew Y/F/WxC-proteins is low, they do have a highly similar exon-intron structure, suggesting they have a common origin. Interestingly, searches of public fungal genome and EST databases revealed that haustoria-producing rust fungi also encode large numbers of novel, short proteins with signal peptides and the Y/F/WxC-motif. No significant numbers of such proteins were identified from genome and EST sequences from either fungi which do not produce haustoria or from haustoria-producing Oomycetes. Conclusion In total, we identified 107, 178 and 57 such Y/F/WxC-proteins from the barley powdery mildew, the wheat stem rust and the wheat leaf rust fungi, respectively. All together, our findings suggest the Y/F/WxC-proteins to be a new class of effectors from haustoria-producing pathogenic fungi.

  1. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients.

    Science.gov (United States)

    Palomo, Laura; Fuster-Tormo, Francisco; Alvira, Daniel; Ademà, Vera; Armengol, María Pilar; Gómez-Marzo, Paula; de Haro, Nuri; Mallo, Mar; Xicoy, Blanca; Zamora, Lurdes; Solé, Francesc

    2017-08-01

    Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used during the past years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by presenting somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing has been performed on four paired fresh DNA and WGA DNA samples from bone marrow of MDS patients, to assess the feasibility of using WGA DNA for detecting somatic mutations. The results of this study highlighted that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, a higher number of variants was detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.

  2. Motif statistics and spike correlations in neuronal networks

    International Nuclear Information System (INIS)

    Hu, Yu; Shea-Brown, Eric; Trousdale, James; Josić, Krešimir

    2013-01-01

    Motifs are patterns of subgraphs of complex networks. We studied the impact of such patterns of connectivity on the level of correlated, or synchronized, spiking activity among pairs of cells in a recurrent network of integrate and fire neurons. For a range of network architectures, we find that the pairwise correlation coefficients, averaged across the network, can be closely approximated using only three statistics of network connectivity. These are the overall network connection probability and the frequencies of two second order motifs: diverging motifs, in which one cell provides input to two others, and chain motifs, in which two cells are connected via a third intermediary cell. Specifically, the prevalence of diverging and chain motifs tends to increase correlation. Our method is based on linear response theory, which enables us to express spiking statistics using linear algebra, and a resumming technique, which extrapolates from second order motifs to predict the overall effect of coupling on network correlation. Our motif-based results seek to isolate the effect of network architecture perturbatively from a known network state. (paper)

  3. Computational analyses of synergism in small molecular network motifs.

    Directory of Open Access Journals (Sweden)

    Yili Zhang

    2014-03-01

    Full Text Available Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, reoccurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs, and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically to alter the responses of the motifs to stimuli. Synergism (or antagonism was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism, and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.

  4. Triadic motifs in the dependence networks of virtual societies

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-01

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  5. Triadic motifs in the dependence networks of virtual societies.

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-10

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  6. Genome-wide distribution comparative and composition analysis of the SSRs in Poaceae.

    Science.gov (United States)

    Wang, Yi; Yang, Chao; Jin, Qiaojun; Zhou, Dongjie; Wang, Shuangshuang; Yu, Yuanjie; Yang, Long

    2015-02-15

    The Poaceae family is of great importance to human beings since it comprises the cereal grasses which are the main sources for human food and animal feed. With the rapid growth of genomic data from Poaceae members, comparative genomics becomes a convinent method to study genetics of diffierent species. The SSRs (Simple Sequence Repeats) are widely used markers in the studies of Poaceae for their high abundance and stability. In this study, using the genomic sequences of 9 Poaceae species, we detected 11,993,943 SSR loci and developed 6,799,910 SSR primer pairs. The results show that SSRs are distributed on all the genomic elements in grass. Hexamer is the most frequent motif and AT/TA is the most frequent motif in dimer. The abundance of the SSRs has a positive linear relationship with the recombination rate. SSR sequences in the coding regions involve a higher GC content in the Poaceae than that in the other species. SSRs of 70-80 bp in length showed the highest AT/GC base ratio among all of these loci. The result shows the highest polymorphism rate belongs to the SSRs ranged from 30 bp to 40 bp. Using all the SSR primers of Japonica, nineteen universal primers were selected and located on the genome of the grass family. The information of SSR loci, the SSR primers and the tools of mining and analyzing SSR are provided in the PSSRD (Poaceae SSR Database, http://biodb.sdau.edu.cn/pssrd/). Our study and the PSSRD database provide a foundation for the comparative study in the Poaceae and it will accelerate the study on markers application, gene mapping and molecular breeding.

  7. Array-based assay detects genome-wide 5-mC and 5-hmC in the brains of humans, non-human primates, and mice.

    Science.gov (United States)

    Chopra, Pankaj; Papale, Ligia A; White, Andrew T J; Hatch, Andrea; Brown, Ryan M; Garthwaite, Mark A; Roseboom, Patrick H; Golos, Thaddeus G; Warren, Stephen T; Alisch, Reid S

    2014-02-13

    Methylation on the fifth position of cytosine (5-mC) is an essential epigenetic mark that is linked to both normal neurodevelopment and neurological diseases. The recent identification of another modified form of cytosine, 5-hydroxymethylcytosine (5-hmC), in both stem cells and post-mitotic neurons, raises new questions as to the role of this base in mediating epigenetic effects. Genomic studies of these marks using model systems are limited, particularly with array-based tools, because the standard method of detecting DNA methylation cannot distinguish between 5-mC and 5-hmC and most methods have been developed to only survey the human genome. We show that non-human data generated using the optimization of a widely used human DNA methylation array, designed only to detect 5-mC, reproducibly distinguishes tissue types within and between chimpanzee, rhesus, and mouse, with correlations near the human DNA level (R(2) > 0.99). Genome-wide methylation analysis, using this approach, reveals 6,102 differentially methylated loci between rhesus placental and fetal tissues with pathways analysis significantly overrepresented for developmental processes. Restricting the analysis to oncogenes and tumor suppressor genes finds 76 differentially methylated loci, suggesting that rhesus placental tissue carries a cancer epigenetic signature. Similarly, adapting the assay to detect 5-hmC finds highly reproducible 5-hmC levels within human, rhesus, and mouse brain tissue that is species-specific with a hierarchical abundance among the three species (human > rhesus > mouse). Annotation of 5-hmC with respect to gene structure reveals a significant prevalence in the 3'UTR and an association with chromatin-related ontological terms, suggesting an epigenetic feedback loop mechanism for 5-hmC. Together, these data show that this array-based methylation assay is generalizable to all mammals for the detection of both 5-mC and 5-hmC, greatly improving the utility of mammalian model systems

  8. A novel human AP endonuclease with conserved zinc-finger-like motifs involved in DNA strand break responses

    Science.gov (United States)

    Kanno, Shin-ichiro; Kuzuoka, Hiroyuki; Sasao, Shigeru; Hong, Zehui; Lan, Li; Nakajima, Satoshi; Yasui, Akira

    2007-01-01

    DNA damage causes genome instability and cell death, but many of the cellular responses to DNA damage still remain elusive. We here report a human protein, PALF (PNK and APTX-like FHA protein), with an FHA (forkhead-associated) domain and novel zinc-finger-like CYR (cysteine–tyrosine–arginine) motifs that are involved in responses to DNA damage. We found that the CYR motif is widely distributed among DNA repair proteins of higher eukaryotes, and that PALF, as well as a Drosophila protein with tandem CYR motifs, has endo- and exonuclease activities against abasic site and other types of base damage. PALF accumulates rapidly at single-strand breaks in a poly(ADP-ribose) polymerase 1 (PARP1)-dependent manner in human cells. Indeed, PALF interacts directly with PARP1 and is required for its activation and for cellular resistance to methyl-methane sulfonate. PALF also interacts directly with KU86, LIGASEIV and phosphorylated XRCC4 proteins and possesses endo/exonuclease activity at protruding DNA ends. Various treatments that produce double-strand breaks induce formation of PALF foci, which fully coincide with γH2AX foci. Thus, PALF and the CYR motif may play important roles in DNA repair of higher eukaryotes. PMID:17396150

  9. A Unique Set of the Burkholderia Collagen-Like Proteins Provides Insight into Pathogenesis, Genome Evolution and Niche Adaptation, and Infection Detection.

    Science.gov (United States)

    Bachert, Beth A; Choi, Soo J; Snyder, Anna K; Rio, Rita V M; Durney, Brandon C; Holland, Lisa A; Amemiya, Kei; Welkos, Susan L; Bozue, Joel A; Cote, Christopher K; Berisio, Rita; Lukomski, Slawomir

    2015-01-01

    Burkholderia pseudomallei and Burkholderia mallei, classified as category B priority pathogens, are significant human and animal pathogens that are highly infectious and broad-spectrum antibiotic resistant. Currently, the pathogenicity mechanisms utilized by Burkholderia are not fully understood, and correct diagnosis of B. pseudomallei and B. mallei infection remains a challenge due to limited detection methods. Here, we provide a comprehensive analysis of a set of 13 novel Burkholderia collagen-like proteins (Bucl) that were identified among B. pseudomallei and B. mallei select agents. We infer that several Bucl proteins participate in pathogenesis based on their noncollagenous domains that are associated with the components of a type III secretion apparatus and membrane transport systems. Homology modeling of the outer membrane efflux domain of Bucl8 points to a role in multi-drug resistance. We determined that bucl genes are widespread in B. pseudomallei and B. mallei; Fischer's exact test and Cramer's V2 values indicate that the majority of bucl genes are highly associated with these pathogenic species versus nonpathogenic B. thailandensis. We designed a bucl-based quantitative PCR assay which was able to detect B. pseudomallei infection in a mouse with a detection limit of 50 CFU. Finally, chromosomal mapping and phylogenetic analysis of bucl loci revealed considerable genomic plasticity and adaptation of Burkholderia spp. to host and environmental niches. In this study, we identified a large set of phylogenetically unrelated bucl genes commonly found in Burkholderia select agents, encoding predicted pathogenicity factors, detection targets, and vaccine candidates.

  10. A Study on the Motif Pattern of Dark-Cloud Cover in the Securities

    Directory of Open Access Journals (Sweden)

    Long Jing

    2017-01-01

    Full Text Available Morphological analysis is the analysis and mining of the graphics formed of the securities price changes. Investors need to forecast the trend of future before buying and selling points, which can avoid great loss. Therefore, the analysis of motif pattern of K-line in the form of futures investment technology analysis is very significant. Based on the thoughts of short-term trend clustering, this paper proposes a method of detecting the motif pattern of Dark-Cloud Cover in stock time series by analysing stock historic data and K-line shape, in order to predict the stock market trends. And we prove the effectiveness and practicality of the method by a series of experimental analysis.

  11. DNA mutation motifs in the genes associated with inherited diseases.

    Directory of Open Access Journals (Sweden)

    Michal Růžička

    Full Text Available Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects particular population where an effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs rarely associated with mutations (coldspots and frequently associated with mutations (hotspots exhibited characteristic sequence patterns, e.g. coldspots contained purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair as DNA with a mutation recognized by MutSα protein is noticeably bent.

  12. Leader protein of encephalomyocarditis virus binds zinc, is phosphorylated during viral infection, and affects the efficiency of genome translation.

    Science.gov (United States)

    Dvorak, C M; Hall, D J; Hill, M; Riddle, M; Pranter, A; Dillman, J; Deibel, M; Palmenberg, A C

    2001-11-25

    Encephalomyocarditis virus (EMCV) is the prototype member of the cardiovirus genus of picornaviruses. For cardioviruses and the related aphthoviruses, the first protein segment translated from the plus-strand RNA genome is the Leader protein. The aphthovirus Leader (173-201 amino acids) is an autocatalytic papain-like protease that cleaves translation factor eIF-4G to shut off cap-dependent host protein synthesis during infection. The less characterized cardioviral Leader is a shorter protein (67-76 amino acids) and does not contain recognizable proteolytic motifs. Instead, these Leaders have sequences consistent with N-terminal zinc-binding motifs, centrally located tyrosine kinase phosphorylation sites, and C-terminal, acid-rich domains. Deletion mutations, removing the zinc motif, the acid domain, or both domains, were engineered into EMCV cDNAs. In all cases, the mutations gave rise to viable viruses, but the plaque phenotypes in HeLa cells were significantly smaller than for wild-type virus. RNA transcripts containing the Leader deletions had reduced capacity to direct protein synthesis in cell-free extracts and the products with deletions in the acid-rich domains were less effective substrates at the L/P1 site, for viral proteinase 3Cpro. Recombinant EMCV Leader (rL) was expressed in bacteria and purified to homogeneity. This protein bound zinc stoichiometrically, whereas protein with a deletion in the zinc motif was inactive. Polyclonal mouse sera, raised against rL, immunoprecipitated Leader-containing precursors from infected HeLa cell extracts, but did not detect significant pools of the mature Leader. However, additional reactions with antiphosphotyrosine antibodies show that the mature Leader, but not its precursors, is phosphorylated during viral infection. The data suggest the natural Leader may play a role in regulation of viral genome translation, perhaps through a triggering phosphorylation event.

  13. Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats

    Directory of Open Access Journals (Sweden)

    Adler Joël

    2007-12-01

    Full Text Available Abstract Background Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins which contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms. Results We have developed Reptile http://reptile.unibe.ch, a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes exhibited a large variance regarding the proportion of repeat-containing proteins. Interestingly, there was a good correlation between the percentage of highly repetitive proteins and mean protein length in parasite proteomes, but not at all in the proteomes of free-living eukaryotes. Reptile combined with programs for the prediction of transmembrane domains and GPI-anchoring resulted in an effective tool for in silico identification of potential surface antigens and virulence factors from parasites. Conclusion Systemic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to predict proteins of serological and parasitological importance. An on-line tool is available at http://genomics.unibe.ch/dora.

  14. A genome-wide association study to detect QTL for commercially important traits in Swiss Large White boars.

    Directory of Open Access Journals (Sweden)

    Doreen Becker

    Full Text Available The improvement of meat quality and production traits has high priority in the pork industry. Many of these traits show a low to moderate heritability and are difficult and expensive to measure. Their improvement by targeted breeding programs is challenging and requires knowledge of the genetic and molecular background. For this study we genotyped 192 artificial insemination boars of a commercial line derived from the Swiss Large White breed using the PorcineSNP60 BeadChip with 62,163 evenly spaced SNPs across the pig genome. We obtained 26 estimated breeding values (EBVs for various traits including exterior, meat quality, reproduction, and production. The subsequent genome-wide association analysis allowed us to identify four QTL with suggestive significance for three of these traits (p-values ranging from 4.99×10⁻⁶ to 2.73×10⁻⁵. Single QTL for the EBVs pH one hour post mortem (pH1 and carcass length were on pig chromosome (SSC 14 and SSC 2, respectively. Two QTL for the EBV rear view hind legs were on SSC 10 and SSC 16.

  15. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

    Science.gov (United States)

    Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

    2011-05-01

    Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.

  16. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria.

    Science.gov (United States)

    Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu

    2017-01-10

    VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as extends these trait transfer-related genetic contexts, in newly sequenced pathogenic bacterial genomes. The used backend database MobilomeDB was firstly built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or a variety set of bacterial genomes. VRprofile might contribute to meet the increasing demands of re-annotations of bacterial variable regions, and aid in the real-time definitions of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Genome-wide association analysis for heat tolerance at flowering detected a large set of genes involved in adaptation to thermal and other stresses.

    Directory of Open Access Journals (Sweden)

    Tanguy Lafarge

    Full Text Available Fertilization sensitivity to heat in rice is a major issue within climate change scenarios in the tropics. A panel of 167 indica landraces and improved varieties was phenotyped for spikelet sterility (SPKST under 38°C during anthesis and for several secondary traits potentially affecting panicle micro-climate and thus the fertilization process. The panel was genotyped with an average density of one marker per 29 kb using genotyping by sequencing. Genome-wide association analyses (GWAS were conducted using three methods based on single marker regression, haplotype regression and simultaneous fitting of all markers, respectively. Fourteen loci significantly associated with SPKST under at least two GWAS methods were detected. A large number of associations was also detected for the secondary traits. Analysis of co-localization of SPKST associated loci with QTLs detected in progenies of bi-parental crosses reported in the literature allowed to narrow -down the position of eight of those QTLs, including the most documented one, qHTSF4.1. Gene families underlying loci associated with SPKST corresponded to functions ranging from sensing abiotic stresses and regulating plant response, such as wall-associated kinases and heat shock proteins, to cell division and gametophyte development. Analysis of diversity at the vicinity of loci associated with SPKST within the rice three thousand genomes, revealed widespread distribution of the favourable alleles across O. sativa genetic groups. However, few accessions assembled the favourable alleles at all loci. Effective donors included the heat tolerant variety N22 and some Indian and Taiwanese varieties. These results provide a basis for breeding for heat tolerance during anthesis and for functional validation of major loci governing this trait.

  18. Targeting functional motifs of a protein family

    Science.gov (United States)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β -lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β -lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors when processed gives clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β -lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  19. ROMANIAN FOLKLORE MOTIFS IN FASHION DESIGN

    Directory of Open Access Journals (Sweden)

    MOCENCO Alexandra

    2014-05-01

    Full Text Available The traditional Romanian costume such as the entire popular art (architecture, woodcarvins, pottery etc. was born and lasted in our country since ancient times. Closely related to human existence, the traditional costume reflected over the years as reflected nowadays, the mentality and artistic conception of the people. Today the traditional Romanian costume became an inspiration source to the wholesale fashion production industry designers, both Romanian and international. Although the contemporary designers are working in accordance with a vision, using a wide area of styles, methods and current technology, they usually return to traditional techniques and ethnic folklore motifs, which converts and resize them, integrating them in their contemporary space. Adrian Oianu is a very appreciated Romanian designer who launched two collections inspired by his native’s country traditional costumes: “Suflecata pan’ la brau” (“Turned up ‘til the belt” and “Bucurie” (“Joy”. Dorin Negrau had as inspiration for his “Lost” collection the traditional costume from the Bihor region. Yves Saint Laurent had a collection inspired by the Romanian traditional flax blouses called “La blouse roumaine”. The paper presents the traditional Romanian values throw fashion collections. The research activity will create innovative concepts to support the garment industry in order to develop their own brand and to bring the design activities in Romania at an international level. The research was conducted during the initial stage of a project, financed through national founds, consisting in a documentary study on ethnographic characteristics of the popular costume from different regions of the country.

  20. Algal MIPs, high diversity and conserved motifs.

    Science.gov (United States)

    Anderberg, Hanna I; Danielson, Jonas Å H; Johanson, Urban

    2011-04-21

    Major intrinsic proteins (MIPs) also named aquaporins form channels facilitating the passive transport of water and other small polar molecules across membranes. MIPs are particularly abundant and diverse in terrestrial plants but little is known about their evolutionary history. In an attempt to investigate the origin of the plant MIP subfamilies, genomes of chlorophyte algae, the sister group of charophyte algae and land plants, were searched for MIP encoding genes. A total of 22 MIPs were identified in the nine analysed genomes and phylogenetic analyses classified them into seven subfamilies. Two of these, Plasma membrane Intrinsic Proteins (PIPs) and GlpF-like Intrinsic Proteins (GIPs), are also present in land plants and divergence dating support a common origin of these algal and land plant MIPs, predating the evolution of terrestrial plants. The subfamilies unique to algae were named MIPA to MIPE to facilitate the use of a common nomenclature for plant MIPs reflecting phylogenetically stable groups. All of the investigated genomes contained at least one MIP gene but only a few species encoded MIPs belonging to more than one subfamily. Our results suggest that at least two of the seven subfamilies found in land plants were present already in an algal ancestor. The total variation of MIPs and the number of different subfamilies in chlorophyte algae is likely to be even higher than that found in land plants. Our analyses indicate that genetic exchanges between several of the algal subfamilies have occurred. The PIP1 and PIP2 groups and the Ca2+ gating appear to be specific to land plants whereas the pH gating is a more ancient characteristic shared by all PIPs. Further studies are needed to discern the function of the algal specific subfamilies MIPA-E and to fully understand the evolutionary relationship of algal and terrestrial plant MIPs.

  1. Algal MIPs, high diversity and conserved motifs

    Directory of Open Access Journals (Sweden)

    Johanson Urban

    2011-04-01

    Full Text Available Abstract Background Major intrinsic proteins (MIPs also named aquaporins form channels facilitating the passive transport of water and other small polar molecules across membranes. MIPs are particularly abundant and diverse in terrestrial plants but little is known about their evolutionary history. In an attempt to investigate the origin of the plant MIP subfamilies, genomes of chlorophyte algae, the sister group of charophyte algae and land plants, were searched for MIP encoding genes. Results A total of 22 MIPs were identified in the nine analysed genomes and phylogenetic analyses classified them into seven subfamilies. Two of these, Plasma membrane Intrinsic Proteins (PIPs and GlpF-like Intrinsic Proteins (GIPs, are also present in land plants and divergence dating support a common origin of these algal and land plant MIPs, predating the evolution of terrestrial plants. The subfamilies unique to algae were named MIPA to MIPE to facilitate the use of a common nomenclature for plant MIPs reflecting phylogenetically stable groups. All of the investigated genomes contained at least one MIP gene but only a few species encoded MIPs belonging to more than one subfamily. Conclusions Our results suggest that at least two of the seven subfamilies found in land plants were present already in an algal ancestor. The total variation of MIPs and the number of different subfamilies in chlorophyte algae is likely to be even higher than that found in land plants. Our analyses indicate that genetic exchanges between several of the algal subfamilies have occurred. The PIP1 and PIP2 groups and the Ca2+ gating appear to be specific to land plants whereas the pH gating is a more ancient characteristic shared by all PIPs. Further studies are needed to discern the function of the algal specific subfamilies MIPA-E and to fully understand the evolutionary relationship of algal and terrestrial plant MIPs.

  2. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.

    Science.gov (United States)

    Wang, Ying; Ding, Jun; Daniell, Henry; Hu, Haiyan; Li, Xiaoman

    2012-09-01

    Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes are limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that there are unlikely motifs shared by intergenic sequences of most of chloroplast genes, indicating that these genes are regulated differently. Finally and surprisingly, we found five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear-genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support of the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast protein, which sheds light on the understanding of the transcriptional regulation of chloroplast genes.

  3. Genome-wide methylation patterns in Salmonella enterica Subsp. enterica Serovars.

    Directory of Open Access Journals (Sweden)

    Cary Pirone-Davies

    Full Text Available The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A methylated motifs, one N4-methylcytosine (m4C motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems, indicating the high diversity of methylation patterns present in Salmonella.

  4. Review article: The mountain motif in the plot of Matthew

    Directory of Open Access Journals (Sweden)

    Gert J. Volschenk

    2010-09-01

    Full Text Available This article reviewed T.L. Donaldson’s book, Jesus on the mountain: A study in Matthean theology, published in 1985 by JSOT Press, Sheffield, and focused on the mountain motif in the structure and plot of the Gospel of Matthew, in addition to the work of Donaldson on the mountain motif as a literary motif and as theological symbol. The mountain is a primary theological setting for Jesus’ ministry and thus is an important setting, serving as one of the literary devices by which Matthew structured and progressed his narrative. The Zion theological and eschatological significance and Second Temple Judaism serve as the historical and theological background for the mountain motif. The last mountain setting (Mt 28:16–20 is the culmination of the three theological themes in the plot of Matthew, namely Christology, ecclesiology and salvation history.

  5. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http:/(/)www.sdsc.edu/MEME.

  6. DNA regulatory motif selection based on support vector machine ...

    African Journals Online (AJOL)

    ... machine (SVM) and its application in microarray experiment of Kashin-Beck disease. ... speed and amount of the corresponding mRNA in gene replication process. ... and revealed that some motifs may be related to the immune reactions.

  7. Unveiling Mycoplasma hyopneumoniae Promoters: Sequence Definition and Genomic Distribution

    Science.gov (United States)

    Weber, Shana de Souto; Sant'Anna, Fernando Hayashi; Schrank, Irene Silveira

    2012-01-01

    Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that the number of promoter-like sequences in the M. hyopneumoniae genome is more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional. PMID:22334569

  8. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing

    DEFF Research Database (Denmark)

    Skovgaard, Ole; Bak, Mads; Løbner-Olesen, Anders

    2011-01-01

    a combination of WGS and genome copy number analysis, for the identification of mutations that suppress the growth deficiency imposed by excessive initiations from the Escherichia coli origin of replication, oriC. The E. coli chromosome, like the majority of bacterial chromosomes, is circular, and DNA...... replication is initiated by assembling two replication complexes at the origin, oriC. These complexes then replicate the chromosome bidirectionally toward the terminus, ter. In a population of growing cells, this results in a copy number gradient, so that origin-proximal sequences are more frequent than...... origin-distal sequences. Major rearrangements in the chromosome are, therefore, readily identified by changes in copy number, i.e., certain sequences become over- or under-represented. Of the eight mutations analyzed in detail here, six were found to affect a single gene only, one was a large chromosomal...

  9. Inability of 'Whole Genome Amplification' to Improve Success Rates for the Biomolecular Detection of Tuberculosis in Archaeological Samples.

    Directory of Open Access Journals (Sweden)

    Jannine Forst

    Full Text Available We assessed the ability of whole genome amplification (WGA to improve the efficiency of downstream polymerase chain reactions (PCRs directed at ancient DNA (aDNA of members of the Mycobacterium tuberculosis complex (MTBC. Using extracts from a variety of bones and a tooth from human skeletons with or without lesions indicative of tuberculosis, from multiple time periods, we obtained inconsistent results. We conclude that WGA does not provide any advantage in studies of MTBC aDNA. The sporadic nature of our results are probably due to the fact that WGA is itself a PCR-based procedure which, although designed to deal with fragmented DNA, might be inefficient with the low concentration of templates in an aDNA extract. As such, WGA is subject to similar, if not the same, restrictions as PCR when applied to aDNA.

  10. Comparison of Ion Personal Genome Machine Platforms for the Detection of Variants in BRCA1 and BRCA2.

    Science.gov (United States)

    Hwang, Sang Mee; Lee, Ki Chan; Lee, Min Seob; Park, Kyoung Un

    2018-01-01

    Transition to next generation sequencing (NGS) for BRCA1 / BRCA2 analysis in clinical laboratories is ongoing but different platforms and/or data analysis pipelines give different results resulting in difficulties in implementation. We have evaluated the Ion Personal Genome Machine (PGM) Platforms (Ion PGM, Ion PGM Dx, Thermo Fisher Scientific) for the analysis of BRCA1 /2. The results of Ion PGM with OTG-snpcaller, a pipeline based on Torrent mapping alignment program and Genome Analysis Toolkit, from 75 clinical samples and 14 reference DNA samples were compared with Sanger sequencing for BRCA1 / BRCA2 . Ten clinical samples and 14 reference DNA samples were additionally sequenced by Ion PGM Dx with Torrent Suite. Fifty types of variants including 18 pathogenic or variants of unknown significance were identified from 75 clinical samples and known variants of the reference samples were confirmed by Sanger sequencing and/or NGS. One false-negative results were present for Ion PGM/OTG-snpcaller for an indel variant misidentified as a single nucleotide variant. However, eight discordant results were present for Ion PGM Dx/Torrent Suite with both false-positive and -negative results. A 40-bp deletion, a 4-bp deletion and a 1-bp deletion variant was not called and a false-positive deletion was identified. Four other variants were misidentified as another variant. Ion PGM/OTG-snpcaller showed acceptable performance with good concordance with Sanger sequencing. However, Ion PGM Dx/Torrent Suite showed many discrepant results not suitable for use in a clinical laboratory, requiring further optimization of the data analysis for calling variants.

  11. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes.

    Science.gov (United States)

    Clasen, Frederick Johannes; Pierneef, Rian Ewald; Slippers, Bernard; Reva, Oleg

    2018-05-03

    Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There are evidences that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate ratios of predicting false positive and false negative GIs by inserting GIs into different test chromosomes and performing the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. Averages for predicting false negative and false positive GIs were 20.1% and 11.01% respectively. A total of 10,550 GIs were identified in 66 eukaryotic species with 5299 of these GIs coding for at least one functional protein. The EuGI web-resource, freely accessible at http://eugi.bi.up.ac.za , was developed that allows browsing the database created from identified GIs and genes within GIs through an interactive and visual interface. SWGIS v2.0 along with the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web-resource, provide the first comprehensive resource for studying HGT in eukaryotes.

  12. Detection of genetic variants affecting cattle behaviour and their impact on milk production: a genome-wide association study.

    Science.gov (United States)

    Friedrich, Juliane; Brand, Bodo; Ponsuksili, Siriluck; Graunke, Katharina L; Langbein, Jan; Knaust, Jacqueline; Kühn, Christa; Schwerin, Manfred

    2016-02-01

    Behaviour traits of cattle have been reported to affect important production traits, such as meat quality and milk performance as well as reproduction and health. Genetic predisposition is, together with environmental stimuli, undoubtedly involved in the development of behaviour phenotypes. Underlying molecular mechanisms affecting behaviour in general and behaviour and productions traits in particular still have to be studied in detail. Therefore, we performed a genome-wide association study in an F2 Charolais × German Holstein cross-breed population to identify genetic variants that affect behaviour-related traits assessed in an open-field and novel-object test and analysed their putative impact on milk performance. Of 37,201 tested single nucleotide polymorphism (SNPs), four showed a genome-wide and 37 a chromosome-wide significant association with behaviour traits assessed in both tests. Nine of the SNPs that were associated with behaviour traits likewise showed a nominal significant association with milk performance traits. On chromosomes 14 and 29, six SNPs were identified to be associated with exploratory behaviour and inactivity during the novel-object test as well as with milk yield traits. Least squares means for behaviour and milk performance traits for these SNPs revealed that genotypes associated with higher inactivity and less exploratory behaviour promote higher milk yields. Whether these results are due to molecular mechanisms simultaneously affecting behaviour and milk performance or due to a behaviour predisposition, which causes indirect effects on milk performance by influencing individual reactivity, needs further investigation. © 2015 Stichting International Foundation for Animal Genetics.

  13. BEAM web server: a tool for structural RNA motif discovery.

    Science.gov (United States)

    Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2018-03-15

    RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable to handle tens of thousands of RNA in input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated to a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure in a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.

  14. Characterizing Motif Dynamics of Electric Brain Activity Using Symbolic Analysis

    Directory of Open Access Journals (Sweden)

    Massimiliano Zanin

    2014-10-01

    Full Text Available Motifs are small recurring circuits of interactions which constitute the backbone of networked systems. Characterizing motif dynamics is therefore key to understanding the functioning of such systems. Here we propose a method to define and quantify the temporal variability and time scales of electroencephalogram (EEG motifs of resting brain activity. Given a triplet of EEG sensors, links between them are calculated by means of linear correlation; each pattern of links (i.e., each motif is then associated to a symbol, and its appearance frequency is analyzed by means of Shannon entropy. Our results show that each motif becomes observable with different coupling thresholds and evolves at its own time scale, with fronto-temporal sensors emerging at high thresholds and changing at fast time scales, and parietal ones at low thresholds and changing at slower rates. Finally, while motif dynamics differed across individuals, for each subject, it showed robustness across experimental conditions, indicating that it could represent an individual dynamical signature.

  15. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.

  16. An experimental test of a fundamental food web motif.

    Science.gov (United States)

    Rip, Jason M K; McCann, Kevin S; Lynn, Denis H; Fawcett, Sonia

    2010-06-07

    Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure-the complex web of species interactions that make up ecological communities. A difficult, yet crucial task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths, is one such motif. This motif amazingly occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and through clear, experimental manipulation, identifies the fundamental stabilizing consequences of this structure for ecological communities.

  17. Detection of Helicobacter Pylori Genome with an Optical Biosensor Based on Hybridization of Urease Gene with a Gold Nanoparticles-Labeled Probe

    Science.gov (United States)

    Shahrashoob, M.; Mohsenifar, A.; Tabatabaei, M.; Rahmani-Cherati, T.; Mobaraki, M.; Mota, A.; Shojaei, T. R.

    2016-05-01

    A novel optics-based nanobiosensor for sensitive determination of the Helicobacter pylori genome using a gold nanoparticles (AuNPs)-labeled probe is reported. Two specific thiol-modified capture and signal probes were designed based on a single-stranded complementary DNA (cDNA) region of the urease gene. The capture probe was immobilized on AuNPs, which were previously immobilized on an APTES-activated glass, and the signal probe was conjugated to different AuNPs as well. The presence of the cDNA in the reaction mixture led to the hybridization of the AuNPs-labeled capture probe and the signal probe with the cDNA, and consequently the optical density of the reaction mixture (AuNPs) was reduced proportionally to the cDNA concentration. The limit of detection was measured at 0.5 nM.

  18. A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    Science.gov (United States)

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809

  19. A Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Blood Components (Immunity in a Cross between Korean Native Pig and Yorkshire

    Directory of Open Access Journals (Sweden)

    Y.-M. Lee

    2012-12-01

    Full Text Available The purpose of this study was to detect significant SNPs for blood components that were related to immunity using high single nucleotide polymorphism (SNP density panels in a Korean native pig (KNP×Yorkshire (YK cross population. A reciprocal design of KNP×YK produced 249 F2 individuals that were genotyped for a total of 46,865 available SNPs in the Illumina porcine 60K beadchip. To perform whole genome association analysis (WGA, phenotypes were regressed on each SNP under a simple linear regression model after adjustment for sex and slaughter age. To set up a significance threshold, 0.1% point-wise p value from F distribution was used for each SNP test. Among the significant SNPs for a trait, the best set of SNP markers were determined using a stepwise regression procedure with the rates of inclusion and exclusion of each SNP out of the model at 0.001 level. A total of 54 SNPs were detected; 10, 6, 4, 4, 5, 4, 5, 10, and 6 SNPs for neutrophil, lymphocyte, monocyte, eosinophil, basophil, atypical lymph, immunoglobulin, insulin, and insulin-like growth factor-I, respectively. Each set of significant SNPs per trait explained 24 to 42% of phenotypic variance. Several pleiotropic SNPs were detected on SSCs 4, 13, 14 and 15.

  20. Simultaneous detection of genetically modified organisms by multiplex ligation-dependent genome amplification and capillary gel electrophoresis with laser-induced fluorescence.

    Science.gov (United States)

    García-Cañas, Virginia; Mondello, Monica; Cifuentes, Alejandro

    2010-07-01

    In this work, an innovative method useful to simultaneously analyze multiple genetically modified organisms is described. The developed method consists in the combination of multiplex ligation-dependent genome dependent amplification (MLGA) with CGE and LIF detection using bare-fused silica capillaries. The MLGA process is based on oligonucleotide constructs, formed by a universal sequence (vector) and long specific oligonucleotides (selectors) that facilitate the circularization of specific DNA target regions. Subsequently, the circularized target sequences are simultaneously amplified with the same couple of primers and analyzed by CGE-LIF using a bare-fused silica capillary and a run electrolyte containing 2-hydroxyethyl cellulose acting as both sieving matrix and dynamic capillary coating. CGE-LIF is shown to be very useful and informative for optimizing MLGA parameters such as annealing temperature, number of ligation cycles, and selector probes concentration. We demonstrate the specificity of the method in detecting the presence of transgenic DNA in certified reference and raw commercial samples. The method developed is sensitive and allows the simultaneous detection in a single run of percentages of transgenic maize as low as 1% of GA21, 1% of MON863, and 1% of MON810 in maize samples with signal-to-noise ratios for the corresponding DNA peaks of 15, 12, and 26, respectively. These results demonstrate, to our knowledge for the first time, the great possibilities of MLGA techniques for genetically modified organisms analysis.

  1. Detection and full genome characterization of two beta CoV viruses related to Middle East respiratory syndrome from bats in Italy.

    Science.gov (United States)

    Moreno, Ana; Lelli, Davide; de Sabato, Luca; Zaccaria, Guendalina; Boni, Arianna; Sozzi, Enrica; Prosperi, Alice; Lavazza, Antonio; Cella, Eleonora; Castrucci, Maria Rita; Ciccozzi, Massimo; Vaccari, Gabriele

    2017-12-19

    Middle East respiratory syndrome coronavirus (MERS-CoV), which belongs to beta group of coronavirus, can infect multiple host species and causes severe diseases in humans. Multiple surveillance and phylogenetic studies suggest a bat origin. In this study, we describe the detection and full genome characterization of two CoVs closely related to MERS-CoV from two Italian bats, Pipistrellus kuhlii and Hypsugo savii. Pool of viscera were tested by a pan-coronavirus RT-PCR. Virus isolation was attempted by inoculation in different cell lines. Full genome sequencing was performed using the Ion Torrent platform and phylogenetic trees were performed using IQtree software. Similarity plots of CoV clade c genomes were generated by using SSE v1.2. The three dimensional macromolecular structure (3DMMS) of the receptor binding domain (RBD) in the S protein was predicted by sequence-homology method using the protein data bank (PDB). Both samples resulted positive to the pan-coronavirus RT-PCR (IT-batCoVs) and their genome organization showed identical pattern of MERS CoV. Phylogenetic analysis showed a monophyletic group placed in the Beta2c clade formed by MERS-CoV sequences originating from humans and camels and bat-related sequences from Africa, Italy and China. The comparison of the secondary and 3DMMS of the RBD of IT-batCoVs with MERS, HKU4 and HKU5 bat sequences showed two aa deletions located in a region corresponding to the external subdomain of MERS-RBD in IT-batCoV and HKU5 RBDs. This study reported two beta CoVs closely related to MERS that were obtained from two bats belonging to two commonly recorded species in Italy (P. kuhlii and H. savii). The analysis of the RBD showed similar structure in IT-batCoVs and HKU5 respect to HKU4 sequences. Since the RBD domain of HKU4 but not HKU5 can bind to the human DPP4 receptor for MERS-CoV, it is possible to suggest also for IT-batCoVs the absence of DPP4-binding potential. More surveillance studies are needed to better

  2. Novel Polymerase Spiral Reaction (PSR) for rapid visual detection of Bovine Herpesvirus 1 genomic DNA from aborted bovine fetus and semen.

    Science.gov (United States)

    Malla, Javed Ahmed; Chakravarti, Soumendu; Gupta, Vikas; Chander, Vishal; Sharma, Gaurav Kumar; Qureshi, Salauddin; Mishra, Adhiraj; Gupta, Vivek Kumar; Nandi, Sukdeb

    2018-02-20

    Bovine herpesvirus-1 (BHV-1) is a major viral pathogen affecting bovines leading to various clinical manifestations and causes significant economic impediment in modern livestock production system. Rapid, accurate and sensitive detection of BHV-1 infection at frozen semen stations or at dairy herds remains a priority for control of BHV-1 spread to susceptible population. Polymerase Spiral Reaction (PSR), a novel addition in the gamut of isothermal techniques, has been successfully implemented in initial optimization for detection of BHV-1 genomic DNA and further validated in clinical samples. The developed PSR assay has been validated for detection of BHV-1 from bovine semen (n=99), a major source of transmission of BHV-1 from breeding bulls to susceptible dams in artificial insemination programs. The technique has also been used for screening of BHV-1 DNA from suspected aborted fetal tissues (n=25). The developed PSR technique is 100 fold more sensitive than conventional PCR and comparable to real-time PCR. The PSR technique has been successful in detecting 13 samples positive for BHV-1 DNA in bovine semen, 4 samples more than conventional PCR. The aborted fetal tissues were negative for presence of BHV-1 DNA. The presence of BHV-1 in bovine semen samples raises a pertinent concern for extensively screening of semen from breeding bulls before been used for artificial insemination process. PSR has all the attributes for becoming a method of choice for rapid, accurate and sensitive detection of BHV-1 DNA at frozen semen stations or at dairy herds in resource constrained settings. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Informative genomic microsatellite markers for efficient genotyping applications in sugarcane.

    Science.gov (United States)

    Parida, Swarup K; Kalia, Sanjay K; Kaul, Sunita; Dalal, Vivek; Hemaprabha, G; Selvi, Athiappan; Pandit, Awadhesh; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R; Srivastava, Prem Shankar; Singh, Nagendra K; Mohapatra, Trilochan

    2009-01-01

    Genomic microsatellite markers are capable of revealing high degree of polymorphism. Sugarcane (Saccharum sp.), having a complex polyploid genome requires more number of such informative markers for various applications in genetics and breeding. With the objective of generating a large set of microsatellite markers designated as Sugarcane Enriched Genomic MicroSatellite (SEGMS), 6,318 clones from genomic libraries of two hybrid sugarcane cultivars enriched with 18 different microsatellite repeat-motifs were sequenced to generate 4.16 Mb high-quality sequences. Microsatellites were identified in 1,261 of the 5,742 non-redundant clones that accounted for 22% enrichment of the libraries. Retro-transposon association was observed for 23.1% of the identified microsatellites. The utility of the microsatellite containing genomic sequences were demonstrated by higher primer designing potential (90%) and PCR amplification efficiency (87.4%). A total of 1,315 markers including 567 class I microsatellite markers were designed and placed in the public domain for unrestricted use. The level of polymorphism detected by these markers among sugarcane species, genera, and varieties was 88.6%, while cross-transferability rate was 93.2% within Saccharum complex and 25% to cereals. Cloning and sequencing of size variant amplicons revealed that the variation in the number of repeat-units was the main source of SEGMS fragment length polymorphism. High level of polymorphism and wide range of genetic diversity (0.16-0.82 with an average of 0.44) assayed with the SEGMS markers suggested their usefulness in various genotyping applications in sugarcane.

  4. The complete genome sequence of a south Indian isolate of Rice tungro spherical virus reveals evidence of genetic recombination between distinct isolates.

    Science.gov (United States)

    Sailaja, B; Anjum, Najreen; Patil, Yogesh K; Agarwal, Surekha; Malathi, P; Krishnaveni, D; Balachandran, S M; Viraktamath, B C; Mangrauthia, Satendra K

    2013-12-01

    In this study, complete genome of a south Indian isolate of Rice tungro spherical virus (RTSV) from Andhra Pradesh (AP) was sequenced, and the predicted amino acid sequence was analysed. The RTSV RNA genome consists of 12,171 nt without the poly(A) tail, encoding a putative typical polyprotein of 3,470 amino acids. Furthermore, cleavage sites and sequence motifs of the polyprotein were predicted. Multiple alignment with other RTSV isolates showed a nucleotide sequence identity of 95% to east Indian isolates and 90% to Philippines isolates. A phylogenetic tree based on complete genome sequence showed that Indian isolates clustered together, while Vt6 and PhilA isolates of Philippines formed two separate clusters. Twelve recombination events were detected in RNA genome of RTSV using the Recombination Detection Program version 3. Recombination analysis suggested significant role of 5' end and central region of genome in virus evolution. Further, AP and Odisha isolates appeared as important RTSV isolates involved in diversification of this virus in India through recombination phenomenon. The new addition of complete genome of first south Indian isolate provided an opportunity to establish the molecular evolution of RTSV through recombination analysis and phylogenetic relationship.

  5. SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale

    Directory of Open Access Journals (Sweden)

    Paccanaro Alberto

    2010-03-01

    Full Text Available Abstract Background An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. Results SCPS (Spectral Clustering of Protein Sequences is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences. Conclusions Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein

  6. SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.

    Science.gov (United States)

    Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto

    2010-03-09

    An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with

  7. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    International Nuclear Information System (INIS)

    Westberg, Johan A.; Jiang, Ji; Andersson, Leif C.

    2011-01-01

    Highlights: → Stanniocalcin 1 (STC1) binds heme through novel heme binding motif. → Central iron atom of heme and cysteine-114 of STC1 are essential for binding. → STC1 binds Fe 2+ and Fe 3+ heme. → STC1 peptide prevents oxidative decay of heme. -- Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys 114 as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). A STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H 2 O 2 induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  8. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    Energy Technology Data Exchange (ETDEWEB)

    Westberg, Johan A., E-mail: johan.westberg@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland); Jiang, Ji, E-mail: ji.jiang@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland); Andersson, Leif C., E-mail: leif.andersson@helsinki.fi [Department of Pathology, Haartman Institute, University of Helsinki and HUSLAB, P.O. Box 21, Haartmaninkatu 3, FI-00014 Helsinki (Finland)

    2011-06-03

    Highlights: {yields} Stanniocalcin 1 (STC1) binds heme through novel heme binding motif. {yields} Central iron atom of heme and cysteine-114 of STC1 are essential for binding. {yields} STC1 binds Fe{sup 2+} and Fe{sup 3+} heme. {yields} STC1 peptide prevents oxidative decay of heme. -- Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys{sup 114} as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). A STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H{sub 2}O{sub 2} induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  9. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    Directory of Open Access Journals (Sweden)

    Charles Richard Bradshaw

    Full Text Available Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10, a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in

  10. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products

    NARCIS (Netherlands)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J.; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-01-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however,

  11. The Methanosarcina barkeri genome: comparative analysis withMethanosarcina acetivorans and Methanosarcina mazei reveals extensiverearrangement within methanosarcinal genomes

    Energy Technology Data Exchange (ETDEWEB)

    Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.; Bruce,David C.; Gilna, Paul; Han, Cliff S.; Lapidus, Alla; Metcalf, William W.; Saunders, Elizabeth; Tapia, Roxanne; Sowers, Kevin R.

    2006-05-19

    We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.

  12. New genomic structure for prostate cancer specific gene PCA3 within BMCC1: implications for prostate cancer detection and progression.

    Directory of Open Access Journals (Sweden)

    Raymond A Clarke

    Full Text Available The prostate cancer antigen 3 (PCA3/DD3 gene is a highly specific biomarker upregulated in prostate cancer (PCa. In order to understand the importance of PCA3 in PCa we investigated the organization and evolution of the PCA3 gene locus.We have employed cDNA synthesis, RTPCR and DNA sequencing to identify 4 new transcription start sites, 4 polyadenylation sites and 2 new differentially spliced exons in an extended form of PCA3. Primers designed from these novel PCA3 exons greatly improve RT-PCR based discrimination between PCa, PCa metastases and BPH specimens. Comparative genomic analyses demonstrated that PCA3 has only recently evolved in an anti-sense orientation within a second gene, BMCC1/PRUNE2. BMCC1 has been shown previously to interact with RhoA and RhoC, determinants of cellular transformation and metastasis, respectively. Using RT-PCR we demonstrated that the longer BMCC1-1 isoform - like PCA3 - is upregulated in PCa tissues and metastases and in PCa cell lines. Furthermore PCA3 and BMCC1-1 levels are responsive to dihydrotestosterone treatment.Upregulation of two new PCA3 isoforms in PCa tissues improves discrimination between PCa and BPH. The functional relevance of this specificity is now of particular interest given PCA3's overlapping association with a second gene BMCC1, a regulator of Rho signalling. Upregulation of PCA3 and BMCC1 in PCa has potential for improved diagnosis.

  13. Structural and Functional Motifs in Influenza Virus RNAs

    Directory of Open Access Journals (Sweden)

    Damien Ferhadian

    2018-03-01

    Full Text Available Influenza A viruses (IAV are responsible for recurrent influenza epidemics and occasional devastating pandemics in humans and animals. They belong to the Orthomyxoviridae family and their genome consists of eight (- sense viral RNA (vRNA segments of different lengths coding for at least 11 viral proteins. A heterotrimeric polymerase complex is bound to the promoter consisting of the 13 5′-terminal and 12 3′-terminal nucleotides of each vRNA, while internal parts of the vRNAs are associated with multiple copies of the viral nucleoprotein (NP, thus forming ribonucleoproteins (vRNP. Transcription and replication of vRNAs result in viral mRNAs (vmRNAs and complementary RNAs (cRNAs, respectively. Complementary RNAs are the exact positive copies of vRNAs; they also form ribonucleoproteins (cRNPs and are intermediate templates in the vRNA amplification process. On the contrary, vmRNAs have a 5′ cap snatched from cellular mRNAs and a 3′ polyA tail, both gained by the viral polymerase complex. Hence, unlike vRNAs and cRNAs, vmRNAs do not have a terminal promoter able to recruit the viral polymerase. Furthermore, synthesis of at least two viral proteins requires vmRNA splicing. Except for extensive analysis of the viral promoter structure and function and a few, mostly bioinformatics, studies addressing the vRNA and vmRNA structure, structural studies of the influenza A vRNAs, cRNAs, and vmRNAs are still in their infancy. The recent crystal structures of the influenza polymerase heterotrimeric complex drastically improved our understanding of the replication and transcription processes. The vRNA structure has been mainly studied in vitro using RNA probing, but its structure has been very recently studied within native vRNPs using crosslinking and RNA probing coupled to next generation RNA sequencing. Concerning vmRNAs, most studies focused on the segment M and NS splice sites and several structures initially predicted by bioinformatics analysis

  14. Identification of novel conserved functional motifs across most Influenza A viral strains

    Directory of Open Access Journals (Sweden)

    El-Azab Iman

    2011-01-01

    Full Text Available Abstract Background Influenza A virus poses a continuous threat to global public health. Design of novel universal drugs and vaccine requires a careful analysis of different strains of Influenza A viral genome from diverse hosts and subtypes. We performed a systematic in silico analysis of Influenza A viral segments of all available Influenza A viral strains and subtypes and grouped them based on host, subtype, and years isolated, and through multiple sequence alignments we extrapolated conserved regions, motifs, and accessible regions for functional mapping and annotation. Results Across all species and strains 87 highly conserved regions (conservation percentage > = 90% and 19 functional motifs (conservation percentage = 100% were found in PB2, PB1, PA, NP, M, and NS segments. The conservation percentage of these segments ranged between 94 - 98% in human strains (the most conserved, 85 - 93% in swine strains (the most variable, and 91 - 94% in avian strains. The most conserved segment was different in each host (PB1 for human strains, NS for avian strains, and M for swine strains. Target accessibility prediction yielded 324 accessible regions, with a single stranded probability > 0.5, of which 78 coincided with conserved regions. Some of the interesting annotations in these regions included sites for protein-protein interactions, the RNA binding groove, and the proton ion channel. Conclusions The influenza virus has evolved to adapt to its host through variations in the GC content and conservation percentage of the conserved regions. Nineteen universal conserved functional motifs were discovered, of which some were accessible regions with interesting biological functions. These regions will serve as a foundation for universal drug targets as well as universal vaccine design.

  15. Verification of the MOTIF code version 3.0

    International Nuclear Information System (INIS)

    Chan, T.; Guvanasen, V.; Nakka, B.W.; Reid, J.A.K.; Scheier, N.W.; Stanchell, F.W.

    1996-12-01

    As part of the Canadian Nuclear Fuel Waste Management Program (CNFWMP), AECL has developed a three-dimensional finite-element code, MOTIF (Model Of Transport In Fractured/ porous media), for detailed modelling of groundwater flow, heat transport and solute transport in a fractured rock mass. The code solves the transient and steady-state equations of groundwater flow, solute (including one-species radionuclide) transport, and heat transport in variably saturated fractured/porous media. The initial development was completed in 1985 (Guvanasen 1985) and version 3.0 was completed in 1986. This version is documented in detail in Guvanasen and Chan (in preparation). This report describes a series of fourteen verification cases which has been used to test the numerical solution techniques and coding of MOTIF, as well as demonstrate some of the MOTIF analysis capabilities. For each case the MOTIF solution has been compared with a corresponding analytical or independently developed alternate numerical solution. Several of the verification cases were included in Level 1 of the International Hydrologic Code Intercomparison Project (HYDROCOIN). The MOTIF results for these cases were also described in the HYDROCOIN Secretariat's compilation and comparison of results submitted by the various project teams (Swedish Nuclear Power Inspectorate 1988). It is evident from the graphical comparisons presented that the MOTIF solutions for the fourteen verification cases are generally in excellent agreement with known analytical or numerical solutions obtained from independent sources. This series of verification studies has established the ability of the MOTIF finite-element code to accurately model the groundwater flow and solute and heat transport phenomena for which it is intended. (author). 20 refs., 14 tabs., 32 figs

  16. Mechanisms of zero-lag synchronization in cortical motifs.

    Directory of Open Access Journals (Sweden)

    Leonardo L Gollo

    2014-04-01

    Full Text Available Zero-lag synchronization between distant cortical areas has been observed in a diversity of experimental data sets and between many different regions of the brain. Several computational mechanisms have been proposed to account for such isochronous synchronization in the presence of long conduction delays: Of these, the phenomenon of "dynamical relaying"--a mechanism that relies on a specific network motif--has proven to be the most robust with respect to parameter mismatch and system noise. Surprisingly, despite a contrary belief in the community, the common driving motif is an unreliable means of establishing zero-lag synchrony. Although dynamical relaying has been validated in empirical and computational studies, the deeper dynamical mechanisms and comparison to dynamics on other motifs is lacking. By systematically comparing synchronization on a variety of small motifs, we establish that the presence of a single reciprocally connected pair--a "resonance pair"--plays a crucial role in disambiguating those motifs that foster zero-lag synchrony in the presence of conduction delays (such as dynamical relaying from those that do not (such as the common driving triad. Remarkably, minor structural changes to the common driving motif that incorporate a reciprocal pair recover robust zero-lag synchrony. The findings are observed in computational models of spiking neurons, populations of spiking neurons and neural mass models, and arise whether the oscillatory systems are periodic, chaotic, noise-free or driven by stochastic inputs. The influence of the resonance pair is also robust to parameter mismatch and asymmetrical time delays amongst the elements of the motif. We call this manner of facilitating zero-lag synchrony resonance-induced synchronization, outline the conditions for its occurrence, and propose that it may be a general mechanism to promote zero-lag synchrony in the brain.

  17. Phyloproteomic Analysis of 11780 Six-Residue-Long Motifs Occurrences

    Directory of Open Access Journals (Sweden)

    O. V. Galzitskaya

    2015-01-01

    Full Text Available How is it possible to find good traits for phylogenetic reconstructions? Here, we present a new phyloproteomic criterion that is an occurrence of simple motifs which can be imprints of evolution history. We studied the occurrences of 11780 six-residue-long motifs consisting of two randomly located amino acids in 97 eukaryotic and 25 bacterial proteomes. For all eukaryotic proteomes, with the exception of the Amoebozoa, Stramenopiles, and Diplomonadida kingdoms, the number of proteins containing the motifs from the first group (one of the two amino acids occurs once at the terminal position made about 20%; in the case of motifs from the second (one of two amino acids occurs one time within the pattern and third (the two amino acids occur randomly groups, 30% and 50%, respectively. For bacterial proteomes, this relationship was 10%, 27%, and 63%, respectively. The matrices of correlation coefficients between numbers of proteins where a motif from the set of 11780 motifs appears at least once in 9 kingdoms and 5 phyla of bacteria were calculated. Among the correlation coefficients for eukaryotic proteomes, the correlation between the animal and fungi kingdoms (0.62 is higher than between fungi and plants (0.54. Our study provides support that animals and fungi are sibling kingdoms. Comparison of the frequencies of six-residue-long motifs in different proteomes allows obtaining phylogenetic relationships based on similarities between these frequencies: the Diplomonadida kingdoms are more close to Bacteria than to Eukaryota; Stramenopiles and Amoebozoa are more close to each other than to other kingdoms of Eukaryota.

  18. Conservation of Repeats at the Mammalian KCNQ1OT1-CDKN1C Region Suggests a Role in Genomic Imprinting

    Directory of Open Access Journals (Sweden)

    Marcos De Donato

    2017-06-01

    Full Text Available KCNQ1OT1 is located in the region with the highest number of genes showing genomic imprinting, but the mechanisms controlling the genes under its influence have not been fully elucidated. Therefore, we conducted a comparative analysis of the KCNQ1/KCNQ1OT1-CDKN1C region to study its conservation across the best assembled eutherian mammalian genomes sequenced to date and analyzed potential elements that may be implicated in the control of genomic imprinting in this region. The genomic features in these regions from human, mouse, cattle, and dog show a higher number of genes and CpG islands (detected using cpgplot from EMBOSS, but lower number of repetitive elements (including short interspersed nuclear elements and long interspersed nuclear elements, compared with their whole chromosomes (detected by RepeatMasker. The KCNQ1OT1-CDKN1C region contains the highest number of conserved noncoding sequences (CNS among mammals, where we found 16 regions containing about 38 different highly conserved repetitive elements (using mVista, such as LINE1 elements: L1M4, L1MB7, HAL1, L1M4a, L1Med, and an LTR element: MLT1H. From these elements, we found 74 CNS showing high sequence identity (>70% between human, cattle, and mouse, from which we identified 13 motifs (using Multiple Em for Motif Elicitation/Motif Alignment and Search Tool with a significant probability of occurrence, 3 of which were the most frequent and were used to find transcription factor–binding sites. We detected several transcription factors (using JASPAR suite from the families SOX, FOX, and GATA. A phylogenetic analysis of these CNS from human, marmoset, mouse, rat, cattle, dog, horse, and elephant shows branches with high levels of support and very similar phylogenetic relationships among these groups, confirming previous reports. Our results suggest that functional DNA elements identified by comparative genomics in a region densely populated with imprinted mammalian genes may be

  19. Early detection of fatigue cracks by means of nondestructive testing, NDT; Tidig detektering av utmattningssprickor genom ofoerstoerande provning, OFP

    Energy Technology Data Exchange (ETDEWEB)

    Broddegaard, Mattias [Siemens Industrial Turbines, Finspaang (Sweden)

    2004-12-01

    Components in gas turbines, steam turbines and boilers are subjected to both high and low cycle fatigue. The lifetime of components is established by calculations based on conservative assumptions and safety factors, which means that most components will have a real life far exceeding the calculated. Conventional nondestructive testing is aimed at detecting macroscopic defects, such as cracks, inclusions and other discontinuities in the material. By having the possibility of detecting damage at a microscopic level, the risk of fractures in components subjected to fatigue can be reduced and the interval between testing occasions can be extended. The project goal has been to establish knowledge about possibilities and limitations for early detection of low and high cycle fatigue damage, by a literature survey and by practical experiments on low cycle fatigue specimens in 12% Cr-steel, for the following nondestructive testing methods: MWM (Meandering Winding Magnetometer) eddy current testing; and Nonlinear ultrasonics, both classical (second harmonic) and non-classical (crack closure). The project started with a literature survey. This resulted in a proposal for specimen design and selection of testing techniques and project partners. Manufacturing of specimens in 12% Cr-steel, designation X22CrMoV12-1, and low cycle fatigue testing at 300 deg C testing temperature was carried out at Siemens Industrial Turbines in Finspaang. Specimens with 0, 25, 50, 75 and 100% consumed life, based on the number of cycles to presence of macroscopic cracks, were produced. MWM eddy current testing was carried out by Jentek Sensors Inc. in the USA. Measurements with nonlinear ultrasonics were carried out by Siemens Corporate Technology in Munich and at Blekinge Univ. The specimens were finally examined in SEM and light optical microscope in Finspaang. In the literature, results showing that early detection of fatigue damage by nondestructive testing is possible, can be found. By

  20. HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences

    Directory of Open Access Journals (Sweden)

    Firth Andrew E

    2007-12-01

    Full Text Available Abstract Background The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (~3.2 kb but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. Results These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters and comparative genome analysis results (e.g. blastn, tblastx. It also contains analyses based on curated HBV alignments. Information about conserved regions – including primary conservation (e.g. CDS-Plotcon and RNA secondary structure predictions (e.g. Alidot – is integrated into the database. A large amount of data is graphically presented using the GBrowse (Generic Genome Browser adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. Conclusion HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV focused datasets and tools http://hbvregdb.otago.ac.nz. The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.

  1. El genoma y sus metáforas: ¿Detectives, héroes o profetas? The genome and its metaphors: Detectives, heroes or prophets?

    Directory of Open Access Journals (Sweden)

    M.C. Davo

    2003-02-01

    tres tipos de metáforas se caracterizan por su intento de dotar de intencionalidad a los genes. Según estas observaciones, el punto de vista tecnocrático es el que parece estar prevaleciendo frente al religioso o el bélico, y es el que puede ejercer una mayor influencia en la construcción futura de la cultura de salud.The «new genetics», or the impetus given to this discipline by the Genome Project, aims to a change of paradigm of the Health Sciences. This change is postulated from a phenotypic approach to a genotypic one, thereby excluding the influence of the environment, which could seriously undermine the grounds for the development and exercise of Public Health. Since the beginning of the genome project, information on genetic discoveries has frequently been reported in the mass media. Metaphors are often used by geneticists and journalists to convey the complex concepts of genetic research for which there are no equivalents in the lay language. The media do not merely shape the social agenda but also provide the space in which health culture is constructed. We present the results of a preliminary study exploring the metaphors used in the three most widely-read national daily newspapers in Spain, namely ABC, El Pais and El Mundo, when reporting news of the «new genetics». The possible consequences of the natural history of these metaphors, or the process through which figurative terms acquire a literal meaning, are discussed. A preliminary taxonomy for the metaphors identified was developed. Fifty-one out of 342 identified headings (14.8% contained metaphors. Strategic metaphors such as «program», «control», «code», «map», and «puzzle», were the most commonly used, followed by teleological ones such as «mystery» or «God language» and finally war-like metaphors such as «attack», «defeat», and «capture». The three groups of metaphors are characterized by an attempt to giving intentionality to genes. Strategic metaphors predominated over

  2. Detection and genomic characterization of motility in Lactobacillus curvatus: confirmation of motility in a species outside the Lactobacillus salivarius clade.

    Science.gov (United States)

    Cousin, Fabien J; Lynch, Shónagh M; Harris, Hugh M B; McCann, Angela; Lynch, Denise B; Neville, B Anne; Irisawa, Tomohiro; Okada, Sanae; Endo, Akihito; O'Toole, Paul W

    2015-02-01

    Lactobacillus is the largest genus within the lactic acid bacteria (LAB), with almost 180 species currently identified. Motility has been reported for at least 13 Lactobacillus species, all belonging to the Lactobacillus salivarius clade. Motility in lactobacilli is poorly characterized. It probably confers competitive advantages, such as superior nutrient acquisition and niche colonization, but it could also play an important role in innate immune system activation through flagellin–Toll-like receptor 5 (TLR5) interaction. We now report strong evidence of motility in a species outside the L. salivarius clade, Lactobacillus curvatus (strain NRIC0822). The motility of L. curvatus NRIC 0822 was revealed by phase-contrast microscopy and soft-agar motility assays. Strain NRIC 0822 was motile at temperatures between 15 °C and 37 °C, with a range of different carbohydrates, and under varying atmospheric conditions. We sequenced the L. curvatus NRIC 0822 genome, which revealed that the motility genes are organized in a single operon and that the products are very similar (>98.5% amino acid similarity over >11,000 amino acids) to those encoded by the motility operon of Lactobacillus acidipiscis KCTC 13900 (shown for the first time to be motile also). Moreover, the presence of a large number of mobile genetic elements within and flanking the motility operon of L. curvatus suggests recent horizontal transfer between members of two distinct Lactobacillus clades: L. acidipiscis in the L. salivarius clade and L. curvatus inthe L. sakei clade. This study provides novel phenotypic, genetic, and phylogenetic insights into flagellum-mediated motility in lactobacilli.

  3. iLOCi: a SNP interaction prioritization technique for detecting epistasis in genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Piriyapongsa Jittima

    2012-12-01

    Full Text Available Abstract Background Genome-wide association studies (GWAS do not provide a full account of the heritability of genetic diseases since gene-gene interactions, also known as epistasis are not considered in single locus GWAS. To address this problem, a considerable number of methods have been developed for identifying disease-associated gene-gene interactions. However, these methods typically fail to identify interacting markers explaining more of the disease heritability over single locus GWAS, since many of the interactions significant for disease are obscured by uninformative marker interactions e.g., linkage disequilibrium (LD. Results In this study, we present a novel SNP interaction prioritization algorithm, named iLOCi (Interacting Loci. This algorithm accounts for marker dependencies separately in case and control groups. Disease-associated interactions are then prioritized according to a novel ranking score calculated from the difference in marker dependencies for every possible pair between case and control groups. The analysis of a typical GWAS dataset can be completed in less than a day on a standard workstation with parallel processing capability. The proposed framework was validated using simulated data and applied to real GWAS datasets using the Wellcome Trust Case Control Consortium (WTCCC data. The results from simulated data showed the ability of iLOCi to identify various types of gene-gene interactions, especially for high-order interaction. From the WTCCC data, we found that among the top ranked interacting SNP pairs, several mapped to genes previously known to be associated with disease, and interestingly, other previously unreported genes with biologically related roles. Conclusion iLOCi is a powerful tool for uncovering true disease interacting markers and thus can provide a more complete understanding of the genetic basis underlying complex disease. The program is available for download at http://www4a.biotec.or.th/GI/tools/iloci.

  4. Identification of circo-like virus-Brazil genomic sequences in raw sewage from the metropolitan area of São Paulo: evidence of circulation two and three years after the first detection

    Directory of Open Access Journals (Sweden)

    Silvana Beres Castrignano

    Full Text Available BACKGROUND Two novel viruses named circo-like virus-Brazil (CLV-BR hs1 and hs2 were previously discovered in a Brazilian human fecal sample through metagenomics. CLV-BR hs1 and hs2 possess a small circular DNA genome encoding a replication initiator protein (Rep, and the two genomes exhibit 92% nucleotide identity with each other. Phylogenetic analysis based on the Rep protein showed that CLV-BRs do not cluster with circoviruses, nanoviruses, geminiviruses or cycloviruses. OBJECTIVE The aim of this study was to search for CLV-BR genomes in sewage and reclaimed water samples from the metropolitan area of São Paulo, Brazil, to verify whether the first detection of these viruses was an isolated finding. METHODS Sewage and reclaimed water samples collected concomitantly during the years 2005-2006 were purified and concentrated using methodologies designed for the study of viruses. A total of 177 treated reclaimed water samples were grouped into five pools, as were 177 treated raw sewage samples. Nucleic acid extraction, polymerase chain reaction (PCR amplification and Sanger sequencing were then performed.e FINDINGS CLV-BR genomes were detected in two pools of sewage samples, p6 and p9. Approximately 28% and 51% of the CLV-BR genome was amplified from p6 and p9, respectively, including 76% of the Rep gene. The detected genomes are most likely related to CLV-BR hs1. Comparative analysis showed several synonymous substitutions within Rep-encoding sequences, suggesting purifying selection for this gene, as has been observed for other eukaryotic circular Rep-encoding single-stranded DNA (CRESS-DNA viruses. MAIN CONCLUSION The results therefore indicated that CLV-BR has continued to circulate in Brazil two and three years after first being detected.

  5. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins is known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are yet poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrary, the stability of the parallel binding mode is more dependent on the sequence features of the SIM motif like the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should be helpful to enhance the prediction of SIMs and their binding properties in different organisms to facilitate the reconstruction of the SUMO interactome.

  6. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  7. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    Science.gov (United States)

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N 16 P 17 G 18 /P 19 , where P 19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E 14 , S 15 , and N 16 within the 2A sequence of infectious FMDVs, but no variants at residues P 17 , G 18 , or P 19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P 17 and P 19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  8. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    Science.gov (United States)

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.

  9. Whole genome detection of rotavirus mixed infections in human, porcine and bovine samples co-infected with various rotavirus strains collected from sub-Saharan Africa.

    Science.gov (United States)

    Nyaga, Martin M; Jere, Khuzwayo C; Esona, Mathew D; Seheri, Mapaseka L; Stucker, Karla M; Halpin, Rebecca A; Akopov, Asmik; Stockwell, Timothy B; Peenze, Ina; Diop, Amadou; Ndiaye, Kader; Boula, Angeline; Maphalala, Gugu; Berejena, Chipo; Mwenda, Jason M; Steele, A Duncan; Wentworth, David E; Mphahlele, M Jeffrey

    2015-04-01

    Group A rotaviruses (RVA) are among the main global causes of severe diarrhea in children under the age of 5years. Strain diversity, mixed infections and untypeable RVA strains are frequently reported in Africa. We analysed rotavirus-positive human stool samples (n=13) obtained from hospitalised children under the age of 5years who presented with acute gastroenteritis at sentinel hospital sites in six African countries, as well as bovine and porcine stool samples (n=1 each), to gain insights into rotavirus diversity and evolution. Polyacrylamide gel electrophoresis (PAGE) analysis and genotyping with G-(VP7) and P-specific (VP4) typing primers suggested that 13 of the 15 samples contained more than 11 segments and/or mixed G/P genotypes. Full-length amplicons for each segment were generated using RVA-specific primers and sequenced using the Ion Torrent and/or Illumina MiSeq next-generation sequencing platforms. Sequencing detected at least one segment in each sample for which duplicate sequences, often having distinct genotypes, existed. This supported and extended the PAGE and RT-PCR genotyping findings that suggested these samples were collected from individuals that had mixed rotavirus infections. The study reports the first porcine (MRC-DPRU1567) and bovine (MRC-DPRU3010) mixed infections. We also report a unique genome segment 9 (VP7), whose G9 genotype belongs to lineage VI and clusters with porcine reference strains. Previously, African G9 strains have all been in lineage III. Furthermore, additional RVA segments isolated from humans have a clear evolutionary relationship with porcine, bovine and ovine rotavirus sequences, indicating relatively recent interspecies transmission and reassortment. Thus, multiple RVA strains from sub-Saharan Africa are infecting mammalian hosts with unpredictable variations in their gene segment combinations. Whole-genome sequence analyses of mixed RVA strains underscore the considerable diversity of rotavirus sequences and

  10. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

    Directory of Open Access Journals (Sweden)

    Michael Allevato

    Full Text Available The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX bind Enhancer box (E-box DNA elements (CANNTG and have the greatest affinity for the canonical MYC E-box (CME CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87% of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.

  11. Principal component analysis for predicting transcription-factor binding motifs from array-derived data

    Directory of Open Access Journals (Sweden)

    Vincenti Matthew P

    2005-11-01

    Full Text Available Abstract Background The responses to interleukin 1 (IL-1 in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially to transcription-factor binding motifs (TFBMs. In order to select a critical set of TFBMs from genomic DNA information and an array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD is a powerful method to derive primary components of a given matrix. Applying SVD to a promoter matrix defined from regulatory DNA sequences, we derived a novel method to predict the critical set of TFBMs. Results The promoter matrix was defined to establish a quantitative relationship between the IL-1-driven mRNA alteration and genomic DNA sequences of the IL-1 responsive genes. The matrix was decomposed with SVD, and the effects of 8 potential TFBMs (5'-CAGGC-3', 5'-CGCCC-3', 5'-CCGCC-3', 5'-ATGGG-3', 5'-GGGAA-3', 5'-CGTCC-3', 5'-AAAGG-3', and 5'-ACCCA-3' were predicted from a pool of 512 random DNA sequences. The prediction included matches to the core binding motifs of biologically known TFBMs such as AP2, SP1, EGR1, KROX, GC-BOX, ABI4, ETF, E2F, SRF, STAT, IK-1, PPARγ, STAF, ROAZ, and NFκB, and their significance was evaluated numerically using Monte Carlo simulation and genetic algorithm. Conclusion The described SVD-based prediction is an analytical method to provide a set of potential TFBMs involved in transcriptional regulation. The results would be useful to evaluate analytically a contribution of individual DNA sequences.

  12. How pathogens use linear motifs to perturb host cell networks

    KAUST Repository

    Via, Allegra; Uyar, Bora; Brun, Christine; Zanzoni, Andreas

    2015-01-01

    Molecular mimicry is one of the powerful stratagems that pathogens employ to colonise their hosts and take advantage of host cell functions to guarantee their replication and dissemination. In particular, several viruses have evolved the ability to interact with host cell components through protein short linear motifs (SLiMs) that mimic host SLiMs, thus facilitating their internalisation and the manipulation of a wide range of cellular networks. Here we present convincing evidence from the literature that motif mimicry also represents an effective, widespread hijacking strategy in prokaryotic and eukaryotic parasites. Further insights into host motif mimicry would be of great help in the elucidation of the molecular mechanisms behind host cell invasion and the development of anti-infective therapeutic strategies.

  13. A proportion of mutations fixed in the genomes of in vitro selected isogenic drug-resistant Mycobacterium tuberculosis mutants can be detected as minority variants in the parent culture

    KAUST Repository

    Bergval, Indra; Coll, Francesc; Schuitema, Anja; de Ronde, Hans; Mallard, Kim; Pain, Arnab; McNerney, Ruth; Clark, Taane G.; Anthony, Richard M.

    2015-01-01

    We studied genomic variation in a previously selected collection of isogenic Mycobacterium tuberculosis laboratory strains subjected to one or two rounds of antibiotic selection. Whole genome sequencing analysis identified eleven single, unique mutations (four synonymous, six non-synonymous, one intergenic), in addition to drug resistance-conferring mutations, that were fixed in the genomes of six monoresistant strains. Eight loci, present as minority variants (five non-synonymous, three synonymous) in the genome of the susceptible parent strain, became fixed in the genomes of multiple daughter strains. None of these mutations are known to be involved with drug resistance. Our results confirm previously observed genomic stability for M. tuberculosis, although the parent strain had accumulated allelic variants at multiple locations in an antibiotic-free in vitro environment. It is therefore likely to assume that these so-called hitchhiking mutations were co-selected and fixed in multiple daughter strains during antibiotic selection. The presence of multiple allelic variations, accumulated under non-selecti