WorldWideScience

Sample records for genomic motif detection

  1. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
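
    The 94% efficiency figure can be converted into an implied relative speedup. The sketch below assumes the usual strong-scaling definition (efficiency = speedup divided by the ratio of core counts); the abstract does not state which definition was used, and the function name relative_speedup is hypothetical.

```python
def relative_speedup(efficiency: float, base_cores: int, cores: int) -> float:
    """Speedup over the baseline run implied by a parallel efficiency.

    Assumes efficiency = speedup / (cores / base_cores), the usual
    strong-scaling definition (an assumption; the abstract does not define it).
    """
    return efficiency * (cores / base_cores)


# Figures quoted in the abstract: 94% efficiency on 65,536 cores vs. 256 cores.
speedup = relative_speedup(0.94, 256, 65_536)
print(f"{speedup:.2f}x over the 256-core run (ideal: {65_536 // 256}x)")
```

    Under that assumption the 65,536-core run is roughly 240 times faster than the 256-core run, against an ideal of 256 times.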

  2. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for finding motifs, as ChIP-Seq experiments narrow the motif search down to binding site locations. Recent motif finding tools facilitate motif detection by providing a user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users employ multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  3. RMOD: a tool for regulatory motif detection in signaling network.

    Directory of Open Access Journals (Sweden)

    Jinki Kim

    Full Text Available Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to a large-scale signaling network, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using various sizes of signaling networks generated from the integration of various human signaling pathways, and the evaluation showed that the speed and scalability of this algorithm outperform those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manipulate the whole process, from building the signaling network and query regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanism underlying their regulatory motif activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.

  4. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    Science.gov (United States)

    2016-04-26

    extend the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex...priori, the need for triangle motif-based clustering. 2. Developed an algorithm for clustering undirected networks, where the triangle configuration was...5 Application to Real Networks 5.1 2012 FBS Football Schedule Network

  5. Genome wide identification of regulatory motifs in Bacillus subtilis

    Directory of Open Access Journals (Sweden)

    Siggia Eric D

    2003-05-01

    Full Text Available Abstract Background To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes, few algorithms exist to predict regulatory sequences on a genome wide scale with no prior information. Results To identify motifs involved in the control of transcription, an algorithm was developed that searches upstream of operons for improbably frequent dimers. The algorithm was applied to the B. subtilis genome, which is predicted to encode for approximately 200 DNA binding proteins. The dimers found to be over-represented could be clustered into 317 distinct groups, each thought to represent a class of motifs uniquely recognized by some transcription factor. For each cluster of dimers, a representative weight matrix was derived and scored over the regions upstream of the operons to predict the sites recognized by the cluster's factor, and a putative regulon of the operons immediately downstream of the sites was inferred. The distribution in number of operons per predicted regulon is comparable to that for well characterized transcription factors. The most highly over-represented dimers matched σA, the T-box, and σW sites. We have evidence to suggest that at least 52 of our clusters of dimers represent actual regulatory motifs, based on the groups' weight matrix matches to experimentally characterized sites, the functional similarity of the component operons of the groups' regulons, and the positional biases of the weight matrix matches. All predictions are assigned a significance value, and thresholds are set to avoid false positives. Where possible, we examine our false negatives, drawing examples from known regulatory motifs and regulons inferred from RNA expression data. Conclusions We have demonstrated that in the case of B. subtilis
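
    The weight-matrix scoring step described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the matrix values are invented, the function name scan_pwm and the threshold are hypothetical, and the paper's actual dimer-derived matrices and significance thresholds are not reproduced.

```python
from typing import Dict, List, Tuple

# Toy log-odds weight matrix for a 4-bp motif (invented values, not from the paper).
# One dict per motif position, mapping base -> score.
PWM: List[Dict[str, float]] = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -0.8, "C": 1.1, "G": -1.0, "T": -0.2},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.4},
]


def scan_pwm(seq: str, pwm: List[Dict[str, float]],
             threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical upstream region; the N shows how ambiguous bases are skipped.
upstream = "TTAGCTAAAGCTNAGCT"
for start, score in scan_pwm(upstream, PWM, threshold=4.0):
    print(start, upstream[start:start + len(PWM)], round(score, 2))
```

    A real scan would also score the reverse strand and derive the threshold from a background model rather than fixing it by hand.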

  6. RSAT::Plants: Motif Discovery in ChIP-Seq Peaks of Plant Genomes.

    Science.gov (United States)

    Castro-Mondragon, Jaime A; Rioualen, Claire; Contreras-Moreira, Bruno; van Helden, Jacques

    2016-01-01

    In this protocol, we explain how to run ab initio motif discovery in order to gather putative transcription factor binding motifs (TFBMs) from sets of genomic regions returned by ChIP-seq experiments. The protocol starts from a set of peak coordinates (genomic regions) which can be either downloaded from ChIP-seq databases, or produced by a peak-calling software tool. We provide a concise description of the successive steps to discover motifs, cluster the motifs returned by different motif discovery algorithms, and compare them with reference motif databases. The protocol is documented with detailed notes explaining the rationale underlying the choice of options. The interpretation of the results is illustrated with an example from the model plant Arabidopsis thaliana.

  7. Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales

    Directory of Open Access Journals (Sweden)

    Long Qian

    2016-10-01

    Full Text Available The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.

  8. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Full Text Available Abstract Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs implicating that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  9. Analysis of genomic sequence motifs for deciphering transcription factor binding and transcriptional regulation in eukaryotic cells

    Directory of Open Access Journals (Sweden)

    Valentina Boeva

    2016-02-01

    Full Text Available Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
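
    As an illustration of the IUPAC consensus model mentioned above, the following sketch converts a consensus string into a regular expression and reports all occurrences, including overlapping ones. The IUPAC ambiguity codes are standard; the function names, the example consensus TATAWA, and the test sequence are hypothetical choices, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a regex that finds overlapping sites."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead consumes no characters, so overlapping occurrences are kept.
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, consensus: str) -> list:
    """Return (start, matched_text) for each occurrence on the given strand."""
    pattern = iupac_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_sites("ccTATAAATATATAgg", "TATAWA"))
```

    A consensus treats every allowed base at a position as equally good, which is the main way it differs from a position weight matrix, where each base carries its own score.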

  10. MotifMap-RNA: a genome-wide map of RBP binding sites.

    Science.gov (United States)

    Liu, Yu; Sun, Sha; Bredy, Timothy; Wood, Marcelo; Spitale, Robert C; Baldi, Pierre

    2017-07-01

    RNA plays a critical role in gene expression and its regulation. RNA binding proteins (RBPs), in turn, are important regulators of RNA. Thanks to the availability of large scale data for RBP binding motifs and in vivo binding site results in the form of eCLIP experiments, it is now possible to computationally predict RBP binding sites across the whole genome. We describe MotifMap-RNA, an extension of MotifMap which predicts binding sites for RBP motifs across human and mouse genomes and allows large scale querying of predicted binding sites. The data and corresponding web server are available from: http://motifmap-rna.ics.uci.edu/ as part of the MotifMap web portal. Contact: rspitale@uci.edu or pfbaldi@uci.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.

  11. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).

  12. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Full Text Available Abstract Background Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agriculture losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  13. Prevalent RNA recognition motif duplication in the human genome.

    Science.gov (United States)

    Tsai, Yihsuan S; Gomez, Shawn M; Wang, Zefeng

    2014-05-01

    The sequence-specific recognition of RNA by proteins is mediated through various RNA binding domains, with the RNA recognition motif (RRM) being the most frequent and present in >50% of RNA-binding proteins (RBPs). Many RBPs contain multiple RRMs, and it is unclear how each RRM contributes to the binding specificity of the entire protein. We found that RRMs within the same RBP (i.e., sibling RRMs) tend to have significantly higher similarity than expected by chance. Sibling RRM pairs from RBPs shared by multiple species tend to have lower similarity than those found only in a single species, suggesting that multiple RRMs within the same protein might arise from domain duplication followed by divergence through random mutations. This finding is exemplified by a recent RRM domain duplication in DAZ proteins and an ancient duplication in PABP proteins. Additionally, we found that different similarities between sibling RRMs are associated with distinct functions of an RBP and that the RBPs tend to contain repetitive sequences with low complexity. Taken together, this study suggests that the number of RBPs with multiple RRMs has expanded in mammals and that the multiple sibling RRMs may recognize similar target motifs in a cooperative manner.

  14. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.

    2015-05-12

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches yield good discriminant results, identifying dominant features of regulatory mechanisms nevertheless remains a challenge. In this work, we look at decision rules that may help identify such features. Findings: We present a simple decision rule for classification of candidate poly(A) tail signal motifs in human genomic sequence obtained by evaluating features during the construction of gradient boosted trees. We found that values of a single feature based on the frequency of adenine in the genomic sequence surrounding the candidate signal and the number of consecutive adenine molecules in a well-defined region immediately following the motif display good discriminative potential in classification of poly(A) tail motifs for samples covered by the rule. Conclusions: The resulting simple rule can be used as an efficient filter in construction of more complex poly(A) tail motif classification algorithms.
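
    The two quantities named in the abstract (adenine frequency around the candidate signal, and the run of consecutive adenines just after it) can be computed as in the sketch below. This is only an illustration: the window sizes flank and downstream, the 6-nt motif length, and the function name are assumptions, and the paper's actual feature definition and decision threshold are not given in the abstract, so no classification rule is implemented here.

```python
def poly_a_features(seq: str, motif_start: int, motif_len: int = 6,
                    flank: int = 20, downstream: int = 10):
    """Return (adenine_fraction_around_motif, longest_A_run_downstream).

    flank and downstream are hypothetical window sizes, not values from the paper.
    """
    seq = seq.upper()
    motif_end = motif_start + motif_len
    # Sequence surrounding the candidate signal, with the motif itself excluded.
    surrounding = (seq[max(0, motif_start - flank):motif_start]
                   + seq[motif_end:motif_end + flank])
    a_fraction = surrounding.count("A") / len(surrounding) if surrounding else 0.0
    # Longest run of consecutive A's in the window right after the motif.
    longest = current = 0
    for base in seq[motif_end:motif_end + downstream]:
        current = current + 1 if base == "A" else 0
        longest = max(longest, current)
    return a_fraction, longest


# Toy sequence: 4 nt upstream, the common AATAAA signal, 9 nt downstream.
seq = "GCGC" + "AATAAA" + "TAAAAGCGC"
print(poly_a_features(seq, motif_start=4))
```

    A filter of the kind the abstract describes would compare such values against a learned cutoff before passing candidates to a heavier classifier.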

  15. Detecting DNA regulatory motifs by incorporating positional trends in information content

    Energy Technology Data Exchange (ETDEWEB)

    Kechris, Katherina J.; van Zwet, Erik; Bickel, Peter J.; Eisen, Michael B.

    2004-05-04

    On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.

  16. Structure and sequence motifs in the HIV-1 RNA genome

    NARCIS (Netherlands)

    van Bel, N.

    2015-01-01

    The untranslated leader of the HIV-1 RNA genome contains some 350 nucleotides and is highly conserved among virus isolates. Several characteristic hairpin structures that regulate important virus replication steps, such as dimerization and packaging in virion particles, are clustered in this leader.

  17. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  18. Genome-Wide Analysis of the Transcription Start Sites and Promoter Motifs of Phytoplasmas.

    Science.gov (United States)

    Nijo, Takamichi; Neriya, Yutaro; Koinuma, Hiroaki; Iwabuchi, Nozomu; Kitazawa, Yugo; Tanno, Kazuyuki; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

    2017-12-01

    Phytoplasmas are obligate intracellular parasitic bacteria that infect both plants and insects. We previously identified the sigma factor RpoD-dependent consensus promoter sequence of phytoplasma. However, the genome-wide landscape of RNA transcripts, including non-coding RNAs (ncRNAs) and RpoD-independent promoter elements, was still unknown. In this study, we performed an improved RNA sequencing analysis for genome-wide identification of the transcription start sites (TSSs) and the consensus promoter sequences. We constructed cDNA libraries using a random adenine/thymine hexamer primer, in addition to a conventional random hexamer primer, for efficient sequencing of 5'-termini of AT-rich phytoplasma RNAs. We identified 231 TSSs, which were classified into four categories: mRNA TSSs, internal sense TSSs, antisense TSSs (asTSSs), and orphan TSSs (oTSSs). The presence of asTSSs and oTSSs indicated the genome-wide transcription of ncRNAs, which might act as regulatory ncRNAs in phytoplasmas. This is the first description of genome-wide phytoplasma ncRNAs. Using a de novo motif discovery program, we identified two consensus motif sequences located upstream of the TSSs. While one was almost identical to the RpoD-dependent consensus promoter sequence, the other was an unidentified novel motif, which might be recognized by another transcription initiation factor. These findings are valuable for understanding the regulatory mechanism of phytoplasma gene expression.

  19. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    Science.gov (United States)

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.

  20. Using weakly conserved motifs hidden in secretion signals to identify type-III effectors from bacterial pathogen genomes.

    Directory of Open Access Journals (Sweden)

    Xiaobao Dong

    Full Text Available BACKGROUND: As one of the most important virulence factor types in gram-negative pathogenic bacteria, type-III effectors (TTEs) play a crucial role in pathogen-host interactions by directly influencing immune signaling pathways within host cells. Based on the hypothesis that type-III secretion signals may be comprised of some weakly conserved sequence motifs, here we used profile-based amino acid pair information to develop an accurate TTE predictor. RESULTS: For a TTE or non-TTE, we first used a hidden Markov model-based sequence searching method (i.e., HHblits) to detect its weakly homologous sequences and extracted the profile-based k-spaced amino acid pair composition (HH-CKSAAP) from the N-terminal sequences. In the next step, the feature vector HH-CKSAAP was used to train a linear support vector machine model, which we designate as BEAN (Bacterial Effector ANalyzer). We compared our method with four existing TTE predictors through an independent test set, and our method revealed improved performance. Furthermore, we listed the most predictive amino acid pairs according to their weights in the established classification model. Evolutionary analysis shows that predictive amino acid pairs tend to be more conserved. Some predictive amino acid pairs also show significantly different position distributions between TTEs and non-TTEs. These analyses confirmed that some weakly conserved sequence motifs may play important roles in type-III secretion signals. Finally, we also used BEAN to scan one plant pathogen genome and showed that BEAN can be used for genome-wide TTE identification. The webserver and stand-alone version of BEAN are available at http://protein.cau.edu.cn:8080/bean/.
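
    To make the k-spaced amino acid pair composition concrete, here is a minimal sketch of the plain, sequence-based variant. Note the substitution: the abstract's HH-CKSAAP feature is computed from HHblits profiles, which this sketch does not attempt; the function name cksaap and the example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(seq: str, k: int) -> dict:
    """Fraction of residue pairs (a, b) separated by exactly k residues.

    Plain sequence-based composition; the profile-based HH-CKSAAP variant
    described in the abstract is not implemented here.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # ignore non-standard residues such as X
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


# Hypothetical N-terminal fragment; k=1 means one residue between the pair.
features = cksaap("MKKLAKKLM", k=1)
print(len(features), features["KL"], features["KK"])
```

    Each value of k yields a 400-dimensional vector; concatenating the vectors for several k values gives the kind of fixed-length input a linear support vector machine can consume.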

  1. Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2017-05-01

    We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.

  2. Genome Holography: Deciphering Function-Form Motifs from Gene Expression Data

    Science.gov (United States)

    Roth, Dalit; Regev, Tamar; Bransburg-Zabary, Sharron; Jacob, Eshel Ben

    2008-01-01

    Background DNA chips allow simultaneous measurements of genome-wide response of thousands of genes, i.e. system level monitoring of the gene-network activity. Advanced analysis methods have been developed to extract meaningful information from the vast amount of raw gene-expression data obtained from the microarray measurements. These methods usually aimed to distinguish between groups of subjects (e.g., cancer patients vs. healthy subjects) or identifying marker genes that help to distinguish between those groups. We assumed that motifs related to the internal structure of operons and gene-networks regulation are also embedded in microarray and can be deciphered by using proper analysis. Methodology/Principal Findings The analysis presented here is based on investigating the gene-gene correlations. We analyze a database of gene expression of Bacillus subtilis exposed to sub-lethal levels of 37 different antibiotics. Using unsupervised analysis (dendrogram) of the matrix of normalized gene-gene correlations, we identified the operons as they form distinct clusters of genes in the sorted correlation matrix. Applying dimension-reduction algorithm (Principal Component Analysis, PCA) to the matrices of normalized correlations reveals functional motifs. The genes are placed in a reduced 3-dimensional space of the three leading PCA eigen-vectors according to their corresponding eigen-values. We found that the organization of the genes in the reduced PCA space recovers motifs of the operon internal structure, such as the order of the genes along the genome, gene separation by non-coding segments, and translational start and end regions. In addition to the intra-operon structure, it is also possible to predict inter-operon relationships, operons sharing functional regulation factors, and more. In particular, we demonstrate the above in the context of the competence and sporulation pathways. Conclusions/Significance We demonstrated that by analyzing gene-gene correlation

  3. Detecting microsatellites within genomes: significant variation among algorithms

    Directory of Open Access Journals (Sweden)

    Rivals Eric

    2007-04-01

    Full Text Available Abstract Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
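
    The sensitivity to user-defined parameters is easy to see even in a toy detector for perfect microsatellites. The sketch below is not any of the five tools compared in the abstract; it is a simple regular-expression scan, and the function name and the defaults for max_unit, min_repeats, and min_length are arbitrary assumptions.

```python
import re


def perfect_microsatellites(seq: str, max_unit: int = 6,
                            min_repeats: int = 3, min_length: int = 8):
    """Yield (start, unit, repeat_count) for perfect tandem repeats."""
    # Group 2 captures the repeat unit; the backreference demands exact copies.
    # The lazy quantifier tries the shortest unit first (AAAA is unit A, not AA).
    pattern = re.compile(
        r"(([ACGT]{1,%d}?)\2{%d,})" % (max_unit, min_repeats - 1)
    )
    for m in pattern.finditer(seq.upper()):
        if len(m.group(1)) >= min_length:
            yield m.start(), m.group(2), len(m.group(1)) // len(m.group(2))


# Toy sequence with a (CA)5 repeat followed by a (T)9 run.
seq = "GG" + "CA" * 5 + "T" * 9 + "CG"
for start, unit, n in perfect_microsatellites(seq):
    print(start, unit, n)
```

    Raising min_length from 8 to 10 would drop the T run from the output while keeping the CA repeat, a small-scale version of the parameter dependence the abstract reports. Because the scan is left to right and non-overlapping, a short repeat that is matched and then discarded can also mask a longer one that overlaps it.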

  4. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  5. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Directory of Open Access Journals (Sweden)

    Cutler Sean R

    2007-06-01

    Full Text Available Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists

  6. A motif-based search in bacterial genomes identifies the ortholog of the small RNA Yfr1 in all lineages of cyanobacteria

    Directory of Open Access Journals (Sweden)

    Axmann Ilka M

    2007-10-01

    Full Text Available Abstract Background Non-coding RNAs (ncRNA) are regulators of gene expression in all domains of life. They control growth and differentiation, virulence, motility and various stress responses. The identification of ncRNAs can be a tedious process due to the heterogeneous nature of this molecule class and the missing sequence similarity of orthologs, even among closely related species. The small ncRNA Yfr1 has previously been found in the Prochlorococcus/Synechococcus group of marine cyanobacteria. Results Here we show that screening available genome sequences based on an RNA motif and followed by experimental analysis works successfully in detecting this RNA in all lineages of cyanobacteria. Yfr1 is an abundant ncRNA between 54 and 69 nt in size that is ubiquitous for cyanobacteria except for two low light-adapted strains of Prochlorococcus, MIT 9211 and SS120, in which it must have been lost secondarily. Yfr1 consists of two predicted stem-loop elements separated by an unpaired sequence of 16–20 nucleotides containing the ultraconserved undecanucleotide 5'-ACUCCUCACAC-3'. Conclusion Starting with an ncRNA previously found in a narrow group of cyanobacteria only, we show here the highly specific and sensitive identification of its homologs within all lineages of cyanobacteria, whereas it was not detected within the genome sequences of E. coli and of 7 other eubacteria belonging to the alpha-proteobacteria, chlorobiaceae and spirochaete. The integration of RNA motif prediction into computational pipelines for the detection of ncRNAs in bacteria appears as a promising step to improve the quality of such predictions.

  7. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat

    Directory of Open Access Journals (Sweden)

    Qisen Zhang

    2017-10-01

    Full Text Available Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA), pyrimidine box (CCTTTT) and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing Lauting difficulty, may limit the use of wheat for brewing.

  8. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat.

    Science.gov (United States)

    Zhang, Qisen; Li, Chengdao

    2017-01-01

    Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA), pyrimidine box (CCTTTT) and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing Lauting difficulty, may limit the use of wheat for brewing.
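
    The three promoter elements named in this record can be located in a sequence with a simple pattern scan. The sketch below is illustrative only: it assumes that "TATCCAT/C" means TATCCAT or TATCCAC, searches the given strand only, and uses a made-up promoter fragment; the dictionary and function names are hypothetical and do not come from the study.

```python
import re

# Element patterns taken from the abstract; the [TC] reading of "T/C" is an assumption.
GA_RESPONSIVE_COMPLEX = {
    "GA-responsive element": "CAATAAA",
    "pyrimidine box": "CCTTTT",
    "TATCCAT/C box": "TATCCA[TC]",
}

def scan_promoter(seq):
    """Map each element name to the 0-based start positions of its matches."""
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer("(?=%s)" % pattern, seq)]
        for name, pattern in GA_RESPONSIVE_COMPLEX.items()
    }

# Made-up promoter fragment used only to exercise the scan.
toy_promoter = "GGCCTTTTGACAATAAACTATCCACGG"
for element, positions in scan_promoter(toy_promoter).items():
    print(element, positions)
```

    The lookahead form `(?=...)` reports overlapping occurrences as well, which matters little for these elements but keeps the counts honest for low-complexity promoters.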

  9. Robust and Accurate Anomaly Detection in ECG Artifacts Using Time Series Motif Discovery

    Science.gov (United States)

    Sivaraks, Haemwaan

    2015-01-01

    Electrocardiogram (ECG) anomaly detection is an important technique for detecting dissimilar heartbeats which helps identify abnormal ECGs before the diagnosis process. Currently available ECG anomaly detection methods, ranging from academic research to commercial ECG machines, still suffer from a high false alarm rate because these methods are not able to differentiate ECG artifacts from real ECG signal, especially, in ECG artifacts that are similar to ECG signals in terms of shape and/or frequency. The problem leads to high vigilance for physicians and misinterpretation risk for nonspecialists. Therefore, this work proposes a novel anomaly detection technique that is highly robust and accurate in the presence of ECG artifacts which can effectively reduce the false alarm rate. Expert knowledge from cardiologists and motif discovery technique is utilized in our design. In addition, every step of the algorithm conforms to the interpretation of cardiologists. Our method can be utilized to both single-lead ECGs and multilead ECGs. Our experiment results on real ECG datasets are interpreted and evaluated by cardiologists. Our proposed algorithm can mostly achieve 100% of accuracy on detection (AoD), sensitivity, specificity, and positive predictive value with 0% false alarm rate. The results demonstrate that our proposed method is highly accurate and robust to artifacts, compared with competitive anomaly detection methods. PMID:25688284

  10. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  11. Deep sequencing of phage-displayed peptide libraries reveals sequence motif that detects norovirus

    Science.gov (United States)

    Hurwitz, Amy M.; Huang, Wanzhi; Estes, Mary K.; Atmar, Robert L.; Palzkill, Timothy

    2017-01-01

    Norovirus infections are the leading cause of non-bacterial gastroenteritis and result in about 21 million new cases and $2 billion in costs per year in the United States. Existing diagnostics have limited feasibility for point-of-care applications, so there is a clear need for more reliable, rapid, and simple-to-use diagnostic tools in order to contain outbreaks and prevent inappropriate treatments. In this study, a combination of phage display technology, deep sequencing and computational analysis was used to identify 12-mer peptides with specific binding to norovirus genotype GI.1 virus-like particles (VLPs). After biopanning, phage populations were sequenced and analyzed to identify a consensus peptide motif—YRSWXP. Two 12-mer peptides containing this sequence, NV-O-R5-3 and NV-O-R5-6, were further characterized to evaluate the motif's functional ability to detect VLPs and virus. Results indicated that these peptides effectively detect GI.1 VLPs in solid-phase peptide arrays, ELISAs and dot blots. Further, their specificity for the S-domain of the major capsid protein enables them to detect a wide range of GI and GII norovirus genotypes. Both peptides were able to detect virus in norovirus-positive clinical stool samples. Overall, the work reported here demonstrates the application of phage display coupled with next generation sequencing and computational analysis to uncover peptides with specific binding ability to a target protein for diagnostic applications. Further, the reagents characterized here can be integrated into existing diagnostic formats to detect clinically relevant genotypes of norovirus in stool. PMID:28035012

  12. Genomic Motifs as a Novel Indicator of the Relationship between Strains Isolated from the Epidemic of Porcine Epidemic Diarrhea in 2013-2014.

    Science.gov (United States)

    Yamamoto, Takehisa; Suzuki, Tohru; Ohashi, Seiichi; Miyazaki, Ayako; Tsutsui, Toshiyuki

    2016-01-01

    Porcine epidemic diarrhea virus (PEDV) is a positive-sense RNA virus that causes infectious gastroenteritis in pigs. Following a PED outbreak that occurred in China in 2010, the disease was identified for the first time in the United States in April 2013, and was reported in many other countries worldwide from 2013 to 2014. As a novel approach to elucidate the epidemiological relationship between PEDV strains, we explored their genome sequences to identify the motifs that were shared within related strains. Of PED outbreaks reported in many countries during 2013-2014, 119 PEDV strains in Japan, USA, Canada, Mexico, Germany, and Korea were selected and used in this study. We developed a motif mining program, which aimed to identify a specific region of the genome that was exclusively shared by a group of PEDV strains. Eight motifs were identified (M1-M8) and they were observed in 41, 9, 18, 6, 10, 14, 2, and 2 strains, respectively. Motifs M1-M6 were shared by strains from more than two countries, and seemed to originate from one PEDV strain, Indiana12.83/USA/2013, among the 119 strains studied. BLAST search for motifs M1-M6 revealed that M3-M5 were almost identical to the strain ZMDZY identified in 2011 in China, while M1 and M2 were similar to other Chinese strains isolated in 2011-2012. Consequently, the PED outbreaks in these six countries may be closely related, and multiple transmissions of PEDV strains between these countries may have occurred during 2013-2014. Although tools such as phylogenetic tree analysis with whole genome sequences are increasingly applied to reveal the connection between isolates, its interpretation is sometimes inconclusive. Application of motifs as a tool to examine the whole genome sequences of causative agents will be more objective and will be an explicit indicator of their relationship.
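
    The record above does not describe how the motif mining program works internally. As a rough sketch of the stated goal only (finding sequence that is shared by every strain in a group and by no strain outside it), the idea can be expressed with fixed-length k-mers. The function names, the choice of k, and the toy genomes are all assumptions for illustration, not the authors' program.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def exclusive_kmers(genomes, group, k=8):
    """k-mers present in every genome of `group` and absent from all others.

    genomes: dict mapping strain name -> genome sequence (toy data here).
    group:   non-empty collection of strain names.
    """
    if not group:
        raise ValueError("group must contain at least one strain")
    shared = set.intersection(*(kmers(genomes[name], k) for name in group))
    for name, seq in genomes.items():
        if name not in group:
            shared -= kmers(seq, k)  # drop anything also seen outside the group
    return shared

# Toy sequences standing in for whole genomes.
toy_genomes = {
    "strain_A": "AAAACCCCGGGGTTTT",
    "strain_B": "TTTTCCCCGGGGAAAA",
    "strain_C": "AAAATTTTCCCCAAAA",
}
print(exclusive_kmers(toy_genomes, {"strain_A", "strain_B"}, k=8))
```

    A real analysis would also need to handle sequencing errors, variable motif lengths, and the choice of candidate groups, none of which this sketch attempts.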

  13. Genomic Motifs as a Novel Indicator of the Relationship between Strains Isolated from the Epidemic of Porcine Epidemic Diarrhea in 2013-2014.

    Directory of Open Access Journals (Sweden)

    Takehisa Yamamoto

    Full Text Available Porcine epidemic diarrhea virus (PEDV) is a positive-sense RNA virus that causes infectious gastroenteritis in pigs. Following a PED outbreak that occurred in China in 2010, the disease was identified for the first time in the United States in April 2013, and was reported in many other countries worldwide from 2013 to 2014. As a novel approach to elucidate the epidemiological relationship between PEDV strains, we explored their genome sequences to identify the motifs that were shared within related strains. Of PED outbreaks reported in many countries during 2013-2014, 119 PEDV strains in Japan, USA, Canada, Mexico, Germany, and Korea were selected and used in this study. We developed a motif mining program, which aimed to identify a specific region of the genome that was exclusively shared by a group of PEDV strains. Eight motifs were identified (M1-M8) and they were observed in 41, 9, 18, 6, 10, 14, 2, and 2 strains, respectively. Motifs M1-M6 were shared by strains from more than two countries, and seemed to originate from one PEDV strain, Indiana12.83/USA/2013, among the 119 strains studied. BLAST search for motifs M1-M6 revealed that M3-M5 were almost identical to the strain ZMDZY identified in 2011 in China, while M1 and M2 were similar to other Chinese strains isolated in 2011-2012. Consequently, the PED outbreaks in these six countries may be closely related, and multiple transmissions of PEDV strains between these countries may have occurred during 2013-2014. Although tools such as phylogenetic tree analysis with whole genome sequences are increasingly applied to reveal the connection between isolates, its interpretation is sometimes inconclusive. Application of motifs as a tool to examine the whole genome sequences of causative agents will be more objective and will be an explicit indicator of their relationship.

  14. Detecting remote sequence homology in disordered proteins: discovery of conserved motifs in the N-termini of Mononegavirales phosphoproteins.

    Directory of Open Access Journals (Sweden)

    David Karlin

    Full Text Available Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

  15. An Inhibitory Motif on the 5'UTR of Several Rotavirus Genome Segments Affects Protein Expression and Reverse Genetics Strategies.

    Directory of Open Access Journals (Sweden)

    Giuditta De Lorenzo

    Full Text Available Rotavirus genome consists of eleven segments of dsRNA, each encoding one single protein. Viral mRNAs contain an open reading frame (ORF) flanked by relatively short untranslated regions (UTRs), whose role in the viral cycle remains elusive. Here we investigated the role of 5'UTRs in T7 polymerase-driven cDNAs expression in uninfected cells. The 5'UTRs of eight genome segments (gs3, gs5-6, gs7-11) of the simian SA11 strain showed a strong inhibitory effect on the expression of viral proteins. Decreased protein expression was due to both compromised transcription and translation and was independent of the ORF and the 3'UTR sequences. Analysis of several mutants of the 21-nucleotide long 5'UTR of gs11 defined an inhibitory motif (IM) represented by its primary sequence rather than its secondary structure. IM was mapped to the 5' terminal 6-nucleotide long pyrimidine-rich tract 5'-GGY(U/A)UY-3'. The 5' terminal position within the mRNA was shown to be essentially required, as inhibitory activity was lost when IM was moved to an internal position. We identified two mutations (insertion of a G upstream of the 5'UTR and the U to A mutation of the fifth nucleotide of IM) that render IM non-functional and increase the transcription and translation rate to levels that could considerably improve the efficiency of virus helper-free reverse genetics strategies.

  16. Genomic alterations detected by comparative genomic hybridization in ovarian endometriomas

    Directory of Open Access Journals (Sweden)

    L.C. Veiga-Castelli

    2010-08-01

    Full Text Available Endometriosis is a complex and multifactorial disease. Chromosomal imbalance screening in endometriotic tissue can be used to detect hot-spot regions in the search for a possible genetic marker for endometriosis. The objective of the present study was to detect chromosomal imbalances by comparative genomic hybridization (CGH) in ectopic tissue samples from ovarian endometriomas and eutopic tissue from the same patients. We evaluated 10 ovarian endometriotic tissues and 10 eutopic endometrial tissues by metaphase CGH. CGH was prepared with normal and test DNA enzymatically digested, ligated to adaptors and amplified by PCR. A second PCR was performed for DNA labeling. Equal amounts of both normal and test-labeled DNA were hybridized in human normal metaphases. The Isis FISH Imaging System V 5.0 software was used for chromosome analysis. In both eutopic and ectopic groups, 4/10 samples presented chromosomal alterations, mainly chromosomal gains. CGH identified 11q12.3-q13.1, 17p11.1-p12, 17q25.3-qter, and 19p as critical regions. Genomic imbalances in 11q, 17p, 17q, and 19p were detected in normal eutopic and/or ectopic endometrium from women with ovarian endometriosis. These regions contain genes such as POLR2G, MXRA7 and UBA52 involved in biological processes that may lead to the establishment and maintenance of endometriotic implants. This genomic imbalance may affect genes in which dysregulation impacts both eutopic and ectopic endometrium.

  17. Genome-Wide Identification of Mitogen-Activated Protein Kinase Gene Family across Fungal Lineage Shows Presence of Novel and Diverse Activation Loop Motifs.

    Directory of Open Access Journals (Sweden)

    Tapan Kumar Mohanta

    Full Text Available The mitogen-activated protein kinase (MAPK) is characterized by the presence of the T-E-Y, T-D-Y, and T-G-Y motifs in its activation loop region and plays a significant role in regulating diverse cellular responses in eukaryotic organisms. Availability of large-scale genome data in the fungal kingdom encouraged us to identify and analyse the fungal MAPK gene family consisting of 173 fungal species. The analysis of the MAPK gene family resulted in the discovery of several novel activation loop motifs (T-T-Y, T-I-Y, T-N-Y, T-H-Y, T-S-Y, K-G-Y, T-Q-Y, S-E-Y and S-D-Y) in fungal MAPKs. The phylogenetic analysis suggests that fungal MAPKs are non-polymorphic, had evolved from their common ancestors around 1500 million years ago, and are distantly related to plant MAPKs. We are the first to report the presence of nine novel activation loop motifs in fungal MAPKs. The specificity of the activation loop motif plays a significant role in controlling different growth and stress related pathways in fungi. Hence, the presence of these nine novel activation loop motifs in fungi is of special interest.

  18. Detection of Non-Amplified Genomic DNA

    CERN Document Server

    Corradini, Roberto

    2012-01-01

    This book offers a state-of-the-art overview on non amplified DNA detection methods and provides chemists, biochemists, biotechnologists and material scientists with an introduction to these methods. In fact all these fields have dedicated resources to the problem of nucleic acid detection, each contributing with their own specific methods and concepts. This book will explain the basic principles of the different non amplified DNA detection methods available, highlighting their respective advantages and limitations. The importance of non-amplified DNA sequencing technologies will be also discussed. Non-amplified DNA detection can be achieved by adopting different techniques. Such techniques have allowed the commercialization of innovative platforms for DNA detection that are expected to break into the DNA diagnostics market. The enhanced sensitivity required for the detection of non amplified genomic DNA has prompted new strategies that can achieve ultrasensitivity by combining specific materials with specifi...

  19. Detection of genomic instability in hypospadias patients by random ...

    African Journals Online (AJOL)

    The primer detectability on genomic instability in 12 samples ranged from 25% with primer OPA-01 to 66% with OPA-08. Case 2 showed the highest genomic instability (80%). The lowest genomic instability (10%) was in case 6. The results determined numbers of genomic instabilities among hypospadias patients.

  20. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Science.gov (United States)

    Roy, Sourav; Kagda, Meenakshi; Judelson, Howard S

    2013-03-01

    Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  1. [Prediction of Promoter Motifs in Virophages].

    Science.gov (United States)

    Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

    2015-07-01

    Virophages have crucial roles in ecosystems and are the transport vectors of genetic materials. To shed light on regulation and control mechanisms in virophage-host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation. Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motif in mavirus; motifs containing the ATCT box were the potential late promoter motif in the Ace Lake mavirus. AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motif 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses.

  2. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
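
    Over- and under-representation of a motif, as discussed in this record, is usually judged by comparing an observed count with an expected count under some background model. The record does not state which model the authors used; the sketch below assumes the simplest one (independent bases at the sequence's own frequencies), uses the consensus -10 hexamer TATAAT as the default motif, and scans one strand only. The function name is hypothetical.

```python
from collections import Counter

def observed_and_expected(seq, motif="TATAAT"):
    """Return (observed, expected) counts of motif in seq.

    Expected count assumes independent bases drawn at the frequencies
    observed in seq itself (a deliberately simple background model).
    """
    seq = seq.upper()
    n, m = len(seq), len(motif)
    windows = max(n - m + 1, 0)
    observed = sum(1 for i in range(windows) if seq[i:i + m] == motif)
    base_counts = Counter(seq)
    probability = 1.0
    for base in motif:
        probability *= base_counts[base] / n  # frequency of each motif base
    return observed, probability * windows

obs, exp = observed_and_expected("ACGT" * 6)
print(obs, exp)  # an observed/expected ratio below 1 indicates under-representation
```

    Because hexamer counts in short regions are small, a practical analysis would pool many sequences and use a richer background (for example a Markov model) before interpreting the ratio.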

  3. Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    Directory of Open Access Journals (Sweden)

    Girgis Hani Z

    2012-02-01

    Full Text Available Abstract Background Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs) and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed. Results We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was

  4. Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs.

    Science.gov (United States)

    Girgis, Hani Z; Ovcharenko, Ivan

    2012-02-07

    Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs) and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed. We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was 21-75% more precise than a related CRM
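    The idea of pairs of closely located motifs enriched in the mixed set relative to the control set can be sketched as follows. This is not CrmMiner's implementation: it assumes exact string matching instead of binding-site models, and the distance threshold and all function names are hypothetical.

```python
def find_all(seq, motif):
    """Return start positions of all (overlapping) exact matches."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def has_close_pair(seq, motif_a, motif_b, max_dist):
    """True if some occurrence of motif_a starts within max_dist
    bases of some occurrence of motif_b."""
    pos_a = find_all(seq, motif_a)
    pos_b = find_all(seq, motif_b)
    return any(abs(a - b) <= max_dist for a in pos_a for b in pos_b)


def pair_enrichment(mixed, control, motif_a, motif_b, max_dist=50):
    """Fraction of sequences containing the close pair in each set."""
    frac_mixed = sum(has_close_pair(s, motif_a, motif_b, max_dist)
                     for s in mixed) / len(mixed)
    frac_control = sum(has_close_pair(s, motif_a, motif_b, max_dist)
                       for s in control) / len(control)
    return frac_mixed, frac_control
```

    A pair whose fraction is clearly higher in the mixed set than in the control set would be a candidate for the regulatory signature; the classification step described in the abstract is not covered by this sketch.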

  5. Visual compression of workflow visualizations with automated detection of macro motifs.

    Science.gov (United States)

    Maguire, Eamonn; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Davies, Jim; Chen, Min

    2013-12-01

    This paper is concerned with the creation of 'macros' in workflow visualization as a support tool to increase the efficiency of data curation tasks. We propose computation of candidate macros based on their usage in large collections of workflows in data repositories. We describe an efficient algorithm for extracting macro motifs from workflow graphs. We discovered that the state transition information, used to identify macro candidates, characterizes the structural pattern of the macro and can be harnessed as part of the visual design of the corresponding macro glyph. This facilitates partial automation and consistency in glyph design applicable to a large set of macro glyphs. We tested this approach against a repository of biological data holding some 9,670 workflows and found that the algorithmically generated candidate macros are in keeping with domain expert expectations.

  6. Molecular Detection, Phylogenetic Analysis, and Identification of Transcription Motifs in Feline Leukemia Virus from Naturally Infected Cats in Malaysia

    Directory of Open Access Journals (Sweden)

    Faruku Bande

    2014-01-01

    Full Text Available A nested PCR assay was used to determine the viral RNA and proviral DNA status of naturally infected cats. Selected samples that were FeLV-positive by PCR were subjected to sequencing, phylogenetic analysis, and motif search. Of the 39 samples that were positive for FeLV p27 antigen, 87.2% (34/39) were confirmed positive with nested PCR. FeLV proviral DNA was detected in 38 (97.3%) of p27-antigen negative samples. Malaysian FeLV isolates are found to be highly similar with a homology of 91% to 100%. Phylogenetic analysis revealed that Malaysian FeLV isolates divided into two clusters, with a majority (86.2%) sharing similarity with FeLV-K01803 and fewer isolates (13.8%) with FeLV-GM1 strain. Different enhancer motifs including NF-GMa, Krox-20/WT1I-del2, BAF1, AP-2, TBP, TFIIF-beta, TRF, and TFIID are found to occur either in single, duplicate, triplicate, or sets of 5 in different positions within the U3-LTR-gag region. The present result confirms the occurrence of FeLV viral RNA and provirus DNA in naturally infected cats. Malaysian FeLV isolates are highly similar, and a majority of them are closely related to a UK isolate. This study provides the first molecular based information on FeLV in Malaysia. Additionally, different enhancer motifs likely associated with FeLV related pathogenesis have been identified.

  7. On detecting selective sweeps using single genomes

    Directory of Open Access Journals (Sweden)

    Priyanka Sinha

    2011-12-01

    Full Text Available Identifying the genetic basis of human adaptation has remained a central focal point of modern population genetics. One major area of interest has been the use of polymorphism data to detect so-called 'footprints' of selective sweeps - patterns produced as a beneficial mutation arises and rapidly fixes in the population. Based on numerous simulation studies and power analyses, the necessary sample size for achieving appreciable power has been shown to vary from a few individuals to a few dozen, depending on the test statistic. And yet, the sequencing of multiple copies of a single region, or of multiple genomes as is now often the case, incurs considerable cost. Enard et al. (2010) have recently proposed a method to identify patterns of selective sweeps using a single genome - and apply this approach to human and non-human primates (chimpanzee, orangutan and macaque). They employ essentially a modification of the Hudson, Kreitman and Aguade (HKA) test - using heterozygous single nucleotide polymorphisms (SNPs) from single individuals, and divergence data from two closely related species (human-chimpanzee, human-orangutan and human-macaque). Given the potential importance of this finding, we here investigate the properties of this statistic. We demonstrate through simulation that this approach is neither robust to demography nor background selection; nor is it robust to variable recombination rates.

  8. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    Directory of Open Access Journals (Sweden)

    Sourav Roy

    2013-03-01

    Full Text Available Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  9. Motif trie: An efficient text index for pattern discovery with don't cares

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2017-01-01

    We introduce the motif trie data structure, which has applications in pattern matching and discovery in genomic analysis, plagiarism detection, data mining, intrusion detection, spam fighting and time series analysis, to name a few. Here the extraction of recurring patterns in sequential and text...

  10. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein

    Science.gov (United States)

    Marsh, Elizabeth K.; Delury, Craig P.; Davies, Nicholas J.; Weston, Christopher J.; Miah, Mohammed A.L.; Banks, Lawrence; Parish, Joanna L.

    2017-01-01

    The function of a conserved PDS95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV infected cells, ensuring stable episome persistence and vegetative amplification. PMID:28061478

  11. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein.

    Science.gov (United States)

    Marsh, Elizabeth K; Delury, Craig P; Davies, Nicholas J; Weston, Christopher J; Miah, Mohammed A L; Banks, Lawrence; Parish, Joanna L; Higgs, Martin R; Roberts, Sally

    2017-03-21

    The function of a conserved PDS95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV infected cells, ensuring stable episome persistence and vegetative amplification.

  12. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
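    The mutual information feature mentioned above can be sketched for a single motif and GO term pair. The sketch assumes the standard definition of mutual information over a 2x2 contingency table of sequence counts; the exact formulation used in the paper is not given in this abstract, and the function and argument names are hypothetical.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between 'sequence matches the motif'
    and 'sequence is annotated with the GO term'.

    n11: motif and term, n10: motif only,
    n01: term only,      n00: neither.
    """
    total = n11 + n10 + n01 + n00
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    cells = (
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    )
    mi = 0.0
    for count, row, col in cells:
        if count > 0:  # empty cells contribute nothing
            mi += (count / total) * math.log2(count * total / (row * col))
    return mi
```

    A value near zero means the motif and the term occur independently across sequences; larger values indicate a stronger association, which is why the quantity is a natural input feature for a classifier.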

  13. Detection of genomic instability in hypospadias patients by random ...

    African Journals Online (AJOL)

    DIRECTOR

    2011-05-16

    May 16, 2011 ... The results determined numbers of genomic instabilities among hypospadias patients. In addition, the RAPD-PCR technique is a powerful tool for detection of genomic instability in hypospadias patients. Further larger studies are needed, which include low and high grade of patients to: 1) Obtain RAPD ...

  14. Identification of a Gamma Interferon-Activated Inhibitor of Translation-Like RNA Motif at the 3′ End of the Transmissible Gastroenteritis Coronavirus Genome Modulating Innate Immune Response

    Science.gov (United States)

    Marquez-Jurado, Silvia; Nogales, Aitor; Zuñiga, Sonia; Almazán, Fernando

    2015-01-01

    ABSTRACT A 32-nucleotide (nt) RNA motif located at the 3′ end of the transmissible gastroenteritis coronavirus (TGEV) genome was found to specifically interact with the host proteins glutamyl-prolyl-tRNA synthetase (EPRS) and arginyl-tRNA synthetase (RRS). This RNA motif has high homology in sequence and secondary structure with the gamma interferon-activated inhibitor of translation (GAIT) element, which is located at the 3′ end of several mRNAs encoding proinflammatory proteins. The GAIT element is involved in the translation silencing of these mRNAs through its interaction with the GAIT complex (EPRS, heterogeneous nuclear ribonucleoprotein Q, ribosomal protein L13a, and glyceraldehyde 3-phosphate dehydrogenase) to favor the resolution of inflammation. Interestingly, we showed that the viral RNA motif bound the GAIT complex and inhibited the in vitro translation of a chimeric mRNA containing this RNA motif. To our knowledge, this is the first GAIT-like motif described in a positive RNA virus. To test the functional role of the GAIT-like RNA motif during TGEV infection, a recombinant coronavirus harboring mutations in this motif was engineered and characterized. Mutations of the GAIT-like RNA motif did not affect virus growth in cell cultures. However, an exacerbated innate immune response, mediated by the melanoma differentiation-associated gene 5 (MDA5) pathway, was observed in cells infected with the mutant virus compared with the response observed in cells infected with the parental virus. Furthermore, the mutant virus was more sensitive to beta interferon than the parental virus. All together, these data strongly suggested that the viral GAIT-like RNA motif modulates the host innate immune response. PMID:25759500

  15. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    Directory of Open Access Journals (Sweden)

    Saray Santamaría-Hernando

    Full Text Available Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of
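    The consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D quoted above is written in a PROSITE-like notation that translates directly into a regular expression. The sketch below shows one way to scan a protein sequence for it. The converter is deliberately minimal and assumes a pattern made only of single residues, `x`, and bracketed alternatives; the function names and the example sequence in the test are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a regex string.
    Handles only single residues, 'x' (any residue) and [..] sets;
    repeats such as x(2,4) are not supported in this sketch."""
    return "".join("." if element == "x" else element
                   for element in pattern.split("-"))


# Consensus as quoted in the abstract.
PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"
PERCAL_RE = re.compile(prosite_to_regex(PERCAL))


def find_motif(protein):
    """Start positions of non-overlapping matches in a protein sequence."""
    return [m.start() for m in PERCAL_RE.finditer(protein)]
```

    A regular-expression hit only shows that a sequence fits the consensus; it says nothing about whether the site actually binds calcium, which the study established experimentally.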

  16. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

    Directory of Open Access Journals (Sweden)

    Mirjana Domazet-Lošo

    Full Text Available Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).

  17. Direct detection of methylation in genomic DNA

    NARCIS (Netherlands)

    Bart, A.; van Passel, M. W. J.; van Amsterdam, K.; van der Ende, A.

    2005-01-01

    The identification of methylated sites on bacterial genomic DNA would be a useful tool to study the major roles of DNA methylation in prokaryotes: distinction of self and nonself DNA, direction of post-replicative mismatch repair, control of DNA replication and cell cycle, and regulation of gene

  18. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available Abstract Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  19. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  20. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    In vertebrates, the major histocompatibility complex (MHC) presents peptides to the immune system. In humans, MHCs are called human leukocyte antigens (HLAs), and some of the loci encoding them are the most polymorphic in the human genome. Different MHC molecules present different subsets of peptides, and knowledge of their binding specificities is important for understanding the differences in the immune response between individuals. Knowledge of motifs may be used to identify epitopes, to understand the MHC restriction of epitopes, and to compare the specificities of different MHC molecules. Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif...

  1. Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index

    Directory of Open Access Journals (Sweden)

    Makarenkov Vladimir

    2011-10-01

    Full Text Available Abstract Background The identification of functional regions contained in a given multiple sequence alignment constitutes one of the major challenges of comparative genomics. Several studies have focused on the identification of conserved regions and motifs. However, most of existing methods ignore the relationship between the functional genomic regions and the external evidence associated with the considered group of species (e.g., carcinogenicity of Human Papilloma Virus). In the past, we have proposed a method that takes into account the prior knowledge on an external evidence (e.g., carcinogenicity or invasivity of the considered organisms) and identifies genomic regions related to a specific disease. Results and conclusion We present a new algorithm for detecting genomic regions that may be associated with a disease. Two new variability functions and a bipartition optimization procedure are described. We validate and weigh our results using the Adjusted Rand Index (ARI), and thus assess to what extent the selected regions are related to carcinogenicity, invasivity, or any other species classification, given as input. The predictive power of different hit region detection functions was assessed on synthetic and real data. Our simulation results suggest that there is no single function that provides the best results in all practical situations (e.g., monophyletic or polyphyletic evolution, and positive or negative selection), and that at least three different functions might be useful. The proposed hit region identification functions that do not benefit from the prior knowledge (i.e., carcinogenicity or invasivity of the involved organisms) can provide results equivalent to those of the existing functions that take advantage of such prior knowledge. Using the new algorithm, we examined the Neisseria meningitidis FrpB gene product for invasivity and immunologic activity, and human papilloma virus (HPV) E6 oncoprotein for carcinogenicity, and confirmed
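    The Adjusted Rand Index used for validation above compares two partitions of the same items, for example a grouping of species induced by a candidate region and a known carcinogenicity classification. The sketch below follows the standard pair-counting definition of the index; how the paper derives its partitions is not described in this abstract, and the function name is hypothetical. It assumes at least two items.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items.
    1.0 means identical partitions; values near 0 mean agreement
    no better than chance. Assumes len(labels_a) >= 2."""
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

    Because the index depends only on which items are grouped together, relabeling the clusters of either partition leaves the score unchanged.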

  2. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo

    Directory of Open Access Journals (Sweden)

    Gaul Ulrike

    2002-10-01

    Full Text Available Abstract Background Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein coding sequence are used with remarkable success in the annotation of genomes, the development of computational methods to analyze noncoding regions and to delineate transcriptional control elements is still in its infancy. Results Here we present novel algorithms to detect cis-regulatory modules through genome wide scans for clusters of transcription factor binding sites using three levels of prior information. When binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap gene regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known, but no binding sites, repeated motifs can be found by a customized Gibbs sampler and input to Ahab, to predict genes with similar regulation. Finally using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules, upstream of the segmentation genes, with no training data. Conclusions We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow the genome-wide identification of regulatory modules. We believe that Ahab overcomes many problems of recent approaches and we estimated the false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/.

  3. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...

  4. Genome-Enhanced Detection and Identification (GEDI) of plant pathogens

    Directory of Open Access Journals (Sweden)

    Nicolas Feau

    2018-02-01

    Plant diseases caused by fungi and Oomycetes represent worldwide threats to crops and forest ecosystems. Effective prevention and appropriate management of emerging diseases rely on rapid detection and identification of the causal pathogens. The increase in genomic resources makes it possible to generate novel genome-enhanced DNA detection assays that can exploit whole genomes to discover candidate genes for pathogen detection. A pipeline was developed to identify genome regions that discriminate taxa or groups of taxa and can be converted into PCR assays. The modular pipeline is comprised of four components: (1) selection and genome sequencing of phylogenetically related taxa, (2) identification of clusters of orthologous genes, (3) elimination of false positives by filtering, and (4) assay design. This pipeline was applied to some of the most important plant pathogens across three broad taxonomic groups: Phytophthoras (Stramenopiles, Oomycota), Dothideomycetes (Fungi, Ascomycota) and Pucciniales (Fungi, Basidiomycota). Comparison of 73 fungal and Oomycete genomes led to the discovery of 5,939 gene clusters that were unique to the targeted taxa and an additional 535 that were common at higher taxonomic levels. Approximately 28% of the 299 tested were converted into qPCR assays that met our set of specificity criteria. This work demonstrates that a genome-wide approach can efficiently identify multiple taxon-specific genome regions that can be converted into highly specific PCR assays. The possibility to easily obtain multiple alternative regions to design highly specific qPCR assays should be of great help in tackling challenging cases for which higher taxon-resolution is needed.

  5. Directional genomic hybridization for chromosomal inversion discovery and detection.

    Science.gov (United States)

    Ray, F Andrew; Zimmerman, Erin; Robinson, Bruce; Cornforth, Michael N; Bedford, Joel S; Goodwin, Edwin H; Bailey, Susan M

    2013-04-01

    Chromosomal rearrangements are a source of structural variation within the genome that figure prominently in human disease, where the importance of translocations and deletions is well recognized. In principle, inversions (reversals in the orientation of DNA sequences within a chromosome) should have similar detrimental potential. However, the study of inversions has been hampered by traditional approaches used for their detection, which are not particularly robust. Even with significant advances in whole genome approaches, changes in the absolute orientation of DNA remain difficult to detect routinely. Consequently, our understanding of inversions is still surprisingly limited, as is our appreciation for their frequency and involvement in human disease. Here, we introduce the directional genomic hybridization methodology of chromatid painting, a whole new way of looking at structural features of the genome, that can be employed with high resolution on a cell-by-cell basis, and demonstrate its basic capabilities for genome-wide discovery and targeted detection of inversions. Bioinformatics enabled development of sequence- and strand-specific directional probe sets, which when coupled with single-stranded hybridization, greatly improved the resolution and ease of inversion detection. We highlight examples of the far-ranging applicability of this cytogenomics-based approach, which include confirmation of the alignment of the human genome database and evidence that individuals themselves share similar sequence directionality, as well as use in comparative and evolutionary studies for any species whose genome has been sequenced. In addition to applications related to basic mechanistic studies, the information obtainable with strand-specific hybridization strategies may ultimately enable novel gene discovery, thereby benefitting the diagnosis and treatment of a variety of human disease states and disorders including cancer, autism, and idiopathic infertility.

  6. Detecting individual ancestry in the human genome

    NARCIS (Netherlands)

    A. Wollstein (Andreas); O. Lao Grueso (Oscar)

    2015-01-01

    Detecting and quantifying the population substructure present in a sample of individuals are of main interest in the fields of genetic epidemiology, population genetics, and forensics among others. To date, several algorithms have been proposed for estimating the amount of genetic

  7. Lightning-fast genome variant detection with GROM.

    Science.gov (United States)

    Smith, Sean D; Kawash, Joseph K; Grigoriev, Andrey

    2017-10-01

    Current human whole genome sequencing projects produce massive amounts of data, often creating significant computational challenges. Different approaches have been developed for each type of genome variant and method of its detection, necessitating users to run multiple algorithms to find variants. We present Genome Rearrangement OmniMapper (GROM), a novel comprehensive variant detection algorithm accepting aligned read files as input and finding SNVs, indels, structural variants (SVs), and copy number variants (CNVs). We show that GROM outperforms state-of-the-art methods on 7 validated benchmarks using 2 whole genome sequencing (WGS) data sets. Additionally, GROM boasts lightning-fast run times, analyzing a 50× WGS human data set (NA12878) on commonly available computer hardware in 11 minutes, more than an order of magnitude (up to 72 times) faster than tools detecting a similar range of variants. Addressing the needs of big data analysis, GROM combines in 1 algorithm SNV, indel, SV, and CNV detection, providing superior speed, sensitivity, and precision. GROM is also able to detect CNVs, SNVs, and indels in non-paired-read WGS libraries, as well as SNVs and indels in whole exome or RNA sequencing data sets. © The Authors 2017. Published by Oxford University Press.

  8. Detection of evaluation bias caused by genomic preselection.

    Science.gov (United States)

    Tyrisevä, A-M; Mäntysaari, E A; Jakobsen, J; Aamand, G P; Dürr, J; Fikse, W F; Lidauer, M H

    2018-04-01

    The aim of this simulation study was to investigate whether it is possible to detect the effect of genomic preselection on Mendelian sampling (MS) means or variances obtained by the MS validation test. Genomic preselection of bull calves is 1 additional potential source of bias in international evaluations unless adequately accounted for in national evaluations. Selection creates no bias in traditional breeding value evaluation if the data of all animals are included. However, this is not the case with genomic preselection, as it excludes culled bulls. Genomic breeding values become biased if calculated using a multistep procedure instead of, for example, a single-step method. Currently, about 60% of the countries participating in international bull evaluations have already adopted genomic selection in their breeding schemes. The data sent for multiple across-country evaluation can, therefore, be very heterogeneous, and a proper validation method is needed to ensure a fair comparison of the bulls included in international genetic evaluations. To study the effect of genomic preselection, we generated a total of 50 replicates under control and genomic preselection schemes using the structures of the real data and pedigree from a medium-size cow population. A genetic trend of 15% of the genetic standard deviation was created for both schemes. In carrying out the analyses, we used 2 different heritabilities: 0.25 and 0.10. From the start of genomic preselection, all bulls were genomically preselected. Their MS deviations were inflated with a value corresponding to selection of the best 10% of genomically tested bull calves. For cows, the MS deviations were unaltered. The results revealed a clear underestimation of bulls' breeding values (BV) after genomic preselection started, as well as a notable deviation from zero both in true and estimated MS means. The software developed recently for the MS validation test already produces yearly MS means, and they can be used to

  9. Comparing genetic variants detected in the 1000 genomes project ...

    Indian Academy of Sciences (India)

    Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium ... for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA; Thomson Reuters, IP and Science, 22 Thomson Place, Boston, MA 02210, USA ...

  10. Comparing genetic variants detected in the 1000 genomes project ...

    Indian Academy of Sciences (India)

    Single-nucleotide polymorphisms (SNPs) determined based on SNP arrays from the international HapMap consortium (HapMap) and the genetic variants detected in the 1000 genomes project (1KGP) can serve as two references for genomewide association studies (GWAS). We conducted comparative analyses to provide ...

  11. rMotifGen: random motif generator for DNA and protein sequences

    Directory of Open Access Journals (Sweden)

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
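    Drawing a motif instance from a PSSM and planting it in a random background sequence can be sketched in a few lines. The code below is an illustration only and is unrelated to the rMotifGen implementation or its interface; the function names, the uniform background, and the representation of the PSSM as a list of per-position probability dictionaries are assumptions.

```python
import random

def sample_from_pssm(pssm, rng):
    """Draw one motif instance. `pssm` is a list with one {symbol: probability}
    dict per motif position (an assumed, simplified PSSM representation)."""
    instance = []
    for column in pssm:
        symbols = list(column)
        weights = [column[s] for s in symbols]
        instance.append(rng.choices(symbols, weights=weights, k=1)[0])
    return "".join(instance)

def random_sequence_with_motif(length, pssm, rng, alphabet="ACGT"):
    """Generate a uniform random sequence and overwrite one randomly chosen
    position with a motif instance sampled from the PSSM.
    Returns (sequence, motif_start, motif_instance)."""
    motif = sample_from_pssm(pssm, rng)
    start = rng.randrange(length - len(motif) + 1)
    background = [rng.choice(alphabet) for _ in range(length)]
    background[start:start + len(motif)] = motif
    return "".join(background), start, motif

# Usage: a hypothetical 4-position PSSM that strongly prefers TATA.
rng = random.Random(42)
pssm = [
    {"T": 0.9, "A": 0.1},
    {"A": 0.9, "T": 0.1},
    {"T": 0.9, "C": 0.1},
    {"A": 0.9, "G": 0.1},
]
sequence, start, motif = random_sequence_with_motif(40, pssm, rng)
```

    Because the true motif position is returned alongside the sequence, the output of a motif finder can be scored against a known answer, which is the kind of benchmarking use the abstract describes.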

  12. A new approach for using genome scans to detect recent positive selection in the human genome.

    Directory of Open Access Journals (Sweden)

    Kun Tang

    2007-07-01

    Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits.
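    An EHH profile of the kind contrasted between populations here can be computed with the commonly used definition of EHH: the probability that two chromosomes drawn at random from the sample are identical over the whole interval from a core marker out to a given marker. The sketch below implements only that definition, not the between-population test statistic of this study; the function names and the encoding of haplotypes as equal-length strings are assumptions, and the input is assumed to contain only chromosomes that carry the same core allele.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, end):
    """EHH over the marker interval between `core` and `end` (inclusive):
    the fraction of chromosome pairs that are identical across the interval."""
    lo, hi = min(core, end), max(core, end)
    n = len(haplotypes)
    if n < 2:
        return 0.0
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

def ehh_profile(haplotypes, core):
    """EHH from the core marker out to every marker position."""
    n_markers = len(haplotypes[0])
    return [ehh(haplotypes, core, j) for j in range(n_markers)]

# Usage: four chromosomes, four biallelic markers, core at marker 0.
haps = ["0000", "0000", "0001", "0111"]
profile = ehh_profile(haps, core=0)
```

    A profile that stays close to 1 far from the core in one population but decays quickly in another is the kind of contrast the abstract refers to.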

  13. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  14. Simultaneous detection and estimation of trait associations with genomic phenotypes.

    Science.gov (United States)

    Morrison, Jean; Simon, Noah; Witten, Daniela

    2017-01-01

    Summary: Genomic phenotypes, such as DNA methylation and chromatin accessibility, can be used to characterize the transcriptional and regulatory activity of DNA within a cell. Recent technological advances have made it possible to measure such phenotypes very densely. This density often results in spatial structure, in the sense that measurements at nearby sites are very similar. In this article, we consider the task of comparing genomic phenotypes across experimental conditions, cell types, or disease subgroups. We propose a new method, Joint Adaptive Differential Estimation (JADE), which leverages the spatial structure inherent to genomic phenotypes. JADE simultaneously estimates smooth underlying group average genomic phenotype profiles and detects regions in which the average profile differs between groups. We evaluate JADE's performance in several biologically plausible simulation settings. We also consider an application to the detection of regions with differential methylation between mature skeletal muscle cells, myotubes, and myoblasts. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis.

    Science.gov (United States)

    Klepper, Kjetil; Drabløs, Finn

    2013-01-16

    Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. MotifLab is freely available at http://www.motiflab.org.

  16. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes.

    Science.gov (United States)

    Boschiero, Clarissa; Gheyas, Almas A; Ralph, Hannah K; Eory, Lel; Paton, Bob; Kuo, Richard; Fulton, Janet; Preisinger, Rudolf; Kaiser, Pete; Burt, David W

    2015-07-31

    Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports on the detection and characterisation of about 883 K high quality InDels from the whole-genome analysis of several modern layer chicken lines from diverse breeds. To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages: SAMtools and Dindel, as well as stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883 K high quality consensus InDels with 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz, GGAZ, 16, 22 and 25 showed very low densities of InDels whereas the highest rate was observed on GGA6. In spite of the higher recombination rates on microchromosomes, the InDel density on these chromosomes was generally lower relative to macrochromosomes possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases and about 63% were non-repetitive in nature while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stopgain/loss InDels that are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift were predicted as evolutionary intolerant with potential impact on protein functions. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene-classes e.g. those associated with cell proliferation

  17. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and the maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers' genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow successful discrimination of a) TrEn from random controls, a proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  18. Pengembangan Motif Batik Khas Bali

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2016-04-01

    ABSTRACT: The batik industry is growing rapidly in Bali, but its batik motifs do not reflect a distinctive regional identity. It is therefore necessary to create batik motif designs specific to Bali whose inspiration is drawn from Balinese culture and nature. The purpose of this research and art creation is to produce batik motifs with unique and characteristic forms that reflect Balinese culture and natural surroundings. The method comprised data collection, motif design, realization of the designs as batik, and aesthetic testing. This work produced 5 batik motifs: (1) Motif Jepun Alit; (2) Motif Jepun Ageng; (3) Motif Sekar Jagad Bali; (4) Motif Teratai Banji; and (5) Motif Poleng Biru. Based on the results of the "Aesthetic Taste" assessment, the most preferred motifs are Motif Jepun Alit, Motif Sekar Jagad Bali, and Motif Teratai Banji. Key words: Motif Jepun Alit, Motif Jepun Ageng, Motif Sekar Jagad Bali, Motif Teratai Banji, Motif Poleng Biru

  19. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data.

    Science.gov (United States)

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B

    2016-04-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. On detection and assessment of statistical significance of Genomic Islands

    Directory of Open Access Journals (Sweden)

    Chaudhuri Probal

    2008-04-01

    Background: Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results: Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island) runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to the Salmonella typhi CT18 genome, leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements, confirming their horizontal acquirement. Conclusion: The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than many other well known methods in terms of their sensitivity and accuracy, and in terms of specificity, it is comparable to other methods.
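    A Monte Carlo test based on randomly selected chromosome segments can be sketched as follows. This is a generic illustration and not the Design-Island algorithm: the abstract does not state which test statistic is used, so deviation of segment GC content from the genome-wide GC content is assumed here, and the function names, sample count, and seed are likewise illustrative.

```python
import random

def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return sum(1 for base in seq if base in "GC") / len(seq)

def monte_carlo_p_value(genome, start, length, n_samples=999, seed=0):
    """Monte Carlo P-value for the candidate segment genome[start:start+length].
    The statistic (an assumption) is the absolute difference between segment GC
    content and genome-wide GC content; the null distribution comes from
    segments of the same length drawn at random positions."""
    rng = random.Random(seed)
    genome_gc = gc_content(genome)
    observed = abs(gc_content(genome[start:start + length]) - genome_gc)
    extreme = 0
    for _ in range(n_samples):
        s = rng.randrange(len(genome) - length + 1)
        if abs(gc_content(genome[s:s + length]) - genome_gc) >= observed:
            extreme += 1
    # Standard Monte Carlo estimate: the observed segment counts as one sample.
    return (extreme + 1) / (n_samples + 1)

# Usage: a toy AT-rich "chromosome" with a 50-base GC-only insert at position 1000.
toy_genome = "AT" * 500 + "G" * 50 + "AT" * 500
p_island = monte_carlo_p_value(toy_genome, start=1000, length=50)
p_typical = monte_carlo_p_value(toy_genome, start=0, length=50)
```

    On this toy input the inserted segment should receive a small P-value and an ordinary segment a large one; on real data, many candidate segments would be tested, so a multiple-testing correction would also be needed.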

  1. The detection of large deletions or duplications in genomic DNA.

    Science.gov (United States)

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for the detection of point mutations and small insertions or deletions in genomic DNA are well established, the detection of larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. Gene dosage methods based on PCR need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some sort of quantitation, heterozygous deletions and duplications may be overlooked and therefore be under-ascertained. Gene dosage methods provide the additional benefit of reporting allele drop-out in the PCR. This could impact on SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for the detection of this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions and deletions and duplications. Copyright 2002 Wiley-Liss, Inc.

  2. Ubiquitous presence of the hammerhead ribozyme motif along the tree of life

    Science.gov (United States)

    de la Peña, Marcos; García-Robles, Inmaculada

    2010-01-01

    Examples of small self-cleaving RNAs embedded in noncoding regions already have been found to be involved in the control of gene expression, although their origin remains uncertain. In this work, we show the widespread occurrence of the hammerhead ribozyme (HHR) motif among genomes from the Bacteria, Chromalveolata, Plantae, and Metazoa kingdoms. Intergenic HHRs were detected in three different bacterial genomes, whereas metagenomic data from Galapagos Islands showed the occurrence of similar ribozymes that could be regarded as direct relics from the RNA world. Among eukaryotes, HHRs were detected in the genomes of three water molds as well as 20 plant species, ranging from unicellular algae to vascular plants. These HHRs were very similar to those previously described in small RNA plant pathogens and, in some cases, appeared as close tandem repetitions. A parallel situation of tandemly repeated HHR motifs was also detected in the genomes of lower metazoans from cnidarians to invertebrates, with special emphasis among hematophagous and parasitic organisms. Altogether, these findings unveil the HHR as a widespread motif in DNA genomes, which would be involved in new forms of retrotransposable elements. PMID:20705646

  3. Comparative Genomics

    Indian Academy of Sciences (India)

    structions of the tree of life, drug discovery programs, function predictions of hypothetical proteins and genes, regulatory motifs and other non-coding DNA motifs, and genome ... expertise in assembling sequences. Beginning with the complete genome sequence of the bacterial pathogen Haemophilus influenzae that was ...

  4. CD8+ T cells with characteristic T cell receptor beta motif are detected in blood and expanded in synovial fluid of ankylosing spondylitis patients.

    Science.gov (United States)

    Komech, Ekaterina A; Pogorelyy, Mikhail V; Egorov, Evgeniy S; Britanova, Olga V; Rebrikov, Denis V; Bochkova, Anna G; Shmidt, Evgeniya I; Shostak, Nadejda A; Shugay, Mikhail; Lukyanov, Sergey; Mamedov, Ilgar Z; Lebedev, Yuriy B; Chudakov, Dmitriy M; Zvyagin, Ivan V

    2018-02-22

    The risk of AS is associated with genomic variants related to antigen presentation and specific cytokine signalling pathways, suggesting the involvement of cellular immunity in disease initiation/progression. The aim of the present study was to explore the repertoire of TCR sequences in healthy donors and AS patients to uncover AS-linked TCR variants. Using quantitative molecular-barcoded 5'-RACE, we performed deep TCR β repertoire profiling of peripheral blood (PB) and SF samples for 25 AS patients and 108 healthy donors. AS-linked TCR variants were identified using a new computational approach that relies on a probabilistic model of the VDJ rearrangement process. Using the donor-agnostic probabilistic model, we reveal a TCR β motif characteristic for PB of AS patients, represented by eight highly homologous amino acid sequence variants. Some of these variants were previously reported in SF and PB of patients with ReA and in PB of AS patients. We demonstrate that identified AS-linked clones have a CD8+ phenotype, present at relatively low frequencies in PB, and are significantly enriched in matched SF samples of AS patients. Our results suggest the involvement of a particular antigen-specific subset of CD8+ T cells in AS pathogenesis, confirming and expanding earlier findings. The high similarity of the clonotypes with the ones found in ReA implies common mechanisms for the initiation of the diseases.

  5. Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

    Science.gov (United States)

    Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D

    2016-01-01

    Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.

  6. Detection of genomic rearrangements in cucumber using genomecmp software

    Science.gov (United States)

    Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Comparative genomics is a rapidly evolving science, driven by the increasing amount of genome sequence information available in databases. A simple comparison of general genome features, such as genome size, number of genes, and chromosome number, provides an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences, and its applications in plant comparative genomics.

  7. Discovery of stress responsive DNA regulatory motifs in Arabidopsis.

    Science.gov (United States)

    Ma, Shisong; Bachan, Shawn; Porto, Matthew; Bohnert, Hans J; Snyder, Michael; Dinesh-Kumar, Savithramma P

    2012-01-01

    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer, a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
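
    The word-counting step described above can be sketched in a few lines. This is not MotifIndexer's implementation: the function names, the binomial background model, the pseudocount, and the toy sequences are all assumptions made for illustration, and only the exact bases A, C, G, T are handled (no degenerate r, y, s, w, m, k, n symbols).

```python
from collections import Counter
from math import sqrt


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment_z(kmer, fg_counts, bg_counts):
    """z-score of a k-mer's foreground count under a binomial model whose
    rate is estimated from the background (with a pseudocount so that the
    rate stays strictly between 0 and 1)."""
    n_fg = sum(fg_counts.values())
    n_bg = sum(bg_counts.values())
    p = (bg_counts[kmer] + 1) / (n_bg + 2)
    expected = n_fg * p
    sd = sqrt(n_fg * p * (1 - p))
    return (fg_counts[kmer] - expected) / sd


# Toy data: "promoters" of co-expressed genes versus a background set.
foreground = ["ACGTACGT", "TTACGTAA"]
background = ["AAAAAAAA", "CCCCCCCC"]
fg = count_kmers(foreground, 4)
bg = count_kmers(background, 4)
print(fg["ACGT"], round(enrichment_z("ACGT", fg, bg), 2))
```

    A positive z-score marks a word that occurs more often in the foreground than the background rate predicts; ranking all words by this score is one simple way to shortlist over-represented candidates.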

  8. PhyloGibbs-MP: module prediction and discriminative motif-finding by Gibbs sampling.

    Directory of Open Access Journals (Sweden)

    Rahul Siddharthan

    2008-08-01

    Full Text Available PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules-tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that use prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other "discriminative motif-finders" have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites and compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use "informative priors" on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data.

  9. PhyloGibbs-MP: module prediction and discriminative motif-finding by Gibbs sampling.

    Science.gov (United States)

    Siddharthan, Rahul

    2008-08-29

    PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules-tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that use prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other "discriminative motif-finders" have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites and compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use "informative priors" on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data.

  10. FastMotif: spectral sequence motif discovery.

    Science.gov (United States)

    Colombo, Nicoló; Vlassis, Nikos

    2015-08-15

    Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-Selex data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters. The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics. Contact: vlassis@adobe.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  11. Detection of combined genomic variants in a Jordanian family with ...

    Indian Academy of Sciences (India)

    TSHR) gene was performed by direct sequencing of genomic DNA extracted from peripheral blood leukocytes of all family members. The sequence analysis of all TSHR gene exons and intron borders revealed two genomic variants. The first ...

  12. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
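
    The last point, that a centroid estimate can differ from the maximum a posteriori (MAP) estimate, can be shown with a toy example. The sketch below assumes plain Hamming loss over binary site indicators, for which the centroid is obtained by thresholding posterior marginals at 0.5; the abstract's "refined" loss function is not reproduced here, and the posterior samples are invented for illustration.

```python
from collections import Counter


def map_estimate(samples):
    """Most frequent configuration among the posterior samples."""
    return Counter(samples).most_common(1)[0][0]


def centroid_estimate(samples):
    """Configuration minimising expected Hamming loss: include a position
    when its posterior marginal probability exceeds 0.5."""
    n = len(samples)
    length = len(samples[0])
    marginals = [sum(s[i] for s in samples) / n for i in range(length)]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


def expected_hamming(candidate, samples):
    """Average Hamming distance from a candidate to the posterior samples."""
    total = sum(sum(a != b for a, b in zip(candidate, s)) for s in samples)
    return total / len(samples)


# Hypothetical posterior draws; each tuple flags three candidate sites.
samples = [(1, 1, 0)] * 3 + [(0, 1, 1)] * 2 + [(1, 0, 1)] * 2

print(map_estimate(samples))       # the single most probable configuration
print(centroid_estimate(samples))  # built from the per-site marginals
```

    In this toy posterior the centroid is a configuration that was never sampled as a whole, yet its expected Hamming distance to the samples is lower than that of the MAP configuration.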

  13. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  14. Motif content comparison between monocot and dicot species

    Directory of Open Access Journals (Sweden)

    Matyas Cserhati

    2015-03-01

    Full Text Available While a number of DNA sequence motifs have been functionally characterized, the full repertoire of motifs in an organism (the motifome) is yet to be characterized. The present study aims to widen the scope of motif content analysis in different monocot and dicot species, including both rice species, Brachypodium, corn, and wheat as monocots and Arabidopsis, Lotus japonica, Medicago truncatula, and Populus tremula as dicots. All possible motifs were analyzed in different sets of sequences from these species: the whole genome, core proximal and distal promoters, 5′ and 3′ UTRs, and the 1st introns. Because more species are involved in this study than in previous works, species relationships were analyzed based on the similarity of common motif content. Certain secondary structure elements were inferred in the genomes of these species, as well as new unknown motifs. Twenty motifs common to the studied species were found to occur significantly more often within the promoters and 3′ UTRs of genes, both of which are regulatory regions. Motifs common to the promoter regions of japonica rice, Brachypodium, and corn were also found in a number of orthologous and paralogous genes. Some of our motifs were found to be complementary to miRNA elements in Brachypodium distachyon and japonica rice.

  15. Genome-wide identification of basic helix-loop-helix and NF-1 motifs underlying GR binding sites in male rat hippocampus

    DEFF Research Database (Denmark)

    Pooley, John R.; Flynn, Ben P.; Grøntved, Lars

    2017-01-01

    Glucocorticoids regulate hippocampal function in part by modulating gene expression through the glucocorticoid receptor (GR). GR binding is highly cell type specific, directed to accessible chromatin regions established during tissue differentiation. Distinct classes of GR binding sites... linked to structural and organizational roles, an absence of major tethering partners for GRs, and little or no evidence for binding at negative glucocorticoid response elements. A basic helix-loop-helix motif closely resembling a NeuroD1 or Olig2 binding site was found underlying a subset of GR binding... in hippocampal GR function. Our findings imply a dose-dependent and context-independent action of GRs in the hippocampus. Alterations in the expression or activity of NF-1/basic helix-loop-helix factors may play an as yet undetermined role in glucocorticoid-related disease susceptibility and outcome by altering...

  16. [Cloning and sequence analysis of the DHBV genome of the brown ducks in Guilin region and establishment of the quantitative method for detecting DHBV].

    Science.gov (United States)

    Su, He-Ling; Huang, Ri-Dong; He, Song-Qing; Xu, Qing; Zhu, Hua; Mo, Zhi-Jing; Liu, Qing-Bo; Liu, Yong-Ming

    2013-03-01

    Brown ducks carrying DHBV are widely used as a hepatitis B animal model in research on the activity and toxicity of anti-HBV drugs. Studies have shown that the proportion of DHBV carriers among brown ducks in the Guilin region is relatively high. Nevertheless, the characteristics of the DHBV genome of the Guilin brown duck remain unknown. Here we report the cloning and sequence analysis of the genome of Guilin brown duck DHBV. The full length of the DHBV genome of the Guilin brown duck was 3,027 bp. Analysis using ORF Finder found an ORF for an unknown peptide in addition to the S-ORF, P-ORF and C-ORF in the DHBV genome. Vector NTI 8.0 analysis revealed that the unknown peptide contained a motif that binds HLA*0201. Alignment with DHBV sequences from different countries and regions indicated no obvious regional differences among the sequences. A fluorescence quantitative PCR method for detecting DHBV was established based on the constructed recombinant plasmid pGEM-DHBV-S. This study lays the groundwork for using the Guilin brown duck as a hepatitis B animal model.

  17. Fingerprint motifs of phytases | Fan | African Journal of Biotechnology

    African Journals Online (AJOL)

    Among the total of potential 173 phytases gained in 11 plant genomes through MAST, PAPhys are the major phytases, and HAPhys are the minor, and other phytase groups are not found in planta. Keywords: Phytase, fingerprint motif, multiple EM for motif elicitation (MEME), MAST African Journal of Biotechnology Vol.

  18. A structured RNA motif is involved in correct placement of the tRNA(3)(Lys) primer onto the human immunodeficiency virus genome

    NARCIS (Netherlands)

    Beerens, N.; Klaver, B.; Berkhout, B.

    2000-01-01

    Human immunodeficiency virus type 1 (HIV-1) reverse transcription is primed by the cellular tRNA(3)(Lys) molecule that binds with its 3'-terminal 18 nucleotides to the fully complementary primer-binding site (PBS) on the viral RNA genome. Besides this complementarity, annealing of the primer may be

  19. Detection of combined genomic variants in a Jordanian family with ...

    Indian Academy of Sciences (India)

    Keywords. familial hyperthyroidism; TSHR gene; genomic variants; TSAB; intron; human genetics. ... The sequence analysis of all TSHR gene exons and intron borders revealed two genomic variants. ... This is the first Jordanian family with familial non-autoimmune hyperthyroidism, with mutations affecting the TSHR gene.

  20. Detection of combined genomic variants in a Jordanian family with ...

    Indian Academy of Sciences (India)

    for germline mutations in thyroid stimulating hormone (TSH) receptor (TSHR) gene was performed by direct sequencing of genomic DNA extracted from peripheral blood leukocytes of all family members. The sequence analysis of all TSHR gene exons and intron borders revealed two genomic variants. The first was a single ...

  1. The genome of the THE I human transposable repetitive elements is composed of a basic motif homologous to an ancestral immunoglobulin gene sequence.

    OpenAIRE

    Hakim, I; Amariglio, N; Grossman, Z; Simoni-Brok, F; Ohno, S; Rechavi, G

    1994-01-01

    Amplification of rearranged human immunoglobulin heavy-chain genes using the polymerase chain reaction resulted unexpectedly in the amplification of human transposable repetitive element genomes. These were identified as members of the THE I (transposon-like human element I) transposable element family. Analysis of the THE I sequences revealed the presence of several copies of the ancestral building block described > 10 years ago by Ohno and coworkers as the primordial immunoglobulin sequence...

  2. PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

    Science.gov (United States)

    Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

    2015-12-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
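
    The pan- and core-genome notions used above reduce to set operations once each isolate is represented by its gene families. The sketch below is an illustration under that assumption, not PanCoreGen's method; the isolate and gene names are hypothetical, and real tools must first decide which genes in different genomes count as the same family.

```python
def pan_and_core(genomes):
    """Return (pan, core, accessory) gene sets for a mapping of
    isolate name -> iterable of gene-family identifiers."""
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)  # genes found in any isolate
    core = set.intersection(*gene_sets) if gene_sets else set()  # in all
    accessory = pan - core  # present in some isolates but not all
    return pan, core, accessory


# Hypothetical isolates and gene families.
genomes = {
    "isolate1": ["geneA", "geneB", "geneC"],
    "isolate2": ["geneA", "geneC", "geneD"],
    "isolate3": ["geneA", "geneC"],
}
pan, core, accessory = pan_and_core(genomes)
print(sorted(pan), sorted(core), sorted(accessory))
```

    Group-specific datasets of the kind mentioned in the abstract could be derived the same way, by intersecting within a chosen group of isolates and subtracting the union of the rest.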

  3. PanCoreGen – profiling, detecting, annotating protein-coding genes in microbial genomes

    Science.gov (United States)

    Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.

    2015-01-01

    A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591

  4. Practical Approaches for Detecting Selection in Microbial Genomes

    OpenAIRE

    Hedge, Jessica; Wilson, Daniel J.

    2016-01-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers th...

  5. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
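
    Function (ii) above, scanning for instances of a query motif in genomic sequences, can be sketched for the simplest case of a consensus written in IUPAC nucleotide codes. This is not DMINDA's algorithm; the function name and example sequence are assumptions, positions are zero-based, and only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def scan_motif(motif, sequence):
    """Return zero-based start positions of every (possibly overlapping)
    forward-strand match of an IUPAC consensus motif."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead keeps matches that overlap one another.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# CANNTG is used here only as an example of a degenerate consensus.
print(scan_motif("CANNTG", "TTCAGCTGAACACGTGTT"))  # → [2, 10]
```

    Motif finders usually represent motifs as position weight matrices rather than consensus strings; the consensus form is used here only because it keeps the scanning logic visible.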

  6. Comparison of chromosomal and array-based comparative genomic hybridization for the detection of genomic imbalances in primary prostate carcinomas

    Directory of Open Access Journals (Sweden)

    Berg Marianne

    2006-09-01

    Full Text Available Abstract Background: In order to gain new insights into the molecular mechanisms involved in prostate cancer, we performed array-based comparative genomic hybridization (aCGH) on a series of 46 primary prostate carcinomas using a 1 Mbp whole-genome coverage platform. As chromosomal comparative genomic hybridization (cCGH) data was available for these samples, we compared the sensitivity and overall concordance of the two methodologies, and used the combined information to infer the best of three different aCGH scoring approaches. Results: Our data demonstrate that the reliability of aCGH in the analysis of primary prostate carcinomas depends to some extent on the scoring approach used, with the breakpoint estimation method being the most sensitive and reliable. The pattern of copy number changes detected by aCGH was concordant with that of cCGH, but the higher resolution technique detected 2.7 times more aberrations and 15.2% more carcinomas with genomic imbalances. We additionally show that several aberrations were consistently overlooked using cCGH, such as small deletions at 5q, 6q, 12p, and 17p. The latter were validated by fluorescence in situ hybridization targeting TP53, although only one carcinoma harbored a point mutation in this gene. Strikingly, homozygous deletions at 10q23.31, encompassing the PTEN locus, were seen in 58% of the cases with 10q loss. Conclusion: We conclude that aCGH can significantly improve the detection of genomic aberrations in cancer cells as compared to previously established whole-genome methodologies, although contamination with normal cells may influence the sensitivity and specificity of some scoring approaches. Our work delineated recurrent copy number changes and revealed novel amplified loci and frequent homozygous deletions in primary prostate carcinomas, which may guide future work aimed at identifying the relevant target genes. In particular, biallelic loss seems to be a frequent mechanism of inactivation

  7. Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray

    Directory of Open Access Journals (Sweden)

    Huyen Yentram

    2008-08-01

    Full Text Available Abstract Background: Genetic mapping is a powerful method to identify mutations that cause drug resistance and other phenotypic changes in the human malaria parasite Plasmodium falciparum. For efficient mapping of a target gene, it is often necessary to genotype a large number of polymorphic markers. Currently, a community effort is underway to collect single nucleotide polymorphisms (SNP) from the parasite genome. Here we evaluate polymorphism detection accuracy of a high-density 'tiling' microarray with 2.56 million probes by comparing single feature polymorphisms (SFP) calls from the microarray with known SNP among parasite isolates. Results: We found that probe GC content, SNP position in a probe, probe coverage, and signal ratio cutoff values were important factors for accurate detection of SFP in the parasite genome. We established a set of SFP calling parameters that could predict mSFP (SFP called by multiple overlapping probes) with high accuracy (≥ 94%) and identified 121,087 mSFP genome-wide from five parasite isolates including 40,354 unique mSFP (excluding those from multi-gene families) and ~18,000 new mSFP, producing a genetic map with an average of one unique mSFP per 570 bp. Genomic copy number variation (CNV) among the parasites was also cataloged and compared. Conclusion: A large number of mSFP were discovered from the P. falciparum genome using a high-density microarray, most of which were in clusters of highly polymorphic genes at chromosome ends. Our method for accurate mSFP detection and the mSFP identified will greatly facilitate large-scale studies of genome variation in the P. falciparum parasite and provide useful resources for mapping important parasite traits.

  8. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti...

  9. Genome-wide detection of selection and other evolutionary forces

    DEFF Research Database (Denmark)

    Xu, Zhuofei; Zhou, Rui

    2015-01-01

    As is well known, pathogenic microbes evolve rapidly to escape from the host immune system and antibiotics. Genetic variations among microbial populations occur frequently during the long-term pathogen–host evolutionary arms race, and individual mutation beneficial for the fitness can be fixed...... to scan genome-wide alignments for evidence of positive Darwinian selection, recombination, and other evolutionary forces operating on the coding regions. In this chapter, we describe an integrative analysis pipeline and its application to tracking featured evolutionary trajectories on the genome...... of an animal pathogen. The evolutionary analysis of the protein-coding part of the genomes will provide a wide spectrum of genetic variations that play potential roles in adaptive evolution of bacteria....

  10. Practical Approaches for Detecting Selection in Microbial Genomes.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2016-02-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  11. Practical Approaches for Detecting Selection in Microbial Genomes.

    Directory of Open Access Journals (Sweden)

    Jessica Hedge

    2016-02-01

    Full Text Available Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  12. Genomic Approaches for Detection and Treatment of Breast Cancer

    Science.gov (United States)

    2007-07-01

    this enhanced dependency may be the near-tetraploid nature of the HCC1954 genome. Compared to the diploid HMECs, HCC1954 cells may rely more heavily on...the autoantibody project and work out conditions to express these protein fragments in bacteria in a high-throughput fashion. We have made the

  13. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  14. A G-C-rich palindromic structural motif and a stretch of single-stranded purines are required for optimal packaging of Mason-Pfizer monkey virus (MPMV) genomic RNA.

    Science.gov (United States)

    Jaballah, Soumeya Ali; Aktar, Suriya J; Ali, Jahabar; Phillip, Pretty Susan; Al Dhaheri, Noura Salem; Jabeen, Aayesha; Rizvi, Tahir A

    2010-09-03

    During retroviral RNA packaging, two copies of genomic RNA are preferentially packaged into the budding virus particles whereas the spliced viral RNAs and the cellular RNAs are excluded during this process. Specificity towards retroviral RNA packaging is dependent upon sequences at the 5' end of the viral genome, which at times extend into Gag sequences. It has earlier been suggested that the Mason-Pfizer monkey virus (MPMV) contains packaging sequences within the 5' untranslated region (UTR) and Gag. These studies have also suggested that the packaging determinants of MPMV that lie in the UTR are bipartite and are divided into two regions both upstream and downstream of the major splice donor. However, the precise boundaries of these discontinuous regions within the UTR and the role of the intervening sequences between these bipartite sequences towards MPMV packaging have not been investigated. Employing a combination of genetic and structural prediction analyses, we have shown that region "A", immediately downstream of the primer binding site, is composed of 50 nt, whereas region "B" is composed of the last 23 nt of UTR, and the intervening 55 nt between these two discontinuous regions do not contribute towards MPMV RNA packaging. In addition, we have identified a 14-nt G-C-rich palindromic sequence (with 100% autocomplementarity) within region A that has been predicted to fold into a structural motif and is essential for optimal MPMV RNA packaging. Furthermore, we have also identified a stretch of single-stranded purines (ssPurines) within the UTR and 8 nt of these ssPurines are duplicated in region B. The native ssPurines or its repeat in region B when predicted to refold as ssPurines has been shown to be essential for RNA packaging, possibly functioning as a potential nucleocapsid binding site. Findings from this study should enhance our understanding of the steps involved in MPMV replication including RNA encapsidation process. Copyright (c) 2010 Elsevier Ltd

  15. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Genome-wide expression profiling, in vivo DNA binding analysis, and probabilistic motif prediction reveal novel Abf1 target genes during fermentation, respiration, and sporulation in yeast.

    Science.gov (United States)

    Schlecht, Ulrich; Erb, Ionas; Demougin, Philippe; Robine, Nicolas; Borde, Valérie; van Nimwegen, Erik; Nicolas, Alain; Primig, Michael

    2008-05-01

    The autonomously replicating sequence binding factor 1 (Abf1) was initially identified as an essential DNA replication factor and later shown to be a component of the regulatory network controlling mitotic and meiotic cell cycle progression in budding yeast. The protein is thought to exert its functions via specific interaction with its target site as part of distinct protein complexes, but its roles during mitotic growth and meiotic development are only partially understood. Here, we report a comprehensive approach aiming at the identification of direct Abf1-target genes expressed during fermentation, respiration, and sporulation. Computational prediction of the protein's target sites was integrated with a genome-wide DNA binding assay in growing and sporulating cells. The resulting data were combined with the output of expression profiling studies using wild-type versus temperature-sensitive alleles. This work identified 434 protein-coding loci as being transcriptionally dependent on Abf1. More than 60% of their putative promoter regions contained a computationally predicted Abf1 binding site and/or were bound by Abf1 in vivo, identifying them as direct targets. The present study revealed numerous loci previously unknown to be under Abf1 control, and it yielded evidence for the protein's variable DNA binding pattern during mitotic growth and meiotic development.

  17. Detection of alien genetic introgressions in bread wheat using dot-blot genomic hybridisation.

    Science.gov (United States)

    Rey, María-Dolores; Prieto, Pilar

    2017-01-01

    Simple, reliable methods for the identification of alien genetic introgressions are required in plant breeding programmes. The use of genomic dot-blot hybridisation allows the detection of small Hordeum chilense genomic introgressions in the descendants of genetic crosses between wheat and H. chilense addition or substitution lines in wheat when molecular markers are difficult to use. Based on genomic in situ hybridisation, DNA samples from wheat lines carrying putatively H. chilense introgressions were immobilised on a membrane, blocked with wheat genomic DNA and hybridised with biotin-labelled H. chilense genomic DNA as a probe. This dot-blot screening reduced the number of plants necessary to be analysed by molecular markers or in situ hybridisation, saving time and money. The technique was sensitive enough to detect a minimum of 5 ng of total genomic DNA immobilised on the membrane or about 1/420 dilution of H. chilense genomic DNA in the wheat background. The robustness of the technique was verified by in situ hybridisation. In addition, the detection of other wheat relative species such as Hordeum vulgare, Secale cereale and Agropyron cristatum in the wheat background was also reported.

  18. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome.

    Science.gov (United States)

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches.

  19. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome.

    Directory of Open Access Journals (Sweden)

    Cecilia Conaco

    Full Text Available Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches.

  20. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  2. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of the art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allow us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  3. Artin t-Motifs

    OpenAIRE

    Taelman, Lenny

    2008-01-01

    We show that analytically trivial t-motifs satisfy a Tannakian duality, without restrictions on the base field, save for that it be of generic characteristic. We show that the group of components of the t-motivic Galois group coincides with the absolute Galois group of the base field.

  4. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Full Text Available Abstract. Background: Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for the detection of DNA copy number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results: To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy number variants of unclear clinical significance, we designed a systematic method for the interpretation of copy number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of those cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions: Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.

  5. Genome-wide prediction and validation of sigma70 promoters in Lactobacillus plantarum WCFS1.

    Directory of Open Access Journals (Sweden)

    Tilman J Todt

    Full Text Available BACKGROUND: In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize, and bind to, specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ(70) is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ(70) promoter sequence motifs. In this study the σ(70)-promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally-active part of this promoter prediction was subsequently evaluated by correlating locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected by high-resolution tiling array transcriptome datasets. RESULTS: To identify σ(70)-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs including those resembling the conserved σ(70)-promoter consensus. Position weight matrix-based models of the recovered σ(70)-promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10^-4) to the model-motif in the L. plantarum genome. Genome-wide transcript information deduced from whole genome tiling-array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-end of transcripts. By this procedure, 1167 putative TSSs were identified that were used to corroborate the transcriptionally active fraction of these predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoter and TSS (p-value < 10^-263). CONCLUSIONS: High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and
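
    The position weight matrix (PWM) scan mentioned in the record above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the count matrix (an invented TATAAT-like -10 box), the pseudocount, the uniform background, the fixed score threshold, and the function names build_pwm and scan. The study reports a p-value cutoff rather than a raw score cutoff, and its actual matrices are not given in this record.

```python
import math

# Hypothetical counts for a 6-nt motif: 17 of 20 aligned sites carry the
# consensus base at each position. Invented values, not from the study.
CONSENSUS = "TATAAT"
COUNTS = [{b: (17 if b == c else 1) for b in "ACGT"} for c in CONSENSUS]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency; a real genome scan would
    normally use the genome's own base composition.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for start, window, score in scan("GGCGTATAATGCCG", pwm, threshold=6.0):
        print(start, window, round(score, 2))
```

    A score threshold is used here only because it is the simplest cutoff to show; converting scores to p-values, scanning the reverse strand, and restricting the scan to upstream regions would all be needed to approximate the analysis described in the abstract.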

  6. RAPD-based detection of genomic instability in cucumber plants ...

    African Journals Online (AJOL)

    2009-07-20

    Picea abies) (Heinze and Schmidt, 1995). Cytogenetic analyses were carried out to detect somaclonal variation in somatic embryo-derived plants from two elite genotypes of Asparagus officinalis cv. Argenteuil (Raimondi et al., ...

  7. Sequence analysis of Schmallenberg virus genomes detected in Hungary.

    Science.gov (United States)

    Fehér, Enikő; Marton, Szilvia; Tóth, Ádám György; Ursu, Krisztina; Wernike, Kerstin; Beer, Martin; Dán, Ádám; Bányai, Krisztián

    2017-12-01

    Since its emergence near the German-Dutch border in 2011, Schmallenberg virus (SBV) has been identified in many European countries. In this study, we determined the complete coding sequence of seven Hungarian SBV genomes to expand our knowledge about the genetic diversity of circulating field strains. The samples originated from the first case, an aborted cattle fetus without malformation collected in 2012, and from the blood samples of six adult cattle in 2014. The Hungarian SBV sequences shared ≥99.3% nucleotide (nt) and ≥97.8% amino acid (aa) identity with each other, and ≥98.9% nt and ≥96.7% aa identity with reference strains. Although phylogenetic analyses showed low resolution in general, the M sequences of cattle and sheep origin SBV strains seemed to cluster on different branches. Both common and unique mutation sites were observed in different groups of sequences that might help understanding the evolution of emerging SBV strains.

  8. [The genome and its metaphors. Detectives, heroes or prophets?].

    Science.gov (United States)

    Davo, M C; Alvarez-Dardet, C

    2003-01-01

    The new genetics, or the impetus given to this discipline by the Genome Project, aims at a paradigm change in the health sciences. The proposed change is from a phenotypic approach to a genotypic one, thereby excluding the influence of the environment, which could seriously undermine the grounds for the development and exercise of Public Health. Since the beginning of the genome project, information on genetic discoveries has frequently been reported in the mass media. Metaphors are often used by geneticists and journalists to convey the complex concepts of genetic research for which there are no equivalents in lay language. The media do not merely shape the social agenda but also provide the space in which health culture is constructed. We present the results of a preliminary study exploring the metaphors used in the three most widely-read national daily newspapers in Spain, namely ABC, El Pais and El Mundo, when reporting news of the new genetics. The possible consequences of the natural history of these metaphors, or the process through which figurative terms acquire a literal meaning, are discussed. A preliminary taxonomy for the metaphors identified was developed. Fifty-one out of 342 identified headings (14.8%) contained metaphors. Strategic metaphors such as program, control, code, map, and puzzle were the most commonly used, followed by teleological ones such as mystery or God language and finally war-like metaphors such as attack, defeat, and capture. The three groups of metaphors are characterized by an attempt to give intentionality to genes. Strategic metaphors predominated over teleological and war-like ones and thus a technocratic perspective could form the basis of the future construction of health culture.

  9. The MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2010-01-01

    In vertebrates, the onset of cellular immune reactions is controlled by presentation of peptides in complex with major histocompatibility complex (MHC) molecules to T cell receptors. In humans, MHCs are called human leukocyte antigens (HLAs). Different MHC molecules present different subsets...... of peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms...... is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  10. SNARE motif: A common motif used by pathogens to manipulate membrane fusion

    Science.gov (United States)

    Wesolowski, Jordan

    2010-01-01

    To penetrate host cells through their membranes, pathogens use a variety of molecular components in which the presence of heptad repeat motifs seems to be a prevailing element. Heptad repeats are characterized by a pattern of seven, generally hydrophobic, residues. In order to initiate membrane fusion, viruses use glycoproteins containing heptad repeats. These proteins are structurally and functionally similar to the SNARE proteins known to be involved in eukaryotic membrane fusion. SNAREs also display a heptad repeat motif called the “SNARE motif”. As bacterial genomes are being sequenced, microorganisms also appear to be carrying membrane proteins resembling eukaryotic SNAREs. This category of SNARE-like proteins might share similar functions and could be used by microorganisms to either promote or block membrane fusion. Such a recurrence across pathogenic organisms suggests that this architectural motif was evolutionarily selected because it most effectively ensures the survival of pathogens within the eukaryotic environment. PMID:21178463

  11. Functional characterization of variations on regulatory motifs.

    Directory of Open Access Journals (Sweden)

    Michal Lapidot

    2008-03-01

    Full Text Available Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. We have developed computational means for estimating the functional outcome of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. We predict the effects of nucleotide variations within motifs on gene expression patterns. In cases where such predictions could be compared to suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties that characterize nucleotide substitutions that are more likely to alter expression. We found that substitutions involving Adenine are more likely to retain the expression pattern and that substitutions involving Guanine are more likely to alter expression compared to the rest of the substitutions. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication is expected to be the ability to predict the phenotypic effect of variation in regulatory motifs in promoters.

  12. Polymerase Chain Reaction (PCR) Detection Of The Genome Of ...

    African Journals Online (AJOL)

    A single, discrete and specific band of expected size (278bp) when measured against 200bp (base pair) DNA molecular marker (Roche) was observed from all the three ... This communication herein reported is the first documented report of the detection of ASFV from a Nigerian warthog reported hitherto only in eastern and ...

  13. RAPD-based detection of genomic instability in cucumber plants ...

    African Journals Online (AJOL)

    ... test using five primers OP-C10, OP-G14, OP-H05, OP-Y03 and OP-AT01. The results indicate the usefulness of RAPD markers to detect genetic instability in cucumber primary regenerant plants derived from somatic embryogenesis, and as a certification tool for monitoring genetic stability during the generation process.

  14. Optical Detection of Non-amplified Genomic DNA

    Science.gov (United States)

    Li, Di; Fan, Chunhai

    Nucleic acid sequences are unique to every living organism, including animals, plants, and even bacteria and viruses, and thus provide a practical molecular target for the identification and diagnosis of various diseases. DNA contains heterocyclic rings that have inherent optical absorbance at 260 nm, which is widely used to quantify single- and double-stranded DNA in biology. However, this simple quantification method cannot differentiate sequences; it is therefore not suitable for sequence-specific analyte detection. With a few exceptions such as chiral-related circular dichroism spectra, DNA hybridization does not produce significant changes in optical signals, so an optical label is generally needed for sequence-specific DNA detection by optical means. During the last two decades, we have witnessed explosive progress in the area of optical DNA detection, especially with the help of simultaneously and rapidly developing nanomaterials. In this chapter, we will summarize recent advances in optical DNA detection including colorimetric, fluorescent, luminescent, surface plasmon resonance (SPR) and Raman scattering assays. Challenges and problems that remain to be addressed are also discussed.

  15. A method for accurate detection of genomic microdeletions using real-time quantitative PCR

    Directory of Open Access Journals (Sweden)

    Bassett Anne S

    2005-12-01

    Full Text Available Abstract. Background: Quantitative Polymerase Chain Reaction (qPCR) is a well-established method for quantifying levels of gene expression, but has not been routinely applied to the detection of constitutional copy number alterations of human genomic DNA. Microdeletions or microduplications of the human genome are associated with a variety of genetic disorders. Although clinical laboratories routinely use fluorescence in situ hybridization (FISH) to identify such cryptic genomic alterations, there remains a significant number of individuals in which constitutional genomic imbalance is suspected, based on clinical parameters, but cannot be readily detected using current cytogenetic techniques. Results: In this study, a novel application for real-time qPCR is presented that can be used to reproducibly detect chromosomal microdeletions and microduplications. This approach was applied to DNA from a series of patient samples and controls to validate genomic copy number alteration at cytoband 22q11. The study group comprised 12 patients with clinical symptoms of chromosome 22q11 deletion syndrome (22q11DS), 1 patient trisomic for 22q11 and 4 normal controls. 6 of the patients (group 1) had known hemizygous deletions, as detected by standard diagnostic FISH, whilst the remaining 6 patients (group 2) were classified as 22q11DS negative using the clinical FISH assay. Screening of the patients and controls with a set of 10 real time qPCR primers, spanning the 22q11.2-deleted region and flanking sequence, confirmed the FISH assay results for all patients with 100% concordance. Moreover, this qPCR enabled a refinement of the region of deletion at 22q11. Analysis of DNA from chromosome 22 trisomic sample demonstrated genomic duplication within 22q11. Conclusion: In this paper we present a qPCR approach for the detection of chromosomal microdeletions and microduplications. The strategic use of in silico modelling for qPCR primer design to avoid regions of repetitive

  16. Intra-strain polymorphisms are detected but no genomic alteration is found in cloned mice

    International Nuclear Information System (INIS)

    Gotoh, Koshichi; Inoue, Kimiko; Ogura, Atsuo; Oishi, Michio

    2006-01-01

    In-gel competitive reassociation (IGCR) is a method for differential subtraction of polymorphic (RFLP) DNA fragments between two DNA samples of interest without probes or specific sequence information. Here, we applied the IGCR procedure to two cloned mice derived from an F1 hybrid of the C57BL/6Cr and DBA/2 strains, in order to investigate the possibility of genomic alteration in the cloned mouse genomes. Each of the five genomic alterations we detected between the two cloned mice corresponded to 'intra-strain' polymorphisms in the C57BL/6Cr and DBA/2 mouse strains. Our result suggests that no severe aberration of genome sequences occurs due to somatic cell nuclear transfer.

  17. Integrating landscape genomics and spatially explicit approaches to detect loci under selection in clinal populations.

    Science.gov (United States)

    Jones, Matthew R; Forester, Brenna R; Teufel, Ashley I; Adams, Rachael V; Anstett, Daniel N; Goodrich, Betsy A; Landguth, Erin L; Joost, Stéphane; Manel, Stéphanie

    2013-12-01

    Uncovering the genetic basis of adaptation hinges on the ability to detect loci under selection. However, population genomics outlier approaches to detect selected loci may be inappropriate for clinal populations or those with unclear population structure because they require that individuals be clustered into populations. An alternate approach, landscape genomics, uses individual-based approaches to detect loci under selection and reveal potential environmental drivers of selection. We tested four landscape genomics methods on a simulated clinal population to determine their effectiveness at identifying a locus under varying selection strengths along an environmental gradient. We found all methods produced very low type I error rates across all selection strengths, but elevated type II error rates under "weak" selection. We then applied these methods to an AFLP genome scan of an alpine plant, Campanula barbata, and identified five highly supported candidate loci associated with precipitation variables. These loci also showed spatial autocorrelation and cline patterns indicative of selection along a precipitation gradient. Our results suggest that landscape genomics in combination with other spatial analyses provides a powerful approach for identifying loci potentially under selection and explaining spatially complex interactions between species and their environment. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.

  18. Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE

    DEFF Research Database (Denmark)

    Valen, Eivind; Pascarella, Giovanni; Chalk, Alistair

    2009-01-01

    Finding and characterizing mRNAs, their transcription start sites (TSS), and their associated promoters is a major focus in post-genome biology. Mammalian cells have at least 5-10 magnitudes more TSS than previously believed, and deeper sequencing is necessary to detect all active promoters in a ...

  19. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., on the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).
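
    To make the problem statement concrete, the following is a minimal sketch of naive combinatorial motif extraction from a single sequence. It is not ACME's cache-aware or parallel search; the function names, the Hamming-distance mismatch model, and the parameters `min_support` and `max_mismatch` are assumptions chosen for illustration. It enumerates every candidate pattern over the alphabet, which is sound and complete but exponential in the motif length, which is the cost the abstract attributes to serial combinatorial methods.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def frequent_motifs(seq, k, min_support, max_mismatch=0, alphabet="ACGT"):
    """Return {motif: support} for every length-k pattern over `alphabet`
    whose approximate occurrences in `seq` number at least `min_support`.

    Hypothetical baseline: exhaustive candidate enumeration, O(|alphabet|^k).
    """
    # Count each distinct window once so repeated windows are not rescanned.
    windows = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    result = {}
    for candidate in map("".join, product(alphabet, repeat=k)):
        support = sum(count for window, count in windows.items()
                      if hamming(candidate, window) <= max_mismatch)
        if support >= min_support:
            result[candidate] = support
    return result


if __name__ == "__main__":
    print(frequent_motifs("ACGTACGTACGA", k=3, min_support=3))
```

    Because the candidate space grows as |alphabet|^k, this sketch is only usable for short motifs and short inputs.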

  20. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun

    2013-06-29

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8-10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/?wkc/kmerHMM. 2013 The Author(s).
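
    The preprocessing step described above (reducing probe-level intensities to a median intensity per k-mer, then ranking the k-mers) can be sketched as follows. This is an illustrative assumption about the data layout, not the kmerHMM code: probes are taken to be `(sequence, intensity)` pairs, reverse complements are ignored, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes containing it.

    `probes` is an iterable of (probe_sequence, intensity) pairs.
    A k-mer occurring several times in one probe is counted once for it.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        kmers_in_probe = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers_in_probe:
            by_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


def rank_kmers(medians):
    """Order k-mers by decreasing median intensity, ties broken alphabetically."""
    return sorted(medians, key=lambda kmer: (-medians[kmer], kmer))


if __name__ == "__main__":
    toy_probes = [("ACGT", 10.0), ("CGTA", 30.0), ("ACGA", 20.0)]
    med = kmer_median_intensities(toy_probes, k=3)
    print(rank_kmers(med))
```

    The ranked list would then be the input to alignment and HMM training, which are outside the scope of this sketch.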

  1. Sensitive and reliable detection of genomic imbalances in human neuroblastomas using comparative genomic hybridisation analysis

    NARCIS (Netherlands)

    van Gele, M.; van Roy, N.; Jauch, A.; Laureys, G.; Benoit, Y.; Schelfhout, V.; de Potter, C. R.; Brock, P.; Uyttebroeck, A.; Sciot, R.; Schuuring, E.; Versteeg, R.; Speleman, F.

    1997-01-01

    Deletions of the short arm of chromosome 1, extra copies of chromosome 17q and MYCN amplification are the most frequently encountered genetic changes in neuroblastomas. Standard techniques for detection of one or more of these genetic changes are karyotyping, FISH analysis and LOH analysis by

  2. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Science.gov (United States)

    McDonnell, Mark D; Yaveroğlu, Ömer Nebil; Schmerl, Brett A; Iannella, Nicolangelo; Ward, Lawrence M

    2014-01-01

    Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.
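
    As a small illustration of counting per-node participation in a functional (partial) subgraph directly from a directed adjacency matrix, the sketch below counts, for each node, how many directed two-step paths i -> j -> k (with i and k distinct) it takes part in as source, middle, or target. This covers only one subgraph type and is not the paper's full fingerprint vector or its conversion matrix; the function name and the plain nested-list matrix format are assumptions.

```python
def two_path_role_counts(adj):
    """Per-node role counts for directed 2-paths i -> j -> k with i != k.

    `adj` is a square 0/1 matrix (list of lists) with adj[i][j] == 1 for an
    edge i -> j and a zero diagonal. Paths are counted whether or not an edge
    between i and k exists, i.e. as functional (partial) subgraphs.
    Returns (source, middle, target), each a list indexed by node.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Reciprocal edges at a node give i -> j -> i, which must be excluded.
    recip = [sum(adj[j][i] * adj[i][j] for i in range(n)) for j in range(n)]

    source = [sum(adj[i][j] * (out_deg[j] - adj[j][i]) for j in range(n))
              for i in range(n)]
    middle = [in_deg[j] * out_deg[j] - recip[j] for j in range(n)]
    target = [sum(adj[j][k] * (in_deg[j] - adj[k][j]) for j in range(n))
              for k in range(n)]
    return source, middle, target


if __name__ == "__main__":
    # Feed-forward triangle: 0 -> 1, 1 -> 2, 0 -> 2.
    feed_forward = [[0, 1, 1],
                    [0, 0, 1],
                    [0, 0, 0]]
    print(two_path_role_counts(feed_forward))
```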

  3. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.

  4. Using RNA nanoparticles with thermostable motifs and fluorogenic modules for real-time detection of RNA folding and turnover in prokaryotic and eukaryotic cells.

    Science.gov (United States)

    Zhang, Hui; Pi, Fengmei; Shu, Dan; Vieweger, Mario; Guo, Peixuan

    2015-01-01

    RNA nanotechnology is an emerging field at the interface of biochemistry and nanomaterials that shows immense promise for applications in nanomedicines, therapeutics and nanotechnology. Noncoding RNAs, such as siRNA, miRNA, ribozymes, and riboswitches, play important roles in the regulation of cellular processes. They carry out highly specific functions on a compact and efficient footprint. The properties of specificity and small size make them excellent modules in the construction of multifaceted RNA nanoparticles for targeted delivery and therapy. Biological activity of RNA molecules, however, relies on their proper folding. Therefore their thermodynamic and biochemical stability in the cellular environment is critical. Consequently, it is essential to assess global fold and intracellular lifetime of multifaceted RNA nanoparticles to optimize their therapeutic effectiveness. Here, we describe a method to express and assemble stable RNA nanoparticles in cells, and to assess the folding and turnover rate of RNA nanoparticles in vitro as well as in vivo in real time using a thermostable core motif derived from pRNA of bacteriophage Phi29 DNA packaging motor and fluorogenic RNA modules.

  5. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    Science.gov (United States)

    Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718

  6. Genome-wide SNP detection, validation, and development of an 8K SNP array for apple.

    Directory of Open Access Journals (Sweden)

    David Chagné

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of 'Golden Delicious', SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple.

  7. Detecting the somatic mutations spectrum of Chinese lung cancer by analyzing the whole mitochondrial DNA genomes.

    Science.gov (United States)

    Fang, Yu; Huang, Jie; Zhang, Jing; Wang, Jun; Qiao, Fei; Chen, Hua-Mei; Hong, Zhi-Peng

    2015-02-01

    To detect somatic mutations and characterize their spectrum in Chinese lung cancer patients, we sequenced the whole mitochondrial DNA (mtDNA) genomes of 10 lung cancer patients, including primary cancerous, matched paracancerous normal and distant normal tissues. By analyzing the 30 whole mtDNA genomes, eight somatic mutations were identified in five of the patients investigated, and were confirmed by cloning and sequencing. Five of the somatic mutations were detected in the control region and the rest were found in the coding region. Heterogeneity was the main characteristic of the somatic mutations in Chinese lung cancer patients. Further screening for potential disease relevance showed that, except for the C deletion at position 309, which showed a weak AD association, most of them were not disease-related. Although the role of the aforementioned somatic mutations is unknown, the relatively high frequency of somatic mutations across the whole mtDNA genome hints that detecting somatic mutations from whole mtDNA genomes can serve, to some extent, as a useful tool for lung cancer diagnosis in Chinese patients.

  8. Detection of chromosomal aberrations in seminomatous germ cell tumours using comparative genomic hybridization

    DEFF Research Database (Denmark)

    Ottesen, A M; Kirchhoff, M; Rajpert-De Meyts, Ewa

    1997-01-01

    Comparative genomic hybridization (CGH) was used to evaluate tissue specimens from 16 seminomas in order to elucidate the pathogenesis of germ cell tumours in males. A characteristic pattern of losses and gains within the entire genomes was detected in 94% of the seminomas by comparing the ratio...... of 12p and 21q appeared most consistently. Results from CGH analysis displayed no relationship to the clinical stages of the malignancy. Some rare aberrations appeared, however, only in clinical stage II and in tumours showing relapse in the contralateral testis following orchiectomy, although...

  9. Detecting signatures of selection within the Tibetan sheep mitochondrial genome.

    Science.gov (United States)

    Niu, Lili; Chen, Xiaoyong; Xiao, Ping; Zhao, Qianjun; Zhou, Jingxuan; Hu, Jiangtao; Sun, Hongxin; Guo, Jiazhong; Li, Li; Wang, Linjie; Zhang, Hongping; Zhong, Tao

    2017-11-01

    Tibetan sheep, a Chinese indigenous breed, are mainly distributed in plateau and mountain-valley areas at a terrestrial elevation between 2260 and 4100 m. The herd is genetically distinct from the other domestic sheep and undergoes acclimatization to adapt to the hypoxic environment. To date, whether the mitochondrial DNA modification of Tibetan sheep shares the same features as that of other domestic breeds remains unknown. In this study, we compared the whole mitogenome sequences from 32 Tibetan sheep, 22 domestic sheep and 24 commercial sheep to identify the selection signatures of hypoxia tolerance in Tibetan sheep. Nucleotide diversity analysis using the sliding window method showed that the highest level of nucleotide diversity was observed in the control region with a peak value of π = 0.05215, while the lowest π value was detected in the tRNA regions. qPCR results showed that the relative mtDNA copy number in Tibetan sheep was significantly lower than that in Suffolk sheep. None of the mutations in 12S rRNA were fixed in Tibetan sheep, which indicated that there has been less artificial selection in this herd than in the other domestic and commercial breeds. Although one site (1277G) might be under purifying selection, it was not identified as a breed-specific allele in Tibetan sheep. We propose that natural selection was the main driver during the domestication of Tibetan sheep and that a single mutation (or locus) could not reveal the signature of selection, given the high diversity in the mitogenome of Tibetan sheep.
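
    For readers unfamiliar with the sliding-window analysis mentioned above, here is a minimal sketch of how nucleotide diversity (π, the average proportion of differing sites per pair of sequences) can be computed per window. It assumes equal-length, pre-aligned sequences with no gap or ambiguity handling; the function names and the window and step values are illustrative and do not come from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


def sliding_pi(seqs, window, step):
    """Return [(window_start, pi), ...] over an alignment of equal-length seqs."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    alignment = ["AAAAAA", "AAAAAT", "AAAATT"]
    for start, pi in sliding_pi(alignment, window=3, step=3):
        print(start, round(pi, 4))
```

    In the toy alignment all variation sits in the second window, so that window has the higher π, in the same way that a control region would stand out in a mitogenome scan.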

  10. Detection of small genomic imbalances using microarray-based multiplex amplifiable probe hybridization.

    Science.gov (United States)

    Patsalis, Philippos C; Kousoulidou, Ludmila; Männik, Katrin; Sismani, Carolina; Zilina, Olga; Parkel, Sven; Puusepp, Helen; Tõnisson, Neeme; Palta, Priit; Remm, Maido; Kurg, Ants

    2007-02-01

    Array-based genome-wide screening methods were recently introduced to clinical practice in order to detect small genomic imbalances that may cause severe genetic disorders. The continuous advancement of such methods plays an extremely important role in diagnostic genetics and medical genomics. We have modified and adapted the original multiplex amplifiable probe hybridization (MAPH) to a novel microarray format providing an important new diagnostic tool for detection of small size copy-number changes in any locus of human genome. Here, we describe the new array-MAPH diagnostic method and show proof of concept through fabrication, interrogation and validation of a human chromosome X-specific array. We have developed new bioinformatic tools and methodology for designing and producing amplifiable hybridization probes (200-600 bp) for array-MAPH. We designed 558 chromosome X-specific probes with median spacing 238 kb and 107 autosomal probes, which were spotted onto microarrays. DNA samples from normal individuals and patients with known and unknown chromosome X aberrations were analyzed for validation. Array-MAPH detected exactly the same deletions and duplications in blind studies, as well as other unknown small size deletions showing its accuracy and sensitivity. All results were confirmed by fluorescence in situ hybridization and probe-specific PCR. Array-MAPH is a new microarray-based diagnostic tool for the detection of small-scale copy-number changes in complex genomes, which may be useful for genotype-phenotype correlations, identification of new genes, studying genetic variation and provision of genetic services.

  11. Detection of Cytosolic Shigella flexneri via a C-Terminal Triple-Arginine Motif of GBP1 Inhibits Actin-Based Motility

    Directory of Open Access Journals (Sweden)

    Anthony S. Piro

    2017-12-01

    Dynamin-like guanylate binding proteins (GBPs) are gamma interferon (IFN-γ)-inducible host defense proteins that can associate with cytosol-invading bacterial pathogens. Mouse GBPs promote the lytic destruction of targeted bacteria in the host cell cytosol, but the antimicrobial function of human GBPs and the mechanism by which these proteins associate with cytosolic bacteria are poorly understood. Here, we demonstrate that human GBP1 is unique among the seven human GBP paralogs in its ability to associate with at least two cytosolic Gram-negative bacteria, Burkholderia thailandensis and Shigella flexneri. Rough lipopolysaccharide (LPS) mutants of S. flexneri colocalize with GBP1 less frequently than wild-type S. flexneri does, suggesting that host recognition of O antigen promotes GBP1 targeting to Gram-negative bacteria. The targeting of GBP1 to cytosolic bacteria, via a unique triple-arginine motif present in its C terminus, promotes the corecruitment of four additional GBP paralogs (GBP2, GBP3, GBP4, and GBP6). GBP1-decorated Shigella organisms replicate but fail to form actin tails, leading to their intracellular aggregation. Consequentially, the wild type but not the triple-arginine GBP1 mutant restricts S. flexneri cell-to-cell spread. Furthermore, human-adapted S. flexneri, through the action of one of its secreted effectors, IpaH9.8, is more resistant to GBP1 targeting than the non-human-adapted bacillus B. thailandensis. These studies reveal that human GBP1 uniquely functions as an intracellular “glue trap,” inhibiting the cytosolic movement of normally actin-propelled Gram-negative bacteria. In response to this powerful human defense program, S. flexneri has evolved an effective counterdefense to restrict GBP1 recruitment.

  12. Comparative analysis of two complete Corynebacterium ulcerans genomes and detection of candidate virulence factors

    Directory of Open Access Journals (Sweden)

    Trost Eva

    2011-07-01

    Background: Corynebacterium ulcerans has been detected as a commensal in domestic and wild animals that may serve as reservoirs for zoonotic infections. During the last decade, the frequency and severity of human infections associated with C. ulcerans appear to be increasing in various countries. As the knowledge of genes contributing to the virulence of this bacterium was very limited, the complete genome sequences of two C. ulcerans strains detected in the metropolitan area of Rio de Janeiro were determined and characterized by comparative genomics: C. ulcerans 809 was initially isolated from an elderly woman with fatal pulmonary infection and C. ulcerans BR-AD22 was recovered from a nasal sample of an asymptomatic dog. Results: The circular chromosome of C. ulcerans 809 has a total size of 2,502,095 bp and encodes 2,182 predicted proteins, whereas the genome of C. ulcerans BR-AD22 is 104,279 bp larger and comprises 2,338 protein-coding regions. The minor difference in size of the two genomes is mainly caused by additional prophage-like elements in the C. ulcerans BR-AD22 chromosome. Both genomes show a highly similar order of orthologous coding regions, and both strains share a common set of 2,076 genes, demonstrating their very close relationship. A screening for prominent virulence factors revealed the presence of phospholipase D (Pld), neuraminidase H (NanH), endoglycosidase E (EndoE), and subunits of adhesive pili of the SpaDEF type that are encoded in both C. ulcerans genomes. The rbp gene coding for a putative ribosome-binding protein with striking structural similarity to Shiga-like toxins was additionally detected in the genome of the human isolate C. ulcerans 809. Conclusions: The molecular data deduced from the complete genome sequences provide considerable knowledge of virulence factors in C. ulcerans, which is increasingly recognized as an emerging pathogen. This bacterium is apparently equipped with a broad and varying set of

  13. AMD, an Automated Motif Discovery Tool Using Stepwise Refinement of Gapped Consensuses

    OpenAIRE

    Shi, Jiantao; Yang, Wentao; Chen, Mingjie; Du, Yanzhi; Zhang, Ji; Wang, Kankan

    2011-01-01

    Motif discovery is essential for deciphering regulatory codes from high throughput genomic data, such as those from ChIP-chip/seq experiments. However, there remains a lack of effective and efficient methods for the identification of long and gapped motifs in many relevant tools reported to date. We describe here an automated tool that allows for de novo discovery of transcription factor binding sites, regardless of whether the motifs are long or short, gapped or contiguous.

  14. Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly.

    Science.gov (United States)

    Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain

    2016-06-03

    Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail.

  15. Species and gene divergence in Littorina snails detected by array comparative genomic hybridization.

    Science.gov (United States)

    Panova, Marina; Johansson, Tomas; Canbäck, Björn; Bentzer, Johan; Rosenblad, Magnus Alm; Johannesson, Kerstin; Tunlid, Anders; André, Carl

    2014-08-18

    Array comparative genomic hybridization (aCGH) is commonly used to screen different types of genetic variation in humans and model species. Here, we performed aCGH using an oligonucleotide gene-expression array for a non-model species, the intertidal snail Littorina saxatilis. First, we tested what types of genetic variation can be detected by this method using direct re-sequencing and comparison to the Littorina genome draft. Secondly, we performed a genome-wide comparison of four closely related Littorina species: L. fabalis, L. compressa, L. arcana and L. saxatilis and of populations of L. saxatilis found in Spain, Britain and Sweden. Finally, we tested whether we could identify genetic variation underlying "Crab" and "Wave" ecotypes of L. saxatilis. We could reliably detect copy number variations, deletions and high sequence divergence (i.e. above 3%), but not single nucleotide polymorphisms. The overall hybridization pattern and number of significantly diverged genes were in close agreement with earlier phylogenetic reconstructions based on single genes. The trichotomy of L. arcana, L. compressa and L. saxatilis could not be resolved and we argue that these divergence events have occurred recently and very close in time. We found evidence for high levels of segmental duplication in the Littorina genome (10% of the transcripts represented on the array and up to 23% of the analyzed genomic fragments); duplicated genes and regions were mostly the same in all analyzed species. Finally, this method discriminated geographically distant populations of L. saxatilis, but we did not detect any significant genome divergence associated with ecotypes of L. saxatilis. The present study provides new information on the sensitivity and the potential use of oligonucleotide arrays for genotyping of non-model organisms. Applying this method to Littorina species yields insights into genome evolution following the recent species radiation and supports earlier single-gene based

  16. Direct detection of chicken genomic DNA for gender determination by thymine-DNA glycosylase.

    Science.gov (United States)

    Porat, N; Bogdanov, K; Danielli, A; Arie, A; Samina, I; Hadani, A

    2011-02-01

    1. Birds, especially nestlings, are generally difficult to sex by morphology and early detection of chick gender in ovo in the hatchery would facilitate removal of unwanted chicks and diminish welfare objections regarding culling after hatch. 2. We describe a method to determine chicken gender without the need for PCR via use of Thymine-DNA Glycosylase (TDG). TDG restores thymine (T)/guanine (G) mismatches to cytosine (C)/G. We show here, that like DNA Polymerase, TDG can recognise, bind and function on a primer hybridised to chicken genomic DNA. 3. The primer contained a T to mismatch a G in a chicken genomic template and the T/G was cleaved with high fidelity by TDG. Thus, the chicken genomic DNA can be identified without PCR amplification via direct and linear detection. Sensitivity was increased using gender specific sequences from the chicken genome. 4. Currently, these are laboratory results, but we anticipate that further development will allow this method to be used in non-laboratory settings, where PCR cannot be employed.

  17. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions: We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
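
    To make the (l, d)-motif (PMS) problem named in this abstract concrete, the following is a minimal brute-force sketch, not the speedup technique of the paper. It assumes the usual formulation: a string of length l is a motif if every input sequence contains some l-mer within Hamming distance d of it. The function names and the toy sequences are illustrative assumptions.

```python
from __future__ import annotations

from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate: str, sequence: str, d: int) -> bool:
    """True if some l-mer of `sequence` is within Hamming distance d of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motifs(sequences: list[str], l: int, d: int,
                   alphabet: str = "ACGT") -> list[str]:
    """Exhaustive exact search: test all len(alphabet)**l candidate strings.

    The running time is exponential in l, which illustrates why exact PMS
    algorithms are expensive on realistic parameters.
    """
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


# Hypothetical toy input: "ACGT" is planted with at most one mismatch per sequence.
toy_sequences = ["TTACGTAA", "GGACCTCC", "AAAGGTTT"]
toy_result = planted_motifs(toy_sequences, l=4, d=1)
```

    Because every candidate is enumerated, the sketch never misses a motif, which is what the abstract means by an exact algorithm; an approximate method would trade that guarantee for speed.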

  18. A direct detection of Escherichia coli genomic DNA using gold nanoprobes

    Directory of Open Access Journals (Sweden)

    Padmavathy

    2012-02-01

    Background: In situations such as the diagnosis of clinical and forensic samples, there is a need for highly sensitive, rapid and specific DNA detection methods. Though conventional DNA amplification using PCR can provide fast results, it is not widely practised in diagnostic laboratories, partly because it requires skilled personnel and expensive equipment. To overcome these limitations, nanoparticles have been explored as signalling probes for ultrasensitive DNA detection that can be used in field applications. Among the nanomaterials, gold nanoparticles (AuNPs) have been extensively used, mainly because of their optical properties and ability to be functionalized with a variety of biomolecules. Results: We report a protocol for the use of gold nanoparticles functionalized with single-stranded oligonucleotide (AuNP-oligo probe) as visual detection probes for rapid and specific detection of Escherichia coli. The AuNP-oligo probe, on hybridization with target DNA containing complementary sequences, remains red, whereas test samples without DNA sequences complementary to the probe turn purple due to acid-induced aggregation of AuNP-oligo probes. The color change of the solution is observed visually by the naked eye, demonstrating direct and rapid detection of pathogenic Escherichia coli from its genomic DNA without the need for PCR amplification. The limit of detection was ~54 ng for unamplified genomic DNA. The method requires less than 30 minutes to complete after genomic DNA extraction. However, by using unamplified enzymatically digested genomic DNA, a detection limit of 11.4 ng was attained. Results of UV-Vis spectroscopic measurement and AFM imaging further support the hypothesis of aggregation-based visual discrimination. To elucidate its utility in medical diagnostics, the assay was validated on clinical strains of pathogenic Escherichia coli obtained from local hospitals and spiked urine samples. It was found to be 100% sensitive and proves to

  19. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  20. Genome rearrangements detected by SNP microarrays in individuals with intellectual disability referred with possible Williams syndrome.

    Directory of Open Access Journals (Sweden)

    Ariel M Pani

    2010-08-01

    Intellectual disability (ID) affects 2-3% of the population and may occur with or without multiple congenital anomalies (MCA) or other medical conditions. Established genetic syndromes and visible chromosome abnormalities account for a substantial percentage of ID diagnoses, although for approximately 50% the molecular etiology is unknown. Individuals with features suggestive of various syndromes but lacking their associated genetic anomalies pose a formidable clinical challenge. With the advent of microarray techniques, submicroscopic genome alterations not associated with known syndromes are emerging as a significant cause of ID and MCA. High-density SNP microarrays were used to determine genome wide copy number in 42 individuals: 7 with confirmed alterations in the WS region but atypical clinical phenotypes, 31 with ID and/or MCA, and 4 controls. One individual from the first group had the most telomeric gene in the WS critical region deleted along with 2 Mb of flanking sequence. A second person had the classic WS deletion and a rearrangement on chromosome 5p within the Cri du Chat syndrome (OMIM:123450) region. Six individuals from the ID/MCA group had large rearrangements (3 deletions, 3 duplications), one of whom had a large inversion associated with a deletion that was not detected by the SNP arrays. Combining SNP microarray analyses and qPCR allowed us to clone and sequence 21 deletion breakpoints in individuals with atypical deletions in the WS region and/or ID or MCA. Comparison of these breakpoints to databases of genomic variation revealed that 52% occurred in regions harboring structural variants in the general population. For two probands the genomic alterations were flanked by segmental duplications, which frequently mediate recurrent genome rearrangements; these may represent new genomic disorders. While SNP arrays and related technologies can identify potentially pathogenic deletions and duplications, obtaining sequence information

  1. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
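
    The definition of motif pairing multiplicity given above (the number of motifs paired with a given motif) can be sketched as a simple count of distinct partners. This is only an illustration of the stated definition: the input format (a list of motif-identifier pairs, one per coupled pair) and the identifiers themselves are assumptions, not the data structures used in the study.

```python
from collections import defaultdict


def pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with.

    `pairs` is an iterable of (motif_a, motif_b) tuples; order within a
    pair and repeated pairs do not change the result. A motif paired with
    itself counts itself once (an assumption of this sketch).
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}


# Hypothetical motif identifiers, for illustration only.
example_pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M1"), ("M4", "M1")]
example_multiplicity = pairing_multiplicity(example_pairs)
```

    Using sets makes the count insensitive to how often the same pair is observed, so the value reflects the number of partners rather than the number of interactions.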

  2. A novel Bayesian DNA motif comparison method for clustering and retrieval.

    Directory of Open Access Journals (Sweden)

    Naomi Habib

    2008-02-01

    Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.

  3. Complete Genome Sequences of Eight Helicobacter pylori Strains with Different Virulence Factor Genotypes and Methylation Profiles, Isolated from Patients with Diverse Gastrointestinal Diseases on Okinawa Island, Japan, Determined Using PacBio Single-Molecule Real-Time Technology

    Science.gov (United States)

    Shiroma, Akino; Teruya, Kuniko; Shimoji, Makiko; Nakano, Kazuma; Juan, Ayaka; Tamotsu, Hinako; Terabayashi, Yasunobu; Aoyama, Misako; Teruya, Morimi; Suzuki, Rumiko; Matsuda, Miyuki; Sekine, Akihiro; Kinjo, Nagisa; Kinjo, Fukunori; Yamaoka, Yoshio; Hirano, Takashi

    2014-01-01

    We report the complete genome sequences of eight Helicobacter pylori strains isolated from patients with gastrointestinal diseases in Okinawa, Japan. Whole-genome sequencing and DNA methylation detection were performed using the PacBio platform. De novo assembly determined a single, complete contig for each strain. Furthermore, methylation analysis identified virulence factor genotype-dependent motifs. PMID:24744331

  4. Development of Real Time PCR Using Novel Genomic Target for Detection of Multiple Salmonella Serovars from Milk and Chickens

    Science.gov (United States)

    Background: A highly sensitive and specific novel genomic and plasmid target-based PCR platform was developed to detect multiple Salmonella serovars (S. Heidelberg, S. Dublin, S. Hadar, S. Kentucky and S. Enteritidis). Through extensive genome mining of protein databases of these serovars and compar...

  5. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  6. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .

  7. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Directory of Open Access Journals (Sweden)

    Maley Carlo C

    2008-10-01

    Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while ... 98% to ... D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect
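
    The sequence-space coverage discussed in this abstract (the share of all possible oligomers of a given length that occur in a genome) can be illustrated with a short sketch. It assumes a single-stranded count over the alphabet A, C, G, T and skips windows containing other symbols; whether the study also counted the reverse strand is not stated here, so that choice is an assumption, as is the function name.

```python
def kmer_coverage(sequence: str, k: int) -> float:
    """Fraction of the 4**k possible DNA k-mers that occur in `sequence`.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. Only the given strand is scanned.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    observed = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= valid:
            observed.add(window)
    return len(observed) / 4 ** k


# Toy example: two of the sixteen possible 2-mers (AC and GT) are present.
toy_coverage = kmer_coverage("ACNGT", 2)
```

    Holding every distinct k-mer in a set is only practical for short sequences or small k; a genome-scale analysis would need a more compact representation.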

  8. Genome-wide scans detect adaptation to aridity in a widespread forest tree species.

    Science.gov (United States)

    Steane, Dorothy A; Potts, Brad M; McLean, Elizabeth; Prober, Suzanne M; Stock, William D; Vaillancourt, René E; Byrne, Margaret

    2014-05-01

    Patterns of adaptive variation within plant species are best studied through common garden experiments, but these are costly and time-consuming, especially for trees that have long generation times. We explored whether genome-wide scanning technology combined with outlier marker detection could be used to detect adaptation to climate and provide an alternative to common garden experiments. As a case study, we sampled nine provenances of the widespread forest tree species, Eucalyptus tricarpa, across an aridity gradient in southeastern Australia. Using a Bayesian analysis, we identified a suite of 94 putatively adaptive (outlying) sequence-tagged markers across the genome. Population-level allele frequencies of these outlier markers were strongly correlated with temperature and moisture availability at the site of origin, and with population differences in functional traits measured in two common gardens. Using the output from a canonical analysis of principal coordinates, we devised a metric that provides a holistic measure of genomic adaptation to aridity that could be used to guide assisted migration or genetic augmentation. © 2014 John Wiley & Sons Ltd.

  9. Genomic sequencing-based detection of large deletions in Rhodococcus rhodochrous strain B-276.

    Science.gov (United States)

    Saitoh, Seikoh; Aoyama, Hiroaki; Akutsu, Masako; Nakano, Kazuma; Shinzato, Naoya; Matsui, Toru

    2013-09-01

    Bacteria of the genus Rhodococcus (Actinomycetes) have the ability to catabolize various organic compounds and are therefore considered potential genetic resources for applications such as bioremediation. We investigated a next-generation sequencing-based procedure to rapidly identify candidate functional gene(s) from rhodococci on the basis of their frequent genome recombination. The Rhodococcus rhodochrous strain B-276 and its alkene monooxygenase (AMO) gene cluster were the focus of our investigation. Firstly, 2 types of cultures of the R. rhodochrous strain B-276 were prepared, one of which was supplied with propene, which requires AMO genes for its assimilation, whereas the other was supplied with glucose as the sole energy source. The latter culture was anticipated to have a lower gene frequency of AMO genes because of their deletion during cultivation. We then conducted whole genome shotgun sequencing of the genomic DNA extracted from both cultures. Next, all sequence data were pooled and assembled into contiguous sequences (contigs). Finally, the abundance of each contig was quantified in order to detect contigs that were highly biased between the 2 cultures. We identified contigs that were overrepresented by 2 orders of magnitude in the AMO-required culture and successfully identified an AMO gene cluster among these contigs. We propose this procedure as an efficient method for the rapid detection and sequencing of deleted regions, which contributes to the identification of functional genes in rhodococci. Copyright © 2013 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
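
    The final step described above, finding contigs whose abundance is strongly biased between the two cultures, can be sketched as a log-ratio filter. This is an illustration under stated assumptions rather than the authors' procedure: per-contig depths are assumed to be already normalized for total sequencing yield, a pseudocount is added to avoid division by zero, and the contig names and threshold parameter are hypothetical.

```python
import math


def biased_contigs(depth_a, depth_b, min_log10_ratio=2.0, pseudocount=1.0):
    """Return contigs over-represented in culture A relative to culture B.

    `depth_a` and `depth_b` map contig name to normalized read depth.
    A contig is reported when log10((a + pseudocount) / (b + pseudocount))
    is at least `min_log10_ratio`; 2.0 corresponds to two orders of magnitude.
    """
    hits = {}
    for contig in depth_a.keys() & depth_b.keys():
        ratio = (depth_a[contig] + pseudocount) / (depth_b[contig] + pseudocount)
        log_ratio = math.log10(ratio)
        if log_ratio >= min_log10_ratio:
            hits[contig] = log_ratio
    return hits


# Hypothetical depths: only contig "c1" differs by more than 100-fold.
propene_depth = {"c1": 999.0, "c2": 50.0, "c3": 30.0}
glucose_depth = {"c1": 4.0, "c2": 48.0, "c3": 30.0}
candidates = biased_contigs(propene_depth, glucose_depth)
```

    The pseudocount keeps contigs that are entirely absent from one culture in the comparison, which matters here because a deleted region would have a depth near zero.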

  10. Long insert whole genome sequencing for copy number variant and translocation detection.

    Science.gov (United States)

    Liang, Winnie S; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J; Carpten, John D; Craig, David W

    2014-01-01

    As next-generation sequencing continues to have an expanding presence in the clinic, the identification of the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900-1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300-400-bp inserts. A priori analyses show that LI-WGS requires less sequencing compared with short insert WGS to achieve a target physical coverage, and that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based off of Illumina's WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers to demonstrate our application of LI-WGS on actual patient samples for identification of somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes.
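
    The claim that long inserts need less sequencing to reach a target physical coverage follows from simple arithmetic, sketched below. The sketch assumes paired-end reads, takes physical coverage as (number of read pairs × insert size) / genome size and sequence coverage as (number of read pairs × 2 × read length) / genome size, and uses a 100-bp read length and a 3-Gb genome purely as illustrative values that do not come from the abstract.

```python
import math


def pairs_for_physical_coverage(target, genome_size, insert_size):
    """Read pairs needed so that inserts span the genome `target` times on average."""
    return math.ceil(target * genome_size / insert_size)


def sequence_coverage(n_pairs, read_length, genome_size):
    """Average number of sequenced bases per genome position for paired-end reads."""
    return 2 * read_length * n_pairs / genome_size


GENOME = 3_000_000_000   # assumed genome size in bp
READ_LEN = 100           # assumed read length in bp

# Same 10x physical coverage target with a long and a short insert.
long_pairs = pairs_for_physical_coverage(10, GENOME, 1000)
short_pairs = pairs_for_physical_coverage(10, GENOME, 300)
long_seq_cov = sequence_coverage(long_pairs, READ_LEN, GENOME)
short_seq_cov = sequence_coverage(short_pairs, READ_LEN, GENOME)
```

    For a fixed read length, the number of pairs and therefore the sequenced bases scale inversely with insert size, so the longer insert reaches the same physical coverage with proportionally less sequencing.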

  11. The presence of the ancestral insect telomeric motif in kissing bugs (Triatominae) rules out the hypothesis of its loss in evolutionarily advanced Heteroptera (Cimicomorpha)

    Science.gov (United States)

    Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro

    2016-01-01

    Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence in their chromosomal ends. Furthermore, motif amount estimation was about 0.03% of the total genome, so that the average telomere length in each chromosomal end is almost 18 kb long. We also detected the presence of the (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes in three other species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050

  12. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  13. Chromosomal aberrations detected by comparative genomic hybridization technique (CGH) in invasive ductal carcinoma of breast

    Directory of Open Access Journals (Sweden)

    Nooshiravanpour P

    2007-10-01

    Background: Nonlethal genetic damage is the basis for carcinogenesis. As various gene aberrations accumulate, malignant tumors are formed, regardless of whether the genetic damage is subtle or large enough to be distinguished in a karyotype. The study of chromosomal changes in tumor cells is important in the identification of oncogenes and tumor suppressor genes by molecular cloning of genes in the vicinity of chromosomal aberrations. Furthermore, some specific aberrations can be of great diagnostic and prognostic value. Comparative genomic hybridization (CGH) is used to screen the entire genome for the detection and/or location of chromosomal copy number changes. Methods: In this study, frozen sections of 20 primary breast tumors diagnosed as invasive ductal carcinoma from the Cancer Institute of Imam Khomeini Hospital, Tehran, Iran, were studied by CGH to detect chromosomal aberrations. We compared histopathological and immunohistochemical findings. Results: Hybridization in four of the cases was not optimal for CGH analysis and they were excluded from the study. DNA copy number changes were detected in 12 (75%) of the remaining 16 cases. Twenty-one instances of chromosomal aberrations were detected in total, including: +1q, +17q, +8q, +20q, -13q, -11q, -22q, -1p, -16q, -8p. The most frequent were +1q, +17q, +8q, -13q, similar to other studies. In three cases, we detected -13q, which is associated with axillary lymph node metastasis and was reported in one previous study. The mean numbers of chromosomal aberrations per tumor in metastatic and nonmetastatic tumors were 1.5 and 1, respectively. No other association between detected chromosomal aberrations and histopathological and immunohistochemical findings was seen. Conclusion: Since intermediately to widely invasive carcinomas are more likely to have chromosomal aberrations, CGH can be a valuable prognostic tool. Furthermore, CGH can be used to detect targeting molecules within novel amplifications

  14. WordSpy: identifying transcription factor binding motifs by building a dictionary and learning a grammar.

    Science.gov (United States)

    Wang, Guandong; Yu, Taotao; Zhang, Weixiong

    2005-07-01

    Transcription factor (TF) binding sites or motifs (TFBMs) are functional cis-regulatory DNA sequences that play an essential role in gene transcriptional regulation. Although many experimental and computational methods have been developed, finding TFBMs remains a challenging problem. We propose and develop a novel dictionary based motif finding algorithm, which we call WordSpy. One significant feature of WordSpy is the combination of a word counting method and a statistical model which consists of a dictionary of motifs and a grammar specifying their usage. The algorithm is suitable for genome-wide motif finding; it is capable of discovering hundreds of motifs from a large set of promoters in a single run. We further enhance WordSpy by applying gene expression information to separate true TFBMs from spurious ones, and by incorporating negative sequences to identify discriminative motifs. In addition, we also use randomly selected promoters from the genome to evaluate the significance of the discovered motifs. The output from WordSpy consists of an ordered list of putative motifs and a set of regulatory sequences with motif binding sites highlighted. The web server of WordSpy is available at http://cic.cs.wustl.edu/wordspy.

  15. Poor Man's 1000 Genome Project: Recent Human Population Expansion Confounds the Detection of Disease Alleles in 7,098 Complete Mitochondrial Genomes.

    Science.gov (United States)

    Kim, Hie Lim; Schuster, Stephan C

    2013-01-01

    Rapid growth of the human population has caused the accumulation of rare genetic variants that may play a role in the origin of genetic diseases. However, it is challenging to identify those rare variants responsible for specific diseases without genetic data from an extraordinarily large population sample. Here we focused on the accumulated data from the human mitochondrial (mt) genome sequences because this data provided 7,098 whole genomes for analysis. In this dataset we identified 6,110 single nucleotide variants (SNVs) and their frequency and determined that the best-fit demographic model for the 7,098 genomes included severe population bottlenecks and exponential expansions of the non-African population. Using this model, we simulated the evolution of mt genomes in order to ascertain the behavior of deleterious mutations. We found that such deleterious mutations barely survived during population expansion. We derived the threshold frequency of a deleterious mutation in separate African, Asian, and European populations and used it to identify pathogenic mutations in our dataset. Although threshold frequency was very low, the proportion of variants showing a lower frequency than that threshold was 82, 83, and 91% of the total variants for the African, Asian, and European populations, respectively. Within these variants, only 18 known pathogenic mutations were detected in the 7,098 genomes. This result showed the difficulty of detecting a pathogenic mutation within an abundance of rare variants in the human population, even with a large number of genomes available for study.

  16. Poor man’s 1000 genome project: Recent human population expansion confounds the detection of disease alleles in 7,098 complete mitochondrial genomes

    Directory of Open Access Journals (Sweden)

    Hie Lim Kim

    2013-02-01

    Rapid growth of the human population has caused the accumulation of rare genetic variants that may play a role in the origin of genetic diseases. However, it is challenging to identify those rare variants responsible for specific diseases without genetic data from an extraordinarily large population sample. Here we focused on the accumulated data from the human mitochondrial (mt) genome sequences because this data provided 7,098 whole genomes for analysis. In this dataset we identified 6,110 single nucleotide variants (SNVs) and their frequency and determined that the best-fit demographic model for the 7,098 genomes included severe population bottlenecks and exponential expansions of the non-African population. Using this model, we simulated the evolution of mt genomes in order to ascertain the behavior of deleterious mutations. We found that such deleterious mutations barely survived during population expansion. We derived the threshold frequency of a deleterious mutation in separate African, Asian, and European populations and used it to identify pathogenic mutations in our dataset. Although threshold frequency was very low, the proportion of variants showing a lower frequency than that threshold was 82%, 83%, and 91% of the total variants for the African, Asian, and European populations, respectively. Within these variants, only 18 known pathogenic mutations were detected in the 7,098 genomes. This result showed the difficulty of detecting a pathogenic mutation within an abundance of rare variants in the human population, even with a large number of genomes available for study.

  17. Specific and selective target detection of supra-genome 21 Mers Salmonella via silicon nanowires biosensor

    Science.gov (United States)

    Mustafa, Mohammad Razif Bin; Dhahi, Th S.; Ehfaed, Nuri. A. K. H.; Adam, Tijjani; Hashim, U.; Azizah, N.; Mohammed, Mohammed; Noriman, N. Z.

    2017-09-01

    Silicon-based nanostructures can be surface-modified for use as label-free biosensors that allow real-time measurements. The silicon nanowire surface was functionalized using 3-aminopropyltrimethoxysilane (APTES), which facilitates the immobilization of biomolecules on the silicon nanowire surface. The process is simple and economical, which will pave the way for point-of-care applications. However, the surface modification and the subsequent detection mechanism are still not clear. This study therefore proposes a step-by-step process for silicon nanowire surface modification and examines its potential for specific and selective target detection of Supra-genome 21 Mers Salmonella. The device captured the target molecule precisely; the approach takes advantage of the strong binding chemistry created between APTES and the biomolecule. The results indicate how modifications of the nanowires provide sensing capability with strong surface chemistries that can lead to specific and selective target detection.

  18. Complexities due to single-stranded RNA during antibody detection of genomic rna:dna hybrids.

    Science.gov (United States)

    Zhang, Zheng Z; Pannunzio, Nicholas R; Hsieh, Chih-Lin; Yu, Kefei; Lieber, Michael R

    2015-04-08

    Long genomic R-loops in eukaryotes were first described at the immunoglobulin heavy chain locus switch regions using bisulfite sequencing and functional studies. A mouse monoclonal antibody called S9.6 has been used for immunoprecipitation (IP) to identify R-loops, based on the assumption that it is specific for RNA:DNA over other nucleic acid duplexes. However, recent work has demonstrated that a variable domain of S9.6 binds AU-rich RNA:RNA duplexes with a KD that is only 5.6-fold weaker than for RNA:DNA duplexes. Most IP protocols do not pre-clear the genomic nucleic acid with RNase A to remove free RNA. Fold back of ssRNA can readily generate RNA:RNA duplexes that may bind the S9.6 antibody, and adventitious binding of RNA may also create short RNA:DNA regions. Here we investigate whether RNase A is needed to obtain reliable IP with S9.6. As our test locus, we chose the most well-documented site for kilobase-long mammalian genomic R-loops, the immunoglobulin heavy chain locus (IgH) class switch regions. The R-loops at this locus can be induced by using cytokines to stimulate transcription from germline transcript promoters. We tested IP using S9.6 with and without various RNase treatments. The RNase treatments included RNase H to destroy the RNA in an RNA:DNA duplex and RNase A to destroy single-stranded (ss) RNA to prevent it from binding S9.6 directly (as duplex RNA) and to prevent the ssRNA from annealing to the genome, resulting in adventitious RNA:DNA hybrids. We find that optimal detection of RNA:DNA duplexes requires removal of ssRNA using RNase A. Without RNase A treatment, known regions of R-loop formation containing RNA:DNA duplexes can not be reliably detected. With RNase A treatment, a signal can be detected over background, but only within a limited 2 or 3-fold range, even with a stable kilobase-long genomic R-loop. Any use of the S9.6 antibody must be preceded by RNase A treatment to remove free ssRNA that may compete for the S9.6 binding by

  19. Multiscale landscape genomic models to detect signatures of selection in the alpine plant Biscutella laevigata.

    Science.gov (United States)

    Leempoel, Kevin; Parisod, Christian; Geiser, Céline; Joost, Stéphane

    2018-02-01

    Plant species are known to adapt locally to their environment, particularly in mountainous areas where conditions can vary drastically over short distances. The climate of such landscapes being largely influenced by topography, using fine-scale models to evaluate environmental heterogeneity may help detect adaptation to micro-habitats. Here, we applied a multiscale landscape genomic approach to detect evidence of local adaptation in the alpine plant Biscutella laevigata. The two gene pools identified, experiencing limited gene flow along a 1-km ridge, were different in regard to several habitat features derived from a very high resolution (VHR) digital elevation model (DEM). A correlative approach detected signatures of selection along environmental gradients such as altitude, wind exposure, and solar radiation, indicating adaptive pressures likely driven by fine-scale topography. Using a large panel of DEM-derived variables as ecologically relevant proxies, our results highlighted the critical role of spatial resolution. These high-resolution multiscale variables indeed indicate that the robustness of associations between genetic loci and environmental features depends on spatial parameters that are poorly documented. We argue that the scale issue is critical in landscape genomics and that multiscale ecological variables are key to improving our understanding of local adaptation in highly heterogeneous landscapes.

  20. Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Gregory P. Way

    2018-04-01

    Summary: Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these “hidden responders” may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA) PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders. Way et al. develop a machine-learning approach using PanCanAtlas data to detect Ras activation in cancer. Integrating mutation, copy number, and expression data, the authors show that their method detects Ras-activating variants in tumors and sensitivity to MEK inhibitors in cell lines. Keywords: Gene expression, machine learning, Ras, NF1, KRAS, NRAS, HRAS, pan-cancer, TCGA, drug sensitivity

  1. Modular fluorescence complementation sensors for live cell detection of epigenetic signals at endogenous genomic sites.

    Science.gov (United States)

    Lungu, Cristiana; Pinter, Sabine; Broche, Julian; Rathert, Philipp; Jeltsch, Albert

    2017-09-21

    Investigation of the fundamental role of epigenetic processes requires methods for the locus-specific detection of epigenetic modifications in living cells. Here, we address this urgent demand by developing four modular fluorescence complementation-based epigenetic biosensors for live-cell microscopy applications. These tools combine engineered DNA-binding proteins with domains recognizing defined epigenetic marks, both fused to non-fluorescent fragments of a fluorescent protein. The presence of the epigenetic mark at the target DNA sequence leads to the reconstitution of a functional fluorophore. With this approach, we could for the first time directly detect DNA methylation and histone 3 lysine 9 trimethylation at endogenous genomic sites in live cells and follow dynamic changes in these marks upon drug treatment, induction of epigenetic enzymes and during the cell cycle. We anticipate that this versatile technology will improve our understanding of how specific epigenetic signatures are set, erased and maintained during embryonic development or disease onset. Tools for imaging epigenetic modifications can shed light on the regulation of epigenetic processes. Here, the authors present a fluorescence complementation approach for detection of DNA and histone methylation at endogenous genomic sites, allowing dynamic changes of these marks to be followed by live-cell microscopy.

  2. Predicting combinatorial binding of transcription factors to regulatory elements in the human genome by association rule mining

    Directory of Open Access Journals (Sweden)

    Iyer Vishwanath R

    2007-11-01

    Background: Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription
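
    The support and confidence measures named in this abstract can be illustrated with a minimal sketch. It assumes each genomic segment is represented as a set of motif names; the motif labels and the function name `pair_rules` are hypothetical, and the significance testing mentioned in the abstract is not reproduced.

```python
from collections import Counter
from itertools import combinations


def pair_rules(segments):
    """Compute support and confidence for every motif pair.

    segments: list of sets, each holding the motif names found in one
    genomic segment (presence/absence scoring, as in the abstract).
    Returns {(a, b): (support, confidence of a -> b)} with a < b.
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing both motifs
    for motifs in segments:
        single.update(motifs)
        pair.update(combinations(sorted(motifs), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n             # fraction of segments with a and b
        confidence = count / single[a]  # estimate of P(b present | a present)
        rules[(a, b)] = (support, confidence)
    return rules


# Toy data with made-up motif labels, for illustration only.
segments = [{"SP1", "NFY"}, {"SP1", "NFY", "E2F"}, {"SP1"}, {"E2F"}]
rules = pair_rules(segments)
```

    In this toy input the pair (NFY, SP1) occurs in two of four segments, so its support is 0.5, and every segment with NFY also has SP1, so the confidence of NFY -> SP1 is 1.0.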

  3. Genomic Integrity Detection of In Vitro Irradiated Banana Using Microsatellite Marker.

    Directory of Open Access Journals (Sweden)

    Nina Ratna Djuita

    2010-11-01

    Genomic Integrity Detection of In Vitro Irradiated Banana Using Microsatellite Marker. The research aims to detect the genomic integrity of in vitro irradiated banana using microsatellite markers. These studies were done on banana cv. Pisang Mas irradiated with 15 Gy of gamma rays. The DNA was isolated from each accession following Dixie. Amplification of DNA products was done with a Perkin Elmer Gene Amp PCR 2400 using ten primers, followed by electrophoresis in 1% agarose. Finally, vertical polyacrylamide gel electrophoresis was run and the products were visualized by silver staining. The results showed that among the primers tested, eight primers produced clear, discrete, and reproducible bands. The number of DNA bands exhibited ranged from one to two, following the ploidy level of Pisang Mas, which is a diploid banana cultivar (AA). One band suggests a homozygous allele while two bands indicate a heterozygous allele. Out of eight primers, six primers produced different alleles among irradiated, in vitro, and in vivo control plants. Meanwhile, for the other two primers the alleles were monomorphic for all the accessions examined. Genomic modification was observed in all irradiated plants. The modification can occur in the zygosity of a certain allele, which may change from heterozygote to homozygote or vice versa, while modification in allele size underlying genomic instability could be caused by several genetic events such as deletion, insertion, and amplification of nucleotides.

  4. Detection of inter-spread repeat sequence in genomic DNA sequence.

    Science.gov (United States)

    Murakami, Hiroo; Sugaya, Nobuyoshi; Sato, Makihiko; Imaizumi, Akira; Aburatani, Sachiyo; Horimoto, Katsuhisa

    2004-01-01

    Various types of periodic patterns in nucleotide sequences are known to be very abundant in a genomic DNA sequence, and to play important biological roles such as gene expression, genome structural stabilization, and recombination. We present a new method, named "STEPSTONE", to find a specific periodic pattern of repeat sequence, inter-spread repeat, in which the tandem repeats of the conserved and the not-conserved regions appear periodically. In our method, at first, the data on periods of short repeat sequences found in a target sequence are stored as a hash data, and then are selected by application of an auto-correlation test in time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by usual alignment procedures through two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. As a result, our method exactly detected the repeats in the two sequences, being useful for identifying systematically the inter-spread repeats in DNA sequence.
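
    The first step described here, storing the periods of short repeat sequences in a hash, might look roughly like the sketch below. It is not the STEPSTONE implementation: the function name `kmer_periods`, the choice of k = 3, and the use of the distance between consecutive occurrences as the "period" are all assumptions, and the auto-correlation test and alignment steps are omitted.

```python
from collections import Counter, defaultdict


def kmer_periods(seq, k=3):
    """Tabulate distances between consecutive occurrences of each k-mer.

    Returns a dict (hash) mapping k-mer -> Counter of observed periods,
    where a period is the offset between one occurrence and the next.
    """
    last_seen = {}                   # k-mer -> start of latest occurrence
    periods = defaultdict(Counter)   # k-mer -> {period: count}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            periods[kmer][i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return periods


# Toy sequence: a conserved "ACG" unit recurs every 6 bases,
# separated by non-conserved spacers.
seq = "ACGTTTACGAAAACGCCC"
table = kmer_periods(seq)
```

    For the toy sequence, `table["ACG"]` records the period 6 twice, which is the kind of recurring offset a later statistical test could select.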

  5. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in the Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of
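
    SSR detection of the kind reported in this record can be sketched with regular expressions. The abstract does not state which tool or repeat thresholds were used, so the function name `find_ssrs` and the minimum repeat counts below (8 for mono-, 4 for di-, 3 for trinucleotide units) are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Find simple sequence repeats (SSRs) in a DNA string.

    min_repeats maps unit length -> minimum number of tandem copies.
    Returns a sorted list of (start, unit, copies) tuples.
    """
    if min_repeats is None:
        min_repeats = {1: 8, 2: 4, 3: 3}  # assumed thresholds, not from the paper
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1)
        )
        for m in pattern.finditer(seq):
            unit = m.group(2)
            # Skip units such as "AA" or "AAA": those are homopolymers
            # already reported with unit length 1.
            if unit_len > 1 and len(set(unit)) == 1:
                continue
            hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)


# Toy sequence: a 10-base A homopolymer and five copies of "AT".
seq = "GG" + "A" * 10 + "CC" + "AT" * 5 + "GC"
ssrs = find_ssrs(seq)
```

    On the toy sequence this reports one mononucleotide repeat (A, 10 copies, starting at position 2) and one dinucleotide repeat (AT, 5 copies, starting at position 14).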

  6. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs....... These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery, exemplified by microRNA seed enrichment, but also including enrichment problems involving complex motifs and combinations of motifs. We demonstrate a number of usage scenarios that take...

  7. Whole Genome Sequencing of Candida glabrata for Detection of Markers of Antifungal Drug Resistance.

    Science.gov (United States)

    Biswas, Chayanika; Chen, Sharon C-A; Halliday, Catriona; Martinez, Elena; Rockett, Rebecca J; Wang, Qinning; Timms, Verlaine J; Dhakal, Rajat; Sadsad, Rosemarie; Kennedy, Karina J; Playford, Geoffrey; Marriott, Deborah J; Slavin, Monica A; Sorrell, Tania C; Sintchenko, Vitali

    2017-12-28

    Candida glabrata can rapidly acquire mutations that result in drug resistance, especially to azoles and echinocandins. Identification of genetic mutations is essential, as resistance detected in vitro can often be correlated with clinical failure. We examined the feasibility of using whole genome sequencing (WGS) for genome-wide analysis of antifungal drug resistance in C. glabrata. The aim was to recognize enablers and barriers in the implementation of WGS and measure its effectiveness. This paper outlines the key quality control checkpoints and essential components of WGS methodology to investigate genetic markers associated with reduced susceptibility to antifungal agents. It also estimates the accuracy of data analysis and turn-around-time of testing. Phenotypic susceptibility of 12 clinical strains and one ATCC strain of C. glabrata was determined through antifungal susceptibility testing. These included three isolate pairs, from three patients, that developed a rise in drug minimum inhibitory concentrations. In two pairs, the second isolate of each pair developed resistance to echinocandins. The second isolate of the third pair developed resistance to 5-flucytosine. The remaining isolates comprised susceptible and azole-resistant isolates. Single nucleotide polymorphisms (SNPs) in genes linked to echinocandin, azole and 5-flucytosine resistance were confirmed in resistant isolates through WGS using next-generation sequencing. Non-synonymous SNPs in antifungal resistance genes such as FKS1, FKS2, CgPDR1, CgCDR1 and FCY2 were identified. Overall, an average of 98% of the WGS reads of C. glabrata isolates mapped to the reference genome with about 75-fold read depth coverage. The turnaround time and cost were comparable to Sanger sequencing. In conclusion, WGS of C. glabrata was feasible in revealing clinically significant gene mutations involved in resistance to different antifungal drug classes without the need for multiple PCR/DNA sequencing reactions. This represents a

  8. DELISHUS: an efficient and exact algorithm for genome-wide detection of deletion polymorphism in autism

    Science.gov (United States)

    Aguiar, Derek; Halldórsson, Bjarni V.; Morrow, Eric M.; Istrail, Sorin

    2012-01-01

    Motivation: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A considerable portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array or sequencing technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem partly due to the inability of algorithms to detect them. Results: In this article, we present an algorithmic framework, which we term DELISHUS, that implements three exact algorithms for inferring regions of hemizygosity containing genomic deletions of all sizes and frequencies in SNP genotype data. We implement an efficient backtracking algorithm—that processes a 1 billion entry genome-wide association study SNP matrix in a few minutes—to compute all inherited deletions in a dataset. We further extend our model to give an efficient algorithm for detecting de novo deletions. Finally, given a set of called deletions, we also give a polynomial time algorithm for computing the critical regions of recurrent deletions. DELISHUS achieves significantly lower false-positive rates and higher power than previously published algorithms partly because it considers all individuals in the sample simultaneously. DELISHUS may be applied to SNP array or sequencing data to identify the deletion spectrum for family-based association studies. Availability: DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/. Contact: Eric_Morrow@brown.edu and Sorin_Istrail@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689755

  9. Germline contamination and leakage in whole genome somatic single nucleotide variant detection.

    Science.gov (United States)

    Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C

    2018-01-31

    The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
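
    The notion of germline leakage defined in this abstract can be made concrete with a small sketch that counts how many predicted somatic SNVs coincide with a patient's known germline variants. This is not the GermlineFilter tool; the function name `germline_leakage` and the (chromosome, position, ref, alt) tuple representation are assumptions, and the coordinates are made up.

```python
def germline_leakage(somatic_calls, germline_variants):
    """Return the leaked variants and the leakage fraction.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    A somatic call counts as "leaked" if the identical variant is
    present in the patient's germline set.
    """
    somatic = set(somatic_calls)
    leaked = somatic & set(germline_variants)
    fraction = len(leaked) / len(somatic) if somatic else 0.0
    return leaked, fraction


# Toy data with invented coordinates, for illustration only.
somatic_calls = [
    ("chr1", 1000, "C", "T"),
    ("chr2", 2500, "G", "A"),
    ("chr7", 5500, "A", "G"),
    ("chr9", 9100, "T", "C"),
]
germline_variants = [
    ("chr2", 2500, "G", "A"),   # also reported as somatic: a leak
    ("chr3", 3300, "C", "G"),
]
leaked, fraction = germline_leakage(somatic_calls, germline_variants)
```

    Here one of four somatic calls matches a germline variant, so the leakage fraction is 0.25; a filter could drop the `leaked` set before calls are shared.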

  10. Facilitating the indirect detection of genomic DNA in an electrochemical DNA biosensor using magnetic nanoparticles and DNA ligase

    Directory of Open Access Journals (Sweden)

    Roozbeh Hushiarian

    2015-12-01

    This technique was found to be reliably repeatable. The indirect detection of genomic DNA using this method is significantly improved and showed high efficiency in small amounts of samples with the detection limit of 5.37 × 10⁻¹⁴ M.

  11. Detecting signatures of positive selection associated with musical aptitude in the human genome.

    Science.gov (United States)

    Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma

    2016-02-16

    Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude.

  12. Detecting signatures of positive selection associated with musical aptitude in the human genome

    Science.gov (United States)

    Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma

    2016-01-01

    Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude. PMID:26879527

  13. iRegulon: from a gene list to a gene regulatory network using large motif and track collections.

    Directory of Open Access Journals (Sweden)

    Rekin's Janky

    2014-07-01

    Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.

  14. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions) were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much less than what is expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3' polypurine tracts (PPTs), which remain double stranded during the HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome.
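
    Context-dependent G-to-A mutation, as discussed in this abstract, can be tabulated with a simple count. The sketch below is not the multivariate analysis used by the authors; it assumes a reference and a hypermutated sequence that are already aligned without gaps, and the function name `g_to_a_by_context` and the toy sequences are hypothetical.

```python
from collections import Counter


def g_to_a_by_context(reference, mutated, downstream=1):
    """Fraction of reference G positions mutated to A, per context.

    The context is the G plus `downstream` following reference bases
    (e.g. "GG", "GA" for downstream=1). Sequences must be aligned and
    of equal length; G positions too close to the end are skipped.
    """
    total = Counter()  # occurrences of each context in the reference
    hit = Counter()    # occurrences where the G became an A
    for i, (ref_base, mut_base) in enumerate(zip(reference, mutated)):
        if ref_base != "G":
            continue
        context = reference[i:i + 1 + downstream]
        if len(context) < 1 + downstream:
            continue
        total[context] += 1
        if mut_base == "A":
            hit[context] += 1
    return {ctx: hit[ctx] / total[ctx] for ctx in total}


# Toy aligned pair (invented): the G in "GG" and the G in "GA" are mutated.
reference = "AGGTGCAGA"
mutated = "AAGTGCAAA"
rates = g_to_a_by_context(reference, mutated)
```

    With real alignments, comparing these per-context rates would show which flanking bases are favoured or disfavoured, in the spirit of the GG and GA preferences described above.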

  15. Unravelling daily human mobility motifs.

    Science.gov (United States)

    Schneider, Christian M; Belik, Vitaly; Couronné, Thomas; Smoreda, Zbigniew; González, Marta C

    2013-07-06

    Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These networks, called here motifs, are sufficient to capture up to 90 per cent of the population in surveys and mobile phone datasets for different countries. Each individual exhibits a characteristic motif, which seems to be stable over several months. Consequently, daily human mobility can be reproduced by an analytically tractable framework for Markov chains by modelling periods of high-frequency trips followed by periods of lower activity as the key ingredient.

  16. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
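
    As a rough illustration of describing every k-mer in a set of ChIP-Seq peak sequences, the sketch below enumerates all k-mers and ranks them by how close they lie, on average, to the centre of each sequence. This is a simplified centre-enrichment heuristic and not the statistic computed by MOCCS; the function names, the `min_count` threshold and the toy sequences are assumptions, and reverse complements are ignored.

```python
from collections import defaultdict


def kmer_center_distances(seqs, k):
    """Map each k-mer to its distances from the sequence midpoint."""
    dists = defaultdict(list)
    for seq in seqs:
        center = (len(seq) - 1) / 2.0  # midpoint in 0-based coordinates
        for i in range(len(seq) - k + 1):
            kmer_mid = i + (k - 1) / 2.0
            dists[seq[i:i + k]].append(abs(kmer_mid - center))
    return dists


def rank_kmers(seqs, k, min_count=2):
    """Return (kmer, count, mean_distance) sorted by mean distance."""
    dists = kmer_center_distances(seqs, k)
    ranked = [
        (sum(d) / len(d), kmer, len(d))
        for kmer, d in dists.items()
        if len(d) >= min_count
    ]
    ranked.sort()
    return [(kmer, n, mean) for mean, kmer, n in ranked]


if __name__ == "__main__":
    # Hypothetical peak-centred windows with ACG planted in the middle.
    windows = ["TTTACGTTT", "GGGACGGGG"]
    for kmer, n, mean in rank_kmers(windows, 3):
        print(kmer, n, round(mean, 2))
```

    K-mers that sit consistently near the peak centre rise to the top of the ranking, which is the intuition behind using ChIP-Seq peak positions as evidence of binding.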

  17. Array-MAPH: a methodology for the detection of locus copy-number changes in complex genomes.

    Science.gov (United States)

    Kousoulidou, Ludmila; Männik, Katrin; Sismani, Carolina; Zilina, Olga; Parkel, Sven; Puusepp, Helen; Tõnisson, Neeme; Palta, Priit; Remm, Maido; Kurg, Ants; Patsalis, Philippos C

    2008-01-01

    High-throughput genome-wide screening methods to detect subtle genomic imbalances are extremely important for diagnostic genetics and genomics. Here, we provide a detailed protocol for a microarray-based technique, applying the principle of multiplex amplifiable probe hybridization (MAPH). Methodology and software have been developed for designing unique PCR-amplifiable sequences (400-600 bp) covering any genomic region of interest. These sequences are amplified, cloned and spotted onto arrays (targets). A mixture of the same sequences (probes) is hybridized to genomic DNA immobilized on a membrane. Bound probes are recovered and quantitatively amplified by PCR, labeled and hybridized to the array. The procedure can be completed in 4-5 working days, excluding microarray preparation. Unlike array-comparative genomic hybridization (array-CGH), test DNA of specifically reduced complexity is hybridized to an array of identical small amplifiable target sequences, resulting in increased hybridization specificity and higher potential for increasing resolution. Array-MAPH can be used for detection of small-scale copy-number changes in complex genomes, leading to genotype-phenotype correlations and the discovery of new genes.

  18. Detection of structural mosaicism from targeted and whole-genome sequencing data.

    Science.gov (United States)

    King, Daniel A; Sifrim, Alejandro; Fitzgerald, Tomas W; Rahbari, Raheleh; Hobson, Emma; Homfray, Tessa; Mansour, Sahar; Mehta, Sarju G; Shehla, Mohammed; Tomkins, Susan E; Vasudevan, Pradeep C; Hurles, Matthew E

    2017-10-01

    Structural mosaic abnormalities are large post-zygotic mutations present in a subset of cells and have been implicated in developmental disorders and cancer. Such mutations have been conventionally assessed in clinical diagnostics using cytogenetic or microarray testing. Modern disease studies rely heavily on exome sequencing, yet an adequate method for the detection of structural mosaicism using targeted sequencing data is lacking. Here, we present a method, called MrMosaic, to detect structural mosaic abnormalities using deviations in allele fraction and read coverage from next-generation sequencing data. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) simulations were used to calculate detection performance across a range of mosaic event sizes, types, clonalities, and sequencing depths. The tool was applied to 4911 patients with undiagnosed developmental disorders, and 11 events among nine patients were detected. For eight of these 11 events, mosaicism was observed in saliva but not blood, suggesting that assaying blood alone would miss a large fraction, possibly >50%, of mosaic diagnostic chromosomal rearrangements. © 2017 King et al.; Published by Cold Spring Harbor Laboratory Press.

  19. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in the important and difficult cases of large-alphabet sequences.

  20. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  1. Using Informatics-, Bioinformatics- and Genomics-Based Approaches for the Molecular Surveillance and Detection of Biothreat Agents

    Science.gov (United States)

    Seto, Donald

    The convergence and wealth of informatics, bioinformatics and genomics methods and associated resources allow a comprehensive and rapid approach for the surveillance and detection of bacterial and viral organisms. Coupled with the continuing race for the fastest, most cost-efficient and highest-quality DNA sequencing technology, that is, "next generation sequencing", the detection of biological threat agents by "cheaper and faster" means is possible. With the application of improved bioinformatic tools for the understanding of these genomes and for parsing unique pathogen genome signatures, along with "state-of-the-art" informatics which include faster computational methods, equipment and databases, it is feasible to apply new algorithms to biothreat agent detection. Two such methods are high-throughput DNA sequencing-based and resequencing microarray-based identification. These are illustrated and validated by two examples involving human adenoviruses, both from real-world test beds.

  2. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

    Science.gov (United States)

    Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

    2015-01-01

    With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710

  3. Patterns of cross-contamination in a multispecies population genomic project: detection, quantification, impact, and solutions.

    Science.gov (United States)

    Ballenghien, Marion; Faivre, Nicolas; Galtier, Nicolas

    2017-03-29

    Contamination is a well-known but often neglected problem in molecular biology. Here, we investigated the prevalence of cross-contamination among 446 samples from 116 distinct species of animals, which were processed in the same laboratory and subjected to subcontracted transcriptome sequencing. Using cytochrome oxidase 1 as a barcode, we identified a minimum of 782 events of between-species contamination, with approximately 80% of our samples being affected. An analysis of laboratory metadata revealed a strong effect of the sequencing center: nearly all the detected events of between-species contamination involved species that were sent the same day to the same company. We introduce new methods to address the amount of within-species, between-individual contamination, and to correct for this problem when calling genotypes from base read counts. We report evidence for pervasive within-species contamination in this data set, and show that classical population genomic statistics, such as synonymous diversity, the ratio of non-synonymous to synonymous diversity, inbreeding coefficient FIT, and Tajima's D, are sensitive to this problem to various extents. Control analyses suggest that our published results are probably robust to the problem of contamination. Recommendations on how to prevent or avoid contamination in large-scale population genomics/molecular ecology are provided based on this analysis.

  4. Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas.

    Science.gov (United States)

    Way, Gregory P; Sanchez-Vega, Francisco; La, Konnor; Armenia, Joshua; Chatila, Walid K; Luna, Augustin; Sander, Chris; Cherniack, Andrew D; Mina, Marco; Ciriello, Giovanni; Schultz, Nikolaus; Sanchez, Yolanda; Greene, Casey S

    2018-04-03

    Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these "hidden responders" may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA) PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  5. Applications of flow cytometry in plant pathology for genome size determination, detection and physiological status.

    Science.gov (United States)

    D'Hondt, Liesbet; Höfte, Monica; Van Bockstaele, Erik; Leus, Leen

    2011-10-01

    Flow cytometers are probably the most multipurpose laboratory devices available. They can analyse a vast and very diverse range of cell parameters. This technique has left its mark on cancer, human immunodeficiency virus and immunology research, and is indispensable in routine clinical diagnostics. Flow cytometry (FCM) is also a well-known tool for the detection and physiological status assessment of microorganisms in drinking water, marine environments, food and fermentation processes. However, flow cytometers are seldom used in plant pathology, despite FCM's major advantages as both a detection method and a research tool. Potential uses of FCM include the characterization of genome sizes of fungal and oomycete populations, multiplexed pathogen detection and the monitoring of the viability, culturability and gene expression of plant pathogens, and many others. This review provides an overview of the history, advantages and disadvantages of FCM, and focuses on the current applications and future possibilities of FCM in plant pathology. © 2011 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2011 BSPP AND BLACKWELL PUBLISHING LTD.

  6. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

    Science.gov (United States)

    Lun, Aaron T L; Smyth, Gordon K

    2015-08-19

    Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.

  7. HyDe: a Python Package for Genome-Scale Hybridization Detection.

    Science.gov (United States)

    Blischak, Paul D; Chifman, Julia; Wolfe, Andrea D; Kubatko, Laura S

    2018-03-19

    The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybridization using these site pattern probabilities exploits the relationship between the coalescent process for gene trees within population trees and the process of mutation along the branches of the gene trees. For certain models, site patterns are predicted to occur in equal frequency (i.e., their difference is 0), producing a set of functions called phylogenetic invariants. In this paper we introduce HyDe, a software package for detecting hybridization using phylogenetic invariants arising under the coalescent model with hybridization. HyDe is written in Python, and can be used interactively or through the command line using pre-packaged scripts. We demonstrate the use of HyDe on simulated data, as well as on two empirical data sets from the literature. We focus in particular on identifying individual hybrids within population samples and on distinguishing between hybrid speciation and gene flow. HyDe is freely available as an open source Python package under the GNU GPL v3 on both GitHub (https://github.com/pblischak/HyDe) and the Python Package Index (PyPI: https://pypi.python.org/pypi/phyde).

  8. Detecting Microsatellites in Genome Data: Variance in Definitions and Bioinformatic Approaches Cause Systematic Bias

    Directory of Open Access Journals (Sweden)

    Angelika Merkel

    2008-01-01

    Microsatellites are currently one of the most commonly used genetic markers. The application of bioinformatic tools has become common practice in the study of these short tandem repeats (STR). However, in silico studies can suffer from study bias. Using a meta-analysis on microsatellite distribution in yeast we show that estimates of numbers of repeats reported by different studies can differ in the order of several magnitudes, even within a single genome. These differences arise because varying definitions of microsatellites, spanning repeat size, array length and array composition, are used in different search paradigms, with minimum array length being the main influencing factor. Structural differences in the implemented search algorithm additionally contribute to variation in the number of repeats detected. We suggest that for future studies a consistent approach to STR searches is adopted in order to improve the power of intra- and interspecific comparisons.
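
    The effect of the minimum array length on reported counts can be shown with a small sketch. It searches only for perfect tandem repeats using a regular expression with a backreference; it is not one of the tools compared in the study, and the function names, default thresholds and toy sequence are assumptions made for illustration.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for perfect tandem repeats."""
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1)
        )
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: (CA)5 and (AGC)3 embedded in flanking bases.
    seq = "GGTT" + "CA" * 5 + "TTGG" + "AGC" * 3 + "TT"
    for threshold in (3, 4, 6):
        print(threshold, find_microsatellites(seq, min_repeats=threshold))
```

    Raising `min_repeats` from 3 to 4 drops the trinucleotide array from the result and raising it to 6 drops both, which mirrors the abstract's point that the chosen definition drives the count.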

  9. Detection of genomic signatures for pig hairlessness using high-density SNP data

    Directory of Open Access Journals (Sweden)

    Ying SU, Yi LONG, Xinjun LIAO, Huashui AI, Zhiyan ZHANG, Bin YANG, Shijun XIAO, Jianhong TANG, Wenshui XIN, Lusheng HUANG, Jun REN, Nengshui DING

    2014-12-01

    Hair provides thermal regulation for mammals and protects the skin from wounds, bites and ultraviolet (UV) radiation, and is important in adaptation to volatile environments. Pigs in nature are divided into hairy and hairless, which provide a good model for deciphering the molecular mechanisms of hairlessness. We conducted a genomic scan for genetically differentiated regions between hairy and hairless pigs using 60K SNP data, with the aim to better understand the genetic basis for the hairless phenotype in pigs. A total of 38405 SNPs in 498 animals from 36 diverse breeds were used to detect genomic signatures for pig hairlessness by estimating between-population (FST) values. Seven diversifying signatures between Yucatan hairless pigs and hairy pigs were identified on pig chromosomes (SSC) 1, 3, 7, 8, 10, 11 and 16, and the biological functions of two notable genes, RGS17 and RB1, were revealed. When Mexican hairless pigs were contrasted with hairy pigs, strong signatures were detected on SSC1 and SSC10, which harbor two functionally plausible genes, REV3L and BAMBI. KEGG pathway analysis showed a subset of overrepresented genes involved in the T cell receptor signaling pathway, MAPK signaling pathway and the tight junction pathways. All of these pathways may be important in local adaptability of hairless pigs. The potential mechanisms underlying the hairless phenotype in pigs are reported for the first time. RB1 and BAMBI are interesting candidate genes for the hairless phenotype in Yucatan hairless and Mexican hairless pigs, respectively. RGS17, REV3L, ICOS and RASGRP1 as well as other genes involved in the MAPK and T cell receptor signaling pathways may be important in environmental adaptation by improved tolerance to UV damage in hairless pigs. These findings improve our understanding of the genetic basis for inherited hairlessness in pigs.
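
    The between-population FST scan mentioned in the abstract can be illustrated per SNP. The abstract does not say which FST estimator was used, so the sketch below assumes the basic textbook form FST = (HT - HS) / HT for one biallelic SNP and two equally weighted populations; the function name and the allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Textbook FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # monomorphic site carries no differentiation signal
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies in hairless vs. hairy populations.
    for p_hairless, p_hairy in [(0.5, 0.5), (0.2, 0.8), (1.0, 0.0)]:
        print(p_hairless, p_hairy, round(fst_biallelic(p_hairless, p_hairy), 3))
```

    In a genome scan, a value like this would be computed for every SNP and regions with unusually high values flagged as candidate signatures; real analyses typically use sample-size-corrected estimators.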

  10. The most conserved genome segments for life detection on Earth and other planets.

    Science.gov (United States)

    Isenbarger, Thomas A; Carr, Christopher E; Johnson, Sarah Stewart; Finney, Michael; Church, George M; Gilbert, Walter; Zuber, Maria T; Ruvkun, Gary

    2008-12-01

    On Earth, very simple but powerful methods to detect and classify broad taxa of life by the polymerase chain reaction (PCR) are now standard practice. Using DNA primers corresponding to the 16S ribosomal RNA gene, one can survey a sample from any environment for its microbial inhabitants. Due to massive meteoritic exchange between Earth and Mars (as well as other planets), a reasonable case can be made for life on Mars or other planets to be related to life on Earth. In this case, the supremely sensitive technologies used to study life on Earth, including in extreme environments, can be applied to the search for life on other planets. Though the 16S gene has become the standard for life detection on Earth, no genome comparisons have established that the ribosomal genes are, in fact, the most conserved DNA segments across the kingdoms of life. We present here a computational comparison of full genomes from 13 diverse organisms from the Archaea, Bacteria, and Eucarya to identify genetic sequences conserved across the widest divisions of life. Our results identify the 16S and 23S ribosomal RNA genes as well as other universally conserved nucleotide sequences in genes encoding particular classes of transfer RNAs and within the nucleotide binding domains of ABC transporters as the most conserved DNA sequence segments across phylogeny. This set of sequences defines a core set of DNA regions that have changed the least over billions of years of evolution and provides a means to identify and classify divergent life, including ancestrally related life on other planets.

  11. Detection of Clinically Relevant Genetic Variants in Autism Spectrum Disorder by Whole-Genome Sequencing

    Science.gov (United States)

    Jiang, Yong-hui; Yuen, Ryan K.C.; Jin, Xin; Wang, Mingbang; Chen, Nong; Wu, Xueli; Ju, Jia; Mei, Junpu; Shi, Yujian; He, Mingze; Wang, Guangbiao; Liang, Jieqin; Wang, Zhe; Cao, Dandan; Carter, Melissa T.; Chrysler, Christina; Drmic, Irene E.; Howe, Jennifer L.; Lau, Lynette; Marshall, Christian R.; Merico, Daniele; Nalpathamkalam, Thomas; Thiruvahindrapuram, Bhooma; Thompson, Ann; Uddin, Mohammed; Walker, Susan; Luo, Jun; Anagnostou, Evdokia; Zwaigenbaum, Lonnie; Ring, Robert H.; Wang, Jian; Lajonchere, Clara; Wang, Jun; Shih, Andy; Szatmari, Peter; Yang, Huanming; Dawson, Geraldine; Li, Yingrui; Scherer, Stephen W.

    2013-01-01

    Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. PMID:23849776

  12. Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2017-09-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  13. Detecting single DNA copy number variations in complex genomes using one nanogram of starting DNA and BAC-array CGH.

    Science.gov (United States)

    Guillaud-Bataille, Marine; Valent, Alexander; Soularue, Pascal; Perot, Christine; Inda, Maria Mar; Receveur, Aline; Smaïli, Sadek; Roest Crollius, Hugues; Bénard, Jean; Bernheim, Alain; Gidrol, Xavier; Danglot, Gisèle

    2004-07-29

    Comparative genomic hybridization to bacterial artificial chromosome (BAC)-arrays (array-CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci, and the reliable detection of local one-copy-level variations. We report a genome-wide amplification method allowing the same measurement sensitivity, using 1 ng of starting genomic DNA, instead of the classical 1 µg usually necessary. Using a discrete series of DNA fragments, we defined the parameters adapted to the most faithful ligation-mediated PCR amplification and the limits of the technique. The optimized protocol allows a 3000-fold DNA amplification, retaining the quantitative characteristics of the initial genome. Validation of the amplification procedure, using DNA from 10 tumour cell lines hybridized to BAC-arrays of 1500 spots, showed almost perfectly superimposed ratios for the non-amplified and amplified DNAs. Correlation coefficients of 0.96 and 0.99 were observed for regions of low-copy-level variations and all regions, respectively (including in vivo amplified oncogenes). Finally, labelling DNA using two nucleotides bearing the same fluorophore led to a significant increase in reproducibility and to the correct detection of one-copy gain or loss in >90% of the analysed data, even for pseudotriploid tumour genomes.

  14. Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics.

    Science.gov (United States)

    Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B

    2017-01-01

    There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences.

  15. A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Tony Håndstad

    BACKGROUND: Transcription factors are important controllers of gene expression, and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.
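
    As a rough illustration of what a "simple motif-scanning method" can look like, the sketch below scores every window of a sequence against a log-odds position weight matrix and reports windows above a cutoff. It is not taken from the benchmarked tools: the function names `log_odds_matrix` and `scan`, the toy count matrix, the pseudocount of 1, the uniform background of 0.25 and the threshold are all assumptions chosen for the example.

```python
import math

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (uniform background assumed)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for each window whose summed score reaches the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical counts for a 4-bp motif resembling "TATA" (illustrative data only).
toy_counts = [
    {"T": 9, "A": 1},
    {"A": 9, "T": 1},
    {"T": 9, "A": 1},
    {"A": 9, "C": 1},
]
pwm = log_odds_matrix(toy_counts)
hits = scan("GGTATACC", pwm, threshold=4.0)
for start, score in hits:
    print(start, round(score, 2))
```

    A conservation-aware predictor would add a second source of evidence (for example, a per-position conservation score) on top of this window score; how the benchmarked methods combine the two is not described in the abstract.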

  16. Can AFLP genome scans detect small islands of differentiation? The case of shell sculpture variation in the periwinkle Echinolittorina hawaiiensis.

    Science.gov (United States)

    Tice, K A; Carlon, D B

    2011-08-01

    Genome scans have identified candidate regions of the genome undergoing selection in a wide variety of organisms, yet have rarely been applied to broadly dispersing marine organisms experiencing divergent selection pressures, where high recombination rates can reduce the extent of linkage disequilibrium (LD) and the ability to detect genomic regions under selection. The broadly dispersing periwinkle Echinolittorina hawaiiensis exhibits a heritable shell sculpture polymorphism that is correlated with environmental variation. To elucidate the genetic basis of phenotypic variation, a genome scan using over 1000 AFLP loci was conducted on smooth and sculptured snails from divergent habitats at four replicate sites. Approximately 5% of loci were identified as outliers with Dfdist, whereas no outliers were identified by BayeScan. Closer examination of the Dfdist outliers supported the conclusion that these loci were false positives. These results highlight the importance of controlling for Type I error using multiple outlier detection approaches, multitest corrections and replicate population comparisons. Assuming shell phenotypes have a genetic basis, our failure to detect outliers suggests that the life history of the target species needs to be considered when designing a genome scan. © 2011 The Authors. Journal of Evolutionary Biology © 2011 European Society For Evolutionary Biology.

  17. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells.

    Science.gov (United States)

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2016-02-01

    The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in humans. The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. The identified motif pair data is compressed and available in the supplementary materials associated with this manuscript. Contact: kc.w@cityu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
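
    The definition of motif pairing multiplicity given above (the number of motifs paired with a given motif) can be sketched in a few lines. This is only an illustration of the counting step under stated assumptions: the function name `pairing_multiplicity`, the motif identifiers and the input format (a list of unordered motif pairs) are hypothetical, and repeated or reversed pairs are assumed to count once.

```python
from collections import defaultdict

def pairing_multiplicity(motif_pairs):
    """Count, for each motif, how many distinct motifs it is paired with."""
    partners = defaultdict(set)
    for a, b in motif_pairs:
        # Pairs are treated as unordered, so each motif records the other.
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}

# Hypothetical coupling pairs; ("M2", "M1") repeats ("M1", "M2") in reverse order.
pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M1"), ("M3", "M4")]
multiplicity = pairing_multiplicity(pairs)
print(multiplicity)
```

    Using a set per motif keeps duplicates from inflating the count; whether the study treats a motif paired with itself as a partner is not stated in the abstract, so this sketch simply counts it as one.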

  18. Multilayer motif analysis of brain networks

    Science.gov (United States)

    Battiston, Federico; Nicosia, Vincenzo; Chavez, Mario; Latora, Vito

    2017-04-01

    In the last decade, network science has shed new light both on the structural (anatomical) and on the functional (correlations in the activity) connectivity among the different areas of the human brain. The analysis of brain networks has made it possible to detect the central areas of a neural system and to identify its building blocks by looking at overabundant small subgraphs, known as motifs. However, network analysis of the brain has so far mainly focused on anatomical and functional networks as separate entities. The recently developed mathematical framework of multi-layer networks allows us to perform an analysis of the human brain where the structural and functional layers are considered together. In this work, we describe how to classify the subgraphs of a multiplex network, and we extend the motif analysis to networks with an arbitrary number of layers. We then extract multi-layer motifs in brain networks of healthy subjects by considering networks with two layers, anatomical and functional, respectively, obtained from diffusion and functional magnetic resonance imaging. Results indicate that subgraphs in which the presence of a physical connection between brain areas (links at the structural layer) coexists with a non-trivial positive correlation in their activities are statistically overabundant. Finally, we investigate the existence of a reinforcement mechanism between the two layers by looking at how the probability of finding a link in one layer depends on the intensity of the connection in the other one. Showing that functional connectivity is non-trivially constrained by the underlying anatomical network, our work contributes to a better understanding of the interplay between the structure and function in the human brain.

  19. Utility of comprehensive genomic sequencing for detecting HER2-positive colorectal cancer.

    Science.gov (United States)

    Shimada, Yoshifumi; Yagi, Ryoma; Kameyama, Hitoshi; Nagahashi, Masayuki; Ichikawa, Hiroshi; Tajima, Yosuke; Okamura, Takuma; Nakano, Mae; Nakano, Masato; Sato, Yo; Matsuzawa, Takeaki; Sakata, Jun; Kobayashi, Takashi; Nogami, Hitoshi; Maruyama, Satoshi; Takii, Yasumasa; Kawasaki, Takashi; Homma, Kei-Ichi; Izutsu, Hiroshi; Kodama, Keisuke; Ring, Jennifer E; Protopopov, Alexei; Lyle, Stephen; Okuda, Shujiro; Akazawa, Kohei; Wakai, Toshifumi

    2017-08-01

    HER2-targeted therapy is considered effective for KRAS codon 12/13 wild-type, HER2-positive metastatic colorectal cancer (CRC). In general, HER2 status is determined by the use of immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH). Comprehensive genomic sequencing (CGS) enables the detection of gene mutations and copy number alterations including KRAS mutation and HER2 amplification; however, little is known about the utility of CGS for detecting HER2-positive CRC. To assess its utility, we retrospectively investigated 201 patients with stage I-IV CRC. The HER2 status of the primary site was assessed using IHC and FISH, and HER2 amplification of the primary site was also assessed using CGS, and the findings of these approaches were compared in each patient. CGS successfully detected alterations in 415 genes including KRAS codon 12/13 mutation and HER2 amplification. Fifty-nine (29%) patients had a KRAS codon 12/13 mutation. Ten (5%) patients were diagnosed as HER2 positive because of HER2 IHC 3+, and the same 10 (5%) patients had HER2 amplification evaluated using CGS. The results of HER2 status and HER2 amplification were completely identical in all 201 patients (P < .001). Nine of the 10 HER2-positive patients were KRAS 12/13 wild-type and were considered possible candidates for HER2-targeted therapy. CGS has the same utility as IHC and FISH for detecting HER2-positive patients who are candidates for HER2-targeted therapy, and facilitates precision medicine and tailor-made treatment. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Detection of bacterial contaminants and hybrid sequences in the genome of the kelp Saccharina japonica using Taxoblast

    Directory of Open Access Journals (Sweden)

    Simon M. Dittami

    2017-11-01

    Modern genome sequencing strategies are highly sensitive to contamination, making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searches with short sequence fragments. They revealed a number of probable bacterial contaminations as well as hybrid scaffolds that contain both bacterial and algal sequences. This or similar types of analysis, in combination with manual curation, may thus constitute a useful complement to standard bioinformatics analyses prior to submission of genomic data to public repositories. Our analysis pipeline is open-source and freely available at http://sdittami.altervista.org/taxoblast and via SourceForge (https://sourceforge.net/projects/taxoblast).

  1. Improved statistical methods enable greater sensitivity in rhythm detection for genome-wide data.

    Directory of Open Access Journals (Sweden)

    Alan L Hutchison

    2015-03-01

    Robust methods for identifying patterns of expression in genome-wide data are important for generating hypotheses regarding gene function. To this end, several analytic methods have been developed for detecting periodic patterns. We improve one such method, JTK_CYCLE, by explicitly calculating the null distribution such that it accounts for multiple hypothesis testing and by including non-sinusoidal reference waveforms. We term this method empirical JTK_CYCLE with asymmetry search, and we compare its performance to JTK_CYCLE with Bonferroni and Benjamini-Hochberg multiple hypothesis testing correction, as well as to five other methods: cyclohedron test, address reduction, stable persistence, ANOVA, and F24. We find that ANOVA, F24, and JTK_CYCLE consistently outperform the other three methods when data are limited and noisy; empirical JTK_CYCLE with asymmetry search gives the greatest sensitivity while controlling for the false discovery rate. Our analysis also provides insight into experimental design and we find that, for a fixed number of samples, better sensitivity and specificity are achieved with higher numbers of replicates than with higher sampling density. Application of the methods to detecting circadian rhythms in a metadataset of microarrays that quantify time-dependent gene expression in whole heads of Drosophila melanogaster reveals annotations that are enriched among genes with highly asymmetric waveforms. These include a wide range of oxidation reduction and metabolic genes, as well as genes with transcripts that have multiple splice forms.
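
    For readers unfamiliar with the two standard corrections that the abstract uses as comparison points, the sketch below applies Bonferroni and Benjamini-Hochberg adjustments to a list of p-values. It does not implement empirical JTK_CYCLE or its null-distribution calculation; the function names and the example p-values are assumptions made for illustration.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene p-values from a rhythm-detection test.
p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(p)
bh = benjamini_hochberg(p)
print([round(x, 3) for x in bonf])
print([round(x, 3) for x in bh])
```

    On the same inputs the Benjamini-Hochberg values are never larger than the Bonferroni ones, which is why the former is the less conservative of the two corrections.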

  2. Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation

    OpenAIRE

    Jackson, Brendan R.; Tarr, Cheryl; Strain, Errol; Jackson, Kelly A.; Conrad, Amanda; Carleton, Heather; Katz, Lee S.; Stroika, Steven; Gould, L. Hannah; Mody, Rajal K.; Silk, Benjamin J.; Beal, Jennifer; Chen, Yi; Timme, Ruth; Doyle, Matthew

    2016-01-01

    Implementation of whole-genome sequencing (WGS)–based surveillance for Listeria monocytogenes in 2013 greatly improved detection and investigation of listeriosis outbreaks in the United States. Lessons from this intervention can guide WGS-based surveillance for other foodborne pathogens.

  3. Detecting Loci under recent positive selection in dairy and beef cattle by combining different genome-wide scan methods

    Science.gov (United States)

    As the methodologies available for the detection of positive selection from genomic data vary in terms of assumptions and execution, weak correlations are expected among them. However, if there is any given signal that is consistently supported across different tests, it might be a strong evidence o...

  4. Unravelling daily human mobility motifs

    OpenAIRE

    Schneider, Christian M.; Belik, Vitaly; Couronné, Thomas; Smoreda, Zbigniew; González, Marta C.

    2013-01-01

    Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These net...

  5. DETECTING SELECTION IN NATURAL POPULATIONS: MAKING SENSE OF GENOME SCANS AND TOWARDS ALTERNATIVE SOLUTIONS

    Science.gov (United States)

    Haasl, Ryan J.; Payseur, Bret A.

    2016-01-01

    Genomewide scans for natural selection (GWSS) have become increasingly common over the last 15 years due to increased availability of genome-scale genetic data. Here, we report a representative survey of GWSS from 1999 to present and find that (i) between 1999 and 2009, 35 of 49 (71%) GWSS focused on human, while from 2010 to present, only 38 of 83 (46%) of GWSS focused on human, indicating increased focus on nonmodel organisms; (ii) the large majority of GWSS incorporate interpopulation or interspecific comparisons using, for example, FST, cross-population extended haplotype homozygosity or the ratio of nonsynonymous to synonymous substitutions; (iii) most GWSS focus on detection of directional selection rather than other modes such as balancing selection; and (iv) in human GWSS, there is a clear shift after 2004 from microsatellite markers to dense SNP data. A survey of GWSS meant to identify loci positively selected in response to severe hypoxic conditions supports an approach to GWSS in which a list of a priori candidate genes based on potential selective pressures is used to filter the list of significant hits a posteriori. We also discuss four frequently ignored determinants of genomic heterogeneity that complicate GWSS: mutation, recombination, selection and the genetic architecture of adaptive traits. We recommend that GWSS methodology should better incorporate aspects of genomewide heterogeneity using empirical estimates of relevant parameters and/or realistic, whole-chromosome simulations to improve interpretation of GWSS results. Finally, we argue that knowledge of potential selective agents improves interpretation of GWSS results and that new methods focused on correlations between environmental variables and genetic variation can help automate this approach. PMID:26224644

  6. Tumour procurement, DNA extraction, coverage analysis and optimisation of mutation-detection algorithms for human melanoma genomes.

    Science.gov (United States)

    Wilmott, James S; Field, Matthew A; Johansson, Peter A; Kakavand, Hojabr; Shang, Ping; De Paoli-Iseppi, Ricardo; Vilain, Ricardo E; Pupo, Gulietta M; Tembe, Varsha; Jakrot, Valerie; Shang, Catherine A; Cebon, Jonathan; Shackleton, Mark; Fitzgerald, Anna; Thompson, John F; Hayward, Nicholas K; Mann, Graham J; Scolyer, Richard A

    2015-12-01

    Whole genome sequencing (WGS) of cancer patients' tumours offers the most comprehensive method of identifying both novel and known clinically-actionable genomic targets. However, the practicalities of performing WGS on clinical samples are poorly defined. This study was designed to test sample preparation, sequencing specifications and bioinformatic algorithms for their effect on accuracy and cost-efficiency in a large WGS analysis of human melanoma samples. WGS was performed on melanoma cell lines (n = 15) and melanoma fresh frozen tumours (n = 222). The appropriate level of coverage and the optimal mutation detection algorithm for the project pipeline were determined. An incremental increase in sequencing coverage from 36X to 132X in melanoma tissue samples and 30X to 103X for cell lines only resulted in a small increase (1-2%) in the number of mutations detected, and the quality scores of the additional mutations indicated a low probability that the mutations were real. The results suggest that 60X coverage for melanoma tissue and 40X for melanoma cell lines empower the detection of 98-99% of informative single nucleotide variants (SNVs), a sensitivity level at which clinical decision making or landscape research projects can be carried out with a high degree of confidence in the results. Likewise the bioinformatic mutation analysis methodology strongly influenced the number and quality of SNVs detected. Detecting mutations in the blood genomes separate to the tumour genomes generated 41% more SNVs than if the blood and melanoma tissue genomes were analysed simultaneously. Therefore, simultaneous analysis should be employed on matched melanoma tissue and blood genomes to reduce errors in mutation detection. This study provided valuable insights into the accuracy of SNV with WGS at various coverage levels in human clinical cancer specimens. Additionally, we investigated the accuracy of the publicly available mutation detection algorithms to detect cancer

  7. A sensitive, support-vector-machine method for the detection of horizontal gene transfers in viral, archaeal and bacterial genomes.

    Science.gov (United States)

    Tsirigos, Aristotelis; Rigoutsos, Isidore

    2005-01-01

    In earlier work, we introduced and discussed a generalized computational framework for identifying horizontal transfers. This framework relied on a gene's nucleotide composition, obviated the need for knowledge of codon boundaries and database searches, and was shown to perform very well across a wide range of archaeal and bacterial genomes when compared with previously published approaches, such as Codon Adaptation Index and C + G content. Nonetheless, two considerations remained outstanding: we wanted to further increase the sensitivity of detecting horizontal transfers and also to be able to apply the method to increasingly smaller genomes. In the discussion that follows, we present such a method, Wn-SVM, and show that it exhibits a very significant improvement in sensitivity compared with earlier approaches. Wn-SVM uses a one-class support-vector machine and can learn using rather small training sets. This property makes Wn-SVM particularly suitable for studying small-size genomes, similar to those of viruses, as well as the typically larger archaeal and bacterial genomes. We show experimentally that the new method results in a superior performance across a wide range of organisms and that it improves even upon our own earlier method by an average of 10% across all examined genomes. As a small-genome case study, we analyze the genome of the human cytomegalovirus and demonstrate that Wn-SVM correctly identifies regions that are known to be conserved and prototypical of all beta-herpesvirinae, regions that are known to have been acquired horizontally from the human host and, finally, regions that had not up to now been suspected to be horizontally transferred. Atypical region predictions for many eukaryotic viruses, including the alpha-, beta- and gamma-herpesvirinae, and 123 archaeal and bacterial genomes, have been made available online at http://cbcsrv.watson.ibm.com/HGT_SVM/.

  8. Kopi dan Kakao dalam Kreasi Motif Batik Khas Jember

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2015-06-01

    ABSTRACT (translated from Indonesian) Jember batik has so far been synonymous with the tobacco leaf motif. The depiction of the tobacco leaf in Jember batik motifs is rather weak: it lacks character, because the resulting motif looks like a drawing of leaves in general. It is therefore necessary to create a batik motif design distinctive to Jember whose inspiration is drawn from other natural riches of Jember that have specific shapes and characteristics, so that a stronger motif identity can be obtained. These natural products typical of Jember are coffee and cocoa. The purpose of this art creation is to produce new batik motifs with characteristics distinctive to Jember. The methods used were data collection, in-depth observation of the objects of creation, study of the sources of inspiration, design of the motifs, and their realization as batik. This work produced 6 (six) batik motifs: (1) Motif Uwoh Kopi; (2) Motif Godong Kopi; (3) Motif Ceplok Kakao; (4) Motif Kakao Raja; (5) Motif Kakao Biru; and (6) Motif Wiji Mukti. Based on the results of the "Aesthetic Taste" assessment, the most widely preferred motifs were Motif Uwoh Kopi and Motif Kakao Raja. Keywords: Motif Woh Kopi, Motif Godong Kopi, Motif Ceplok Kakao, Motif Kakao Raja, Motif Kakao Biru, Motif Wiji Mukti ABSTRACT Batik Jember is synonymous with tobacco leaf motif. Tobacco leaf shape is quite weak in the visual appearance characterized as that motif emerges like a picture of leaves in general. Therefore, it is necessary to create a distinctive design motif extracted from other natural resources of Jember that have specific shapes and characteristics that can be obtained as the stronger motif identity. The typical natural resources from Jember are coffee and cocoa. The purpose of the creation of this art is to produce the unique, creative and innovative batik and have specific characteristics of Jember. The method used are data collection, observation of the object, reviewing inspiration sources

  9. Detection of Alien Oryza punctata Kotschy Chromosomes in Rice, Oryza sativa L., by Genomic in situ Hybridization

    OpenAIRE

    Yasui, Hideshi; Nonomura, Ken-ichi; Iwata, Nobuo; 安井, 秀; 野々村, 賢一; 岩田, 伸夫

    1997-01-01

    Genomic in situ hybridization (GISH) using total Oryza punctata Kotschy genomic DNA as a probe was applied to detect alien chromosomes transferred from O. punctata (W1514: 2n=2x=24: BB) to O. sativa Japonica cultivar, Nipponbare (2n=2x=24: AA). Only 12 chromosomes in the interspecific hybrids (2n=3x=36: AAB) between autotetraploid of O. sativa cultivar Nipponbare and a diploid strain of O. punctata (W1514) showed intense staining by FITC in mitotic metaphase spreads. Only one homologous pair...

  10. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    Science.gov (United States)

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections. PMID:24240121

  11. Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Behavior in Sapsaree Dog (

    Directory of Open Access Journals (Sweden)

    J. H. Ha

    2015-07-01

    The purpose of this study was to characterize genetic architecture of behavior patterns in Sapsaree dogs. The breed population (n = 8,256) has been constructed since 1990 over 12 generations and managed at the Sapsaree Breeding Research Institute, Gyeongsan, Korea. Seven behavioral traits were investigated for 882 individuals. The traits were classified as a quantitative or a categorical group, and heritabilities (h2) and variance components were estimated under the Animal model using ASREML 2.0 software program. In general, the h2 estimates of the traits ranged between 0.00 and 0.16. Strong genetic (rG) and phenotypic (rP) correlations were observed between nerve stability, affability and adaptability, i.e. 0.9 to 0.94 and 0.46 to 0.68, respectively. To detect significant single nucleotide polymorphism (SNP) for the behavioral traits, a total of 134 and 60 samples were genotyped using the Illumina 22K CanineSNP20 and 170K CanineHD bead chips, respectively. Two datasets comprising 60 (Sap60) and 183 (Sap183) samples were analyzed, respectively, of which the latter was based on the SNPs that were embedded on both the 22K and 170K chips. To perform genome-wide association analysis, each SNP was considered with the residuals of each phenotype that were adjusted for sex and year of birth as fixed effects. A least squares based single marker regression analysis was followed by a stepwise regression procedure for the significant SNPs (p<0.01), to determine a best set of SNPs for each trait. A total of 41 SNPs were detected with the Sap183 samples for the behavior traits. The significant SNPs need to be verified using other samples, so as to be utilized to improve behavior traits via marker-assisted selection in the Sapsaree population.

  12. Detecting riboSNitches with RNA folding algorithms: a genome-wide benchmark.

    Science.gov (United States)

    Corley, Meredith; Solem, Amanda; Qu, Kun; Chang, Howard Y; Laederach, Alain

    2015-02-18

    Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, in particular when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs with large structural disparities caused by a single nucleotide variant (SNV) or riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate 11 different RNA folding algorithms' riboSNitch prediction performance on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) have overall better performance if base pairing probabilities are considered rather than minimum free energy calculations. Although overall aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest confidence predictions are evaluated independently. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Genome-wide detection and characterization of positive selection in human populations.

    Science.gov (United States)

    Sabeti, Pardis C; Varilly, Patrick; Fry, Ben; Lohmueller, Jason; Hostetter, Elizabeth; Cotsapas, Chris; Xie, Xiaohui; Byrne, Elizabeth H; McCarroll, Steven A; Gaudet, Rachelle; Schaffner, Stephen F; Lander, Eric S; Frazer, Kelly A; Ballinger, Dennis G; Cox, David R; Hinds, David A; Stuve, Laura L; Gibbs, Richard A; Belmont, John W; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M; Pasternak, Shiran; Wheeler, David A; Willis, Thomas D; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan; Yao, Zhijian; Huang, Wei; Chu, Xun; He, Yungang; Jin, Li; Liu, Yangfan; Shen, Yayun; Sun, Weiwei; Wang, Haifeng; Wang, Yi; Wang, Ying; Xiong, Xiaoyan; Xu, Liang; Waye, Mary M Y; Tsui, Stephen K W; Xue, Hong; Wong, J Tze-Fei; Galver, Luana M; Fan, Jian-Bing; Gunderson, Kevin; Murray, Sarah S; Oliphant, Arnold R; Chee, Mark S; Montpetit, Alexandre; Chagnon, Fanny; Ferretti, Vincent; Leboeuf, Martin; Olivier, Jean-François; Phillips, Michael S; Roumy, Stéphanie; Sallée, Clémentine; Verner, Andrei; Hudson, Thomas J; Kwok, Pui-Yan; Cai, Dongmei; Koboldt, Daniel C; Miller, Raymond D; Pawlikowska, Ludmila; Taillon-Miller, Patricia; Xiao, Ming; Tsui, Lap-Chee; Mak, William; Song, You Qiang; Tam, Paul K H; Nakamura, Yusuke; Kawaguchi, Takahisa; Kitamoto, Takuya; Morizono, Takashi; Nagashima, Atsushi; Ohnishi, Yozo; Sekine, Akihiro; Tanaka, Toshihiro; Tsunoda, Tatsuhiko; Deloukas, Panos; Bird, Christine P; Delgado, Marcos; Dermitzakis, Emmanouil T; Gwilliam, Rhian; Hunt, Sarah; Morrison, Jonathan; Powell, Don; Stranger, 
Barbara E; Whittaker, Pamela; Bentley, David R; Daly, Mark J; de Bakker, Paul I W; Barrett, Jeff; Chretien, Yves R; Maller, Julian; McCarroll, Steve; Patterson, Nick; Pe'er, Itsik; Price, Alkes; Purcell, Shaun; Richter, Daniel J; Sabeti, Pardis; Saxena, Richa; Schaffner, Stephen F; Sham, Pak C; Varilly, Patrick; Altshuler, David; Stein, Lincoln D; Krishnan, Lalitha; Smith, Albert Vernon; Tello-Ruiz, Marcela K; Thorisson, Gudmundur A; Chakravarti, Aravinda; Chen, Peter E; Cutler, David J; Kashuk, Carl S; Lin, Shin; Abecasis, Gonçalo R; Guan, Weihua; Li, Yun; Munro, Heather M; Qin, Zhaohui Steve; Thomas, Daryl J; McVean, Gilean; Auton, Adam; Bottolo, Leonardo; Cardin, Niall; Eyheramendy, Susana; Freeman, Colin; Marchini, Jonathan; Myers, Simon; Spencer, Chris; Stephens, Matthew; Donnelly, Peter; Cardon, Lon R; Clarke, Geraldine; Evans, David M; Morris, Andrew P; Weir, Bruce S; Tsunoda, Tatsuhiko; Johnson, Todd A; Mullikin, James C; Sherry, Stephen T; Feolo, Michael; Skol, Andrew; Zhang, Houcan; Zeng, Changqing; Zhao, Hui; Matsuda, Ichiro; Fukushima, Yoshimitsu; Macer, Darryl R; Suda, Eiko; Rotimi, Charles N; Adebamowo, Clement A; Ajayi, Ike; Aniagwu, Toyin; Marshall, Patricia A; Nkwodimmah, Chibuzor; Royal, Charmaine D M; Leppert, Mark F; Dixon, Missy; Peiffer, Andy; Qiu, Renzong; Kent, Alastair; Kato, Kazuto; Niikawa, Norio; Adewole, Isaac F; Knoppers, Bartha M; Foster, Morris W; Clayton, Ellen Wright; Watkin, Jessica; Gibbs, Richard A; Belmont, John W; Muzny, Donna; Nazareth, Lynne; Sodergren, Erica; Weinstock, George M; Wheeler, David A; Yakub, Imtaz; Gabriel, Stacey B; Onofrio, Robert C; Richter, Daniel J; Ziaugra, Liuda; Birren, Bruce W; Daly, Mark J; Altshuler, David; Wilson, Richard K; Fulton, Lucinda L; Rogers, Jane; Burton, John; Carter, Nigel P; Clee, Christopher M; Griffiths, Mark; Jones, Matthew C; McLay, Kirsten; Plumb, Robert W; Ross, Mark T; Sims, Sarah K; Willey, David L; Chen, Zhu; Han, Hua; Kang, Le; Godbout, Martin; Wallenburg, John C; 
L'Archevêque, Paul; Bellemare, Guy; Saeki, Koji; Wang, Hongguang; An, Daochang; Fu, Hongbo; Li, Qing; Wang, Zhen; Wang, Renwu; Holden, Arthur L; Brooks, Lisa D; McEwen, Jean E; Guyer, Mark S; Wang, Vivian Ota; Peterson, Jane L; Shi, Michael; Spiegel, Jack; Sung, Lawrence M; Zacharia, Lynn F; Collins, Francis S; Kennedy, Karen; Jamieson, Ruth; Stewart, John

    2007-10-18

    With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

  14. Transnationalism as a motif in family stories.

    Science.gov (United States)

    Stone, Elizabeth; Gomez, Erica; Hotzoglou, Despina; Lipnitsky, Jane Y

    2005-12-01

    Family stories have long been recognized as a vehicle for assessing components of a family's emotional and social life, including the degree to which an immigrant family has been willing to assimilate. Transnationalism, defined as living in one or more cultures and maintaining connections to both, is now increasingly common. A qualitative study of family stories in the family of those who appear completely "American" suggests that an affiliation with one's home country is nevertheless detectable in the stories via motifs such as (1) positively connotated home remedies, (2) continuing denigration of home country "enemies," (3) extensive knowledge of the home country history and politics, (4) praise of endogamy and negative assessment of exogamy, (5) superiority of home country to America, and (6) beauty of home country. Furthermore, an awareness of which model--assimilationist or transnational--governs a family's experience may help clarify a clinician's understanding of a family's strengths, vulnerabilities, and mode of framing their cultural experiences.

  15. Mutation detection with next-generation resequencing through a mediator genome.

    Directory of Open Access Journals (Sweden)

    Omri Wurtzel

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genome of relative organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.

  16. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    Energy Technology Data Exchange (ETDEWEB)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel; Jurkevitch, Edouard; Sorek, Rotem; Ben-Jacob, Eshel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genome of relative organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
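
    The filtering step described in this abstract (ignore mutant-versus-mediator variations that recur in the WT) amounts to a set difference. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Variant` record, the function name and the toy coordinates are hypothetical, and read mapping and variant calling are assumed to have been done already.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One difference from the mediator genome (hypothetical record type)."""
    position: int  # 1-based coordinate on the mediator genome
    ref: str       # base(s) in the mediator
    alt: str       # base(s) observed in the sequenced strain


def mutant_specific(mutant_vs_mediator: set[Variant],
                    wt_vs_mediator: set[Variant]) -> list[Variant]:
    """Keep variants called in the mutant that do not recur in the WT."""
    return sorted(mutant_vs_mediator - wt_vs_mediator)


# Toy data: two strain-level differences shared by WT and mutant,
# plus one variant that appears only in the mutant.
wt = {Variant(120, "A", "G"), Variant(4510, "C", "T")}
mutant = {Variant(120, "A", "G"), Variant(4510, "C", "T"), Variant(9876, "G", "A")}

candidates = mutant_specific(mutant, wt)
print(candidates)
```

    In this toy case only the variant at position 9876 survives the filter. A real analysis would also have to handle coverage gaps and calling errors, which this sketch ignores.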

  17. Imaging analysis of nuclear antiviral factors through direct detection of incoming adenovirus genome complexes

    Energy Technology Data Exchange (ETDEWEB)

    Komatsu, Tetsuro [Microbiologie Fondamentale et Pathogénicité, MFP CNRS UMR 5234, Université de Bordeaux, Bordeaux 33076 (France); Department of Infection Biology, Faculty of Medicine, University of Tsukuba, Tsukuba 305-8575 (Japan); Will, Hans [Department of Tumor Biology, University Hospital Hamburg-Eppendorf, 20246 Hamburg (Germany); Nagata, Kyosuke [Department of Infection Biology, Faculty of Medicine, University of Tsukuba, Tsukuba 305-8575 (Japan); Wodrich, Harald, E-mail: harald.wodrich@u-bordeaux.fr [Microbiologie Fondamentale et Pathogénicité, MFP CNRS UMR 5234, Université de Bordeaux, Bordeaux 33076 (France)

    2016-04-22

    Recent studies involving several viral systems have highlighted the importance of cellular intrinsic defense mechanisms through nuclear antiviral proteins that restrict viral propagation. These factors include among others components of PML nuclear bodies, the nuclear DNA sensor IFI16, and a potential restriction factor PHF13/SPOC1. For several nuclear replicating DNA viruses, it was shown that these factors sense and target viral genomes immediately upon nuclear import. In contrast to the anticipated view, we recently found that incoming adenoviral genomes are not targeted by PML nuclear bodies. Here we further explored cellular responses against adenoviral infection by focusing on specific conditions as well as additional nuclear antiviral factors. In line with our previous findings, we show that neither interferon treatment nor the use of specific isoforms of PML nuclear body components results in co-localization between incoming adenoviral genomes and the subnuclear domains. Furthermore, our imaging analyses indicated that neither IFI16 nor PHF13/SPOC1 are likely to target incoming adenoviral genomes. Thus our findings suggest that incoming adenoviral genomes may be able to escape from a large repertoire of nuclear antiviral mechanisms, providing a rationale for the efficient initiation of lytic replication cycle. - Highlights: • Host nuclear antiviral factors were analyzed upon adenovirus genome delivery. • Interferon treatments fail to permit PML nuclear bodies to target adenoviral genomes. • Neither Sp100A nor B targets adenoviral genomes despite potentially opposite roles. • The nuclear DNA sensor IFI16 does not target incoming adenoviral genomes. • PHF13/SPOC1 targets neither incoming adenoviral genomes nor genome-bound protein VII.

  18. Fast Detection of a BRCA2 Large Genomic Duplication by Next Generation Sequencing as a Single Procedure: A Case Report

    Directory of Open Access Journals (Sweden)

    Marcella Nunziato

    2017-11-01

    The aim of this study was to verify the reliability of a next generation sequencing (NGS)-based method as a strategy to detect all possible BRCA mutations, including large genomic rearrangements. Genomic DNA was obtained from a peripheral blood sample provided by a patient from Southern Italy with early onset breast cancer and a family history of diverse cancers. BRCA molecular analysis was performed by NGS, and sequence data were analyzed using two software packages. Comparative genomic hybridization (CGH) array was used as a confirmatory method. A novel large duplication of BRCA2, involving exons 4–26, was directly detected in the patient by an NGS workflow that included quantitative analysis of copy number variants. The duplication was also found by CGH array, which confirmed its extent. Large genomic rearrangements can affect the BRCA1/2 genes and thus contribute to germline predisposition to familial breast and ovarian cancers. The frequency of these mutations could be underestimated because of technical limitations of several routinely used molecular analyses, and their evaluation should also be included in such molecular testing. The NGS-based strategy described herein is an effective procedure to screen for all kinds of BRCA mutations.

  19. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX

    DEFF Research Database (Denmark)

    Schubert, Mikkel; Ermini, Luca; Der Sarkissian, Clio

    2014-01-01

    a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. Starting with next...

  20. Genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts

    NARCIS (Netherlands)

    van der Lee, Robin; Wiel, Laurens; van Dam, Teunis J P; Huynen, Martijn A

    2017-01-01

    Hotspots of rapid genome evolution hold clues about human adaptation. We present a comparative analysis of nine whole-genome sequenced primates to identify high-confidence targets of positive selection. We find strong statistical evidence for positive selection in 331 protein-coding genes (3%),

  1. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Background: Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results: We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses), using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.), respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins.
    Conclusion: Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  2. Evolutionary gradient of predicted nuclear localization signals (NLS)-bearing proteins in genomes of family Planctomycetaceae.

    Science.gov (United States)

    Guo, Min; Yang, Ruifu; Huang, Chen; Liao, Qiwen; Fan, Guangyi; Sun, Chenghang; Lee, Simon Ming-Yuen

    2017-04-04

    The nuclear envelope is considered a key classification marker that distinguishes prokaryotes from eukaryotes. However, this marker does not apply to the family Planctomycetaceae, which has intracellular spaces divided by lipidic intracytoplasmic membranes (ICMs). The nuclear localization signal (NLS), a short stretch of amino acid sequence, directs the transport of proteins from the cytoplasm into the nucleus and is also associated with the development of the nuclear envelope. We attempted to investigate the NLS motifs in Planctomycetaceae genomes to demonstrate the potential molecular transition in the development of the intracellular membrane system. In this study, we identified NLS-like motifs that have the same amino acid compositions as experimentally identified NLSs in genomes of 11 representative species of family Planctomycetaceae. A total of 15 NLS types and 170 NLS-bearing proteins were detected in the 11 strains. To determine the molecular transformation, we compared NLS-bearing protein abundances in the 11 representative Planctomycetaceae genomes with those in the genomes of 16 taxonomically varied microorganisms: nine bacteria, two archaea and five fungi. In the 27 strains, 29 NLS types and 1101 NLS-bearing proteins were identified; principal component analysis showed a significant transitional gradient from bacteria to Planctomycetaceae to fungi in their NLS-bearing protein abundance profiles. Then, we clustered the 993 non-redundant NLS-bearing proteins into 181 families and annotated the metabolic pathways in which they are involved. Afterwards, we aligned the ten types of NLS motifs from the 13 families containing NLS-bearing proteins among bacteria, Planctomycetaceae or fungi, considering their diversity, length and origin. A transition towards increased complexity from non-planctomycete bacteria to Planctomycetaceae to archaea and fungi was detected based on the complexity of the 10 types of NLS-like motifs in the 13 NLS-bearing protein families. The results of this study reveal that

  3. Fitness for synchronization of network motifs

    DEFF Research Database (Denmark)

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.

    2004-01-01

    We study the synchronization of Kuramoto's oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi...... that the fitness for synchronization correlates well with motifs interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed....

  4. Detecting exact breakpoints of deletions with diversity in hepatitis B viral genomic DNA from next-generation sequencing data.

    Science.gov (United States)

    Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S

    2017-10-01

    Many studies have suggested that deletions in the hepatitis B virus (HBV) genome are associated with the development of progressive liver diseases, which can ultimately result in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few consider the characteristics of viruses, such as high evolution rates and high divergence among different HBV genomes. Sequencing highly divergent HBV genomes with NGS technology outputs millions of reads, so detecting exact breakpoints of deletions from these large and complex data incurs a very high computational cost. We propose a novel analytical method named VirDelect (Virus Deletion Detect), which uses split-read alignment to detect exact breakpoints and a diversity variable to account for high divergence in single-end read data, such that the computational cost can be reduced without losing accuracy. We use four simulated read datasets and two real paired-end read datasets of HBV genome sequences to verify the accuracy of VirDelect with score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in terms of accuracy score for all simulated datasets, and that VirDelect had only two base errors even in the real datasets. VirDelect is also shown to deliver high accuracy in analyzing single-end read data as well as paired-end data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists, and it is applicable to further analysis of genomes with characteristics similar to HBV in genome length and high divergence. The software can be downloaded at https://sourceforge.net/projects/virdelect/. Copyright © 2017. Published by Elsevier Inc.

  5. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf Sommer; Nielsen, Eva M.; Aarestrup, Frank Møller

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  6. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  7. Microarray comparative genomic hybridization detection of chromosomal imbalances in uterine cervix carcinoma

    Science.gov (United States)

    Hidalgo, Alfredo; Baudis, Michael; Petersen, Iver; Arreola, Hugo; Piña, Patricia; Vázquez-Ortiz, Guelaguetza; Hernández, Dulce; González, José; Lazos, Minerva; López, Ricardo; Pérez, Carlos; García, José; Vázquez, Karla; Alatorre, Brenda; Salcedo, Mauricio

    2005-01-01

    Background: Chromosomal Comparative Genomic Hybridization (CGH) has been applied to all stages of cervical carcinoma progression, defining a specific pattern of chromosomal imbalances in this tumor. However, given its limited spatial resolution, chromosomal CGH has offered only general information regarding the possible genetic targets of DNA copy number changes. Methods: In order to further define specific DNA copy number changes in cervical cancer, we analyzed 20 cervical samples (3 pre-malignant lesions, 10 invasive tumors, and 7 cell lines), using the GenoSensor microarray CGH system to define particular genetic targets that suffer copy number changes. Results: The most common DNA gains detected by array CGH in the invasive samples were located at the RBP1-RBP2 (3q21-q22) genes, the sub-telomeric clone C84C11/T3 (5ptel), D5S23 (5p15.2) and the DAB2 gene (5p13) in 58.8% of the samples. The most common losses were found at the FHIT gene (3p14.2) in 47% of the samples, followed by deletions at D8S504 (8p23.3), CTDP1-SHGC-145820 (18qtel), KIT (4q11-q12), D1S427-FAF1 (1p32.3), D9S325 (9qtel), EIF4E (eukaryotic translation initiation factor 4E, 4q24), RB1 (13q14), and DXS7132 (Xq12) present in 5/17 (29.4%) of the samples. Conclusion: Our results confirm the presence of a specific pattern of chromosomal imbalances in cervical carcinoma and define specific targets that are suffering DNA copy number changes in this neoplasm. PMID:16004614

  8. Microarray comparative genomic hybridization detection of chromosomal imbalances in uterine cervix carcinoma

    Directory of Open Access Journals (Sweden)

    García José

    2005-07-01

    Background: Chromosomal Comparative Genomic Hybridization (CGH) has been applied to all stages of cervical carcinoma progression, defining a specific pattern of chromosomal imbalances in this tumor. However, given its limited spatial resolution, chromosomal CGH has offered only general information regarding the possible genetic targets of DNA copy number changes. Methods: In order to further define specific DNA copy number changes in cervical cancer, we analyzed 20 cervical samples (3 pre-malignant lesions, 10 invasive tumors, and 7 cell lines), using the GenoSensor microarray CGH system to define particular genetic targets that suffer copy number changes. Results: The most common DNA gains detected by array CGH in the invasive samples were located at the RBP1-RBP2 (3q21-q22) genes, the sub-telomeric clone C84C11/T3 (5ptel), D5S23 (5p15.2) and the DAB2 gene (5p13) in 58.8% of the samples. The most common losses were found at the FHIT gene (3p14.2) in 47% of the samples, followed by deletions at D8S504 (8p23.3), CTDP1-SHGC-145820 (18qtel), KIT (4q11-q12), D1S427-FAF1 (1p32.3), D9S325 (9qtel), EIF4E (eukaryotic translation initiation factor 4E, 4q24), RB1 (13q14), and DXS7132 (Xq12) present in 5/17 (29.4%) of the samples. Conclusion: Our results confirm the presence of a specific pattern of chromosomal imbalances in cervical carcinoma and define specific targets that are suffering DNA copy number changes in this neoplasm.

  9. Sample size requirements to detect gene-environment interactions in genome-wide association studies.

    Science.gov (United States)

    Murcray, Cassandra E; Lewinger, Juan Pablo; Conti, David V; Thomas, Duncan C; Gauderman, W James

    2011-04-01

    Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a GE interaction. For example, Murcray et al. ([2009] Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc ([2008] Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genomewide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.
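
    The abstract says the hybrid method allocates a proportion of the overall genome-wide significance level to each screening test. A rough sketch of that bookkeeping follows. It rests on assumptions the abstract does not state: the parameter name `rho` is hypothetical, and a Bonferroni correction over the SNPs that pass each screen is assumed for the second step.

```python
from typing import Optional, Tuple


def hybrid_thresholds(alpha: float, rho: float,
                      n_pass_ge: int, n_pass_marginal: int
                      ) -> Tuple[Optional[float], Optional[float]]:
    """Split alpha between two screens and correct within each (assumed Bonferroni).

    alpha            overall genome-wide significance level
    rho              share of alpha given to the G-E association screen (hypothetical name)
    n_pass_ge        number of SNPs that passed the G-E association screen
    n_pass_marginal  number of SNPs that passed the marginal-effect screen
    Returns the per-SNP step-2 thresholds, or None where no SNP passed.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    alpha_ge = rho * alpha
    alpha_marginal = (1.0 - rho) * alpha
    t_ge = alpha_ge / n_pass_ge if n_pass_ge > 0 else None
    t_marginal = alpha_marginal / n_pass_marginal if n_pass_marginal > 0 else None
    return t_ge, t_marginal


# Example: equal split of alpha = 0.05; 100 and 50 SNPs pass the two screens.
t_ge, t_marginal = hybrid_thresholds(0.05, 0.5, 100, 50)
print(t_ge, t_marginal)
```

    Because the two shares sum to alpha, the overall type I error stays bounded by alpha under this assumed correction, whatever value of `rho` is chosen.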

  10. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
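
    Comparing the relative abundance of short motifs in one sequence set against a background set can be illustrated with plain k-mer counting. This is a minimal sketch, not the authors' procedure: the function names are hypothetical, only one strand is counted, and the ratio is reported only for k-mers that occur in both sets.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_frequencies(sequences: Iterable[str], k: int) -> Dict[str, float]:
    """Relative frequency of every k-mer made only of A, C, G, T."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def abundance_ratio(foreground: Iterable[str], background: Iterable[str],
                    k: int) -> Dict[str, float]:
    """Foreground/background frequency ratio for k-mers present in both sets."""
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {kmer: fg[kmer] / bg[kmer] for kmer in fg if kmer in bg}


# Toy example: "AA" is the only 2-mer in the foreground and one of three in the background.
ratios = abundance_ratio(["AAAA"], ["AACG"], k=2)
print(ratios)
```

    A ratio above 1 marks a k-mer that is more frequent in the foreground. As the abstract notes, such ratios largely reflect overall base composition, so a composition-matched background is needed before reading them as motif-specific signal.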

  11. Global MYCN transcription factor binding analysis in neuroblastoma reveals association with distinct E-box motifs and regions of DNA hypermethylation.

    LENUS (Irish Health Repository)

    Murphy, Derek M

    2009-01-01

    BACKGROUND: Neuroblastoma, a cancer derived from precursor cells of the sympathetic nervous system, is a major cause of childhood cancer related deaths. The single most important prognostic indicator of poor clinical outcome in this disease is genomic amplification of MYCN, a member of a family of oncogenic transcription factors. METHODOLOGY: We applied MYCN chromatin immunoprecipitation to microarrays (ChIP-chip) using MYCN amplified/non-amplified cell lines as well as a conditional knockdown cell line to determine the distribution of MYCN binding sites within all annotated promoter regions. CONCLUSION: Assessment of E-box usage within consistently positive MYCN binding sites revealed a predominance for the CATGTG motif (p<0.0016), with significant enrichment of additional motifs CATTTG, CATCTG, CAACTG in the MYCN amplified state. For cell lines over-expressing MYCN, gene ontology analysis revealed enrichment for the binding of MYCN at promoter regions of numerous molecular functional groups including DNA helicases and mRNA transcriptional regulation. In order to evaluate MYCN binding with respect to other genomic features, we determined the methylation status of all annotated CpG islands and promoter sequences using methylated DNA immunoprecipitation (MeDIP). The integration of MYCN ChIP-chip and MeDIP data revealed a highly significant positive correlation between MYCN binding and DNA hypermethylation. This association was also detected in regions of hemizygous loss, indicating that the observed association occurs on the same homologue. In summary, these findings suggest that MYCN binding occurs more commonly at CATGTG as opposed to the classic CACGTG E-box motif, and that disease associated over expression of MYCN leads to aberrant binding to additional weaker affinity E-box motifs in neuroblastoma.
The co-localization of MYCN binding and DNA hypermethylation further supports the dual role of MYCN, namely that of a classical transcription factor affecting the
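
    The E-box variants named in this abstract (CATGTG, CACGTG, CATTTG, CATCTG, CAACTG) all fit the general pattern CANNTG, so tallying E-box usage in a set of bound sequences can be sketched with a regular expression. This is an illustration only, not the authors' analysis: the function name is hypothetical, only the given strand is scanned, and no significance test is performed.

```python
import re
from collections import Counter
from typing import Iterable

# Lookahead so that overlapping CANNTG occurrences are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def ebox_usage(sequences: Iterable[str]) -> Counter:
    """Count each CANNTG hexamer found on the given strand of the sequences."""
    usage: Counter = Counter()
    for seq in sequences:
        for match in EBOX.finditer(seq.upper()):
            usage[match.group(1)] += 1
    return usage


# Toy sequence containing one CATGTG and one canonical CACGTG site.
usage = ebox_usage(["TTCATGTGAACACGTGTT"])
print(usage)
```

    Non-palindromic variants such as CATGTG read as CACATG on the opposite strand, so a fuller tally would merge each hexamer with its reverse complement.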

  12. DistAMo: A web-based tool to characterize DNA-motif distribution on bacterial chromosomes

    Directory of Open Access Journals (Sweden)

    Patrick Sobetzko

    2016-03-01

    Short DNA motifs are involved in a multitude of functions, for example chromosome segregation, DNA replication or mismatch repair. Distribution of such motifs is often not random and the specific chromosomal pattern relates to the respective motif function. Computational approaches which quantitatively assess such chromosomal motif patterns are necessary. Here we present a new computer tool, DistAMo (Distribution Analysis of DNA Motifs). The algorithm uses codon redundancy to calculate the relative abundance of short DNA motifs from single genes to entire chromosomes. Comparative genomics analyses of the GATC-motif distribution in γ-proteobacterial genomes using DistAMo revealed that (i) genes beside the replication origin are enriched in GATCs, (ii) genome-wide GATC distribution follows a distinct pattern and (iii) genes involved in DNA replication and repair are enriched in GATCs. These features are specific for bacterial chromosomes encoding a Dam methyltransferase. The new software is available as a stand-alone or as an easy-to-use web-based server version at http://www.computational.bio.uni-giessen.de/distamo.

  13. SECOM: A novel hash seed and community detection based-approach for genome-scale protein domain identification

    KAUST Repository

    Fan, Ming

    2012-06-28

    With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx. © 2012 Fan et al.
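
    The indexing idea in this abstract (hash every protein by seeds, then link proteins by the number of seeds they share) can be sketched as follows. This is a simplified illustration under stated assumptions, not SECOM itself: contiguous k-mers stand in for the hash seed function, whose actual form the abstract does not give, and the community-finding step is omitted. The function name and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def seed_graph(proteins: Dict[str, str], k: int = 3) -> Dict[Tuple[str, str], int]:
    """Build edge weights: number of distinct k-mer seeds shared by two proteins."""
    # Index: seed -> names of the proteins that contain it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    # Every seed found in two or more proteins adds 1 to the weight of each pair.
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy proteome: p1 and p2 share the seeds KVL and VLA; p3 shares nothing.
graph = seed_graph({"p1": "MKVLAA", "p2": "GGKVLA", "p3": "WWWWWW"})
print(graph)
```

    Building the index takes one pass over the proteome, which is what lets a seed-based approach avoid all-against-all alignment. Pairs are only compared when they fall into the same seed bucket.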

  14. Two novel circo-like viruses detected in human feces: complete genome sequencing and electron microscopy analysis.

    Science.gov (United States)

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico; Kisielius, Jonas José; Ueda-Ito, Marli; Brandão, Paulo Eduardo; Curti, Suely Pires

    2013-12-26

    The application of viral metagenomic techniques and a series of PCRs in a human fecal sample enabled the detection of two novel circular unisense DNA viral genomes with 92% nucleotide similarity. The viruses were tentatively named circo-like virus-Brazil (CLV-BR) strains hs1 and hs2 and have genome lengths of 2526 and 2533 nucleotides, respectively. Four major open reading frames (ORFs) were identified in each of the genomes, and differences between the two genomes were primarily observed in ORF 2. Only ORF 3 showed significant amino acid similarities to a putative rolling circle replication initiator protein (Rep), although with low identity (36%). Our phylogenetic analysis, based on the Rep protein, demonstrated that the CLV-BRs do not cluster with members of the Circoviridae, Nanoviridae or Geminiviridae families and are more closely related to circo-like genomes previously identified in reclaimed water and feces of a wild rodent and of a bat. The CLV-BRs are members of a putative new family of circular Rep-encoding ssDNA viruses. Electron microscopy revealed icosahedral (~23 nm) structures, likely reflecting the novel viruses, and rod-shaped viral particles (~65-460 × 21 × 10 nm in length, diameter, and axial canal, respectively). Circo-like viruses have been detected in stool samples from humans and other mammals (bats, rodents, chimpanzees and bovines), cerebrospinal fluid and sera from humans, as well as samples from many other sources, e.g., insects, meat and the environment. Further studies are needed to classify all novel circular DNA viruses and elucidate their hosts, pathogenicity and evolutionary history. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. The Verrucomicrobia LexA-binding Motif: Insights into the Evolutionary Dynamics of the SOS Response

    Directory of Open Access Journals (Sweden)

    Ivan Erill

    2016-07-01

    Full Text Available The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
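
    As a small illustration of the consensus reported above, the sketch below scans a sequence for exact matches to TGTTC-N4-GAACA. It is an assumption-laden simplification: real LexA sites deviate from the consensus and are normally scored with a position weight matrix, and the function name `find_lexa_sites` and the test sequence are hypothetical. Because the two half-sites are reverse complements of each other, an exact-consensus pattern matches the same positions on either strand.

```python
import re

# Exact consensus only: TGTTC, any four bases, GAACA (14 bp in total).
# The lookahead lets overlapping candidate sites be reported as well.
LEXA_SITE = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")


def find_lexa_sites(seq):
    """Return (0-based start, matched 14-mer) for each exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEXA_SITE.finditer(seq)]


# Hypothetical promoter fragment containing one consensus site.
promoter = "ccatTGTTCatgcGAACAttg"
sites = find_lexa_sites(promoter)
print(sites)
```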

  16. Detection of short repeated genomic sequences on metaphase chromosomes using padlock probes and target primed rolling circle DNA synthesis

    Directory of Open Access Journals (Sweden)

    Stougaard Magnus

    2007-11-01

    Full Text Available Abstract Background: In situ detection of short sequence elements in genomic DNA requires short probes with high molecular resolution and powerful specific signal amplification. Padlock probes can differentiate single base variations. Ligated padlock probes can be amplified in situ by rolling circle DNA synthesis and detected by fluorescence microscopy, thus enhancing PRINS type reactions, where localized DNA synthesis reports on the position of hybridization targets, to potentially reveal the binding of single oligonucleotide-size probe molecules. Such a system has been presented for the detection of mitochondrial DNA in fixed cells, whereas attempts to apply rolling circle detection to metaphase chromosomes have previously failed, according to the literature. Methods: Synchronized cultured cells were fixed with methanol/acetic acid to prepare chromosome spreads in teflon-coated diagnostic well-slides. Apart from the slide format and the chromosome spreading, everything was done essentially according to standard protocols. Hybridization targets were detected in situ with padlock probes, which were ligated and amplified using target primed rolling circle DNA synthesis, and detected by fluorescence labeling. Results: An optimized protocol for the spreading of condensed metaphase chromosomes in teflon-coated diagnostic well-slides was developed. Applying this protocol we generated specimens for target primed rolling circle DNA synthesis of padlock probes recognizing a 40 nucleotide sequence in the male specific repetitive satellite I sequence (DYZ1) on the Y-chromosome and a 32 nucleotide sequence in the repetitive kringle IV domain in the apolipoprotein(a) gene positioned on the long arm of chromosome 6. These targets were detected with good efficiency, but the efficiency on other target sites was unsatisfactory. Conclusion: Our aim was to test the applicability of the method used on mitochondrial DNA to the analysis of nuclear genomes, in particular as

  17. Design and validation of an oligonucleotide microarray for the detection of genomic rearrangements associated with common hereditary cancer syndromes.

    Science.gov (United States)

    Mancini-DiNardo, Debora; Judkins, Thaddeus; Woolstenhulme, Nick; Burton, Collin; Schoenberger, Jeremy; Ryder, Matthew; Murray, Adam; Gutin, Natalia; Theisen, Aaron; Holladay, Jayson; Craft, Jonathan; Arnell, Christopher; Moyes, Kelsey; Roa, Benjamin

    2014-09-11

    Conventional Sanger sequencing reliably detects the majority of genetic mutations associated with hereditary cancers, such as single-base changes and small insertions or deletions. However, detection of genomic rearrangements, such as large deletions and duplications, requires special technologies. Microarray analysis has been successfully used to detect large rearrangements (LRs) in genetic disorders. We designed and validated a high-density oligonucleotide microarray for the detection of gene-level genomic rearrangements associated with hereditary breast and ovarian cancer (HBOC), Lynch, and polyposis syndromes. The microarray consisted of probes corresponding to the exons and flanking introns of BRCA1 and BRCA2 (≈1,700) and Lynch syndrome/polyposis genes MLH1, MSH2, MSH6, APC, MUTYH, and EPCAM (≈2,200). We validated the microarray with 990 samples previously tested for LR status in BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, MUTYH, or EPCAM. Microarray results were 100% concordant with previous results in the validation studies. Subsequently, clinical microarray analysis was performed on samples from patients with a high likelihood of HBOC mutations (13,124), Lynch syndrome mutations (18,498), and polyposis syndrome mutations (2,739) to determine the proportion of LRs. Our results demonstrate that LRs constitute a substantial proportion of genetic mutations found in patients referred for hereditary cancer genetic testing. The use of microarray comparative genomic hybridization (CGH) for the detection of LRs is well-suited as an adjunct technology for both single syndrome (by Sanger sequencing analysis) and extended gene panel testing by next generation sequencing analysis. Genetic testing strategies using microarray analysis will help identify additional patients carrying LRs, who are predisposed to various hereditary cancers.

  18. MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects.

    Science.gov (United States)

    Westra, Harm-Jan; Jansen, Ritsert C; Fehrmann, Rudolf S N; te Meerman, Gerard J; van Heel, David; Wijmenga, Cisca; Franke, Lude

    2011-08-01

    Sample mix-ups can arise during sample collection, handling, genotyping or data management. It is unclear how often sample mix-ups occur in genome-wide studies, as there currently are no post hoc methods that can identify these mix-ups in unrelated samples. We have therefore developed an algorithm (MixupMapper) that can both detect and correct sample mix-ups in genome-wide studies that study gene expression levels. We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes: in one of the datasets 23% of the samples had incorrect expression phenotypes. The consequences of sample mix-ups are substantial: when we corrected these sample mix-ups, we identified on average 15% more significant cis-expression quantitative trait loci (cis-eQTLs). In one dataset, we identified three times as many significant cis-eQTLs after correction. Furthermore, we show through simulations that sample mix-ups can lead to an underestimation of the explained heritability of complex traits in genome-wide association datasets. MixupMapper is freely available at http://www.genenetwork.nl/mixupmapper/

  19. Miz-1 activates gene expression via a novel consensus DNA binding motif.

    Directory of Open Access Journals (Sweden)

    Bonnie L Barrilleaux

    Full Text Available The transcription factor Miz-1 can either activate or repress gene expression in concert with binding partners including the Myc oncoprotein. The genomic binding of Miz-1 includes both core promoters and more distal sites, but the preferred DNA binding motif of Miz-1 has been unclear. We used a high-throughput in vitro technique, Bind-n-Seq, to identify two Miz-1 consensus DNA binding motif sequences, ATCGGTAATC and ATCGAT (Mizm1 and Mizm2), bound by full-length Miz-1 and its zinc finger domain, respectively. We validated these sequences directly as high affinity Miz-1 binding motifs. Competition assays using mutant probes indicated that the binding affinity of Miz-1 for Mizm1 and Mizm2 is highly sequence-specific. Miz-1 strongly activates gene expression through the motifs in a Myc-independent manner. MEME-ChIP analysis of Miz-1 ChIP-seq data in two different cell types reveals a long motif with a central core sequence highly similar to the Mizm1 motif identified by Bind-n-Seq, validating the in vivo relevance of the findings. Miz-1 ChIP-seq peaks containing the long motif are predominantly located outside of proximal promoter regions, in contrast to peaks without the motif, which are highly concentrated within 1.5 kb of the nearest transcription start site. Overall, our results indicate that Miz-1 may be directed in vivo to the novel motif sequences we have identified, where it can recruit its specific binding partners to control gene expression and ultimately regulate cell fate.

  20. The first detection and whole genome characterization of the G6P[15] group A rotavirus strain from roe deer.

    Science.gov (United States)

    Jamnikar-Ciglenecki, Urska; Kuhar, Urska; Sturm, Sabina; Kirbis, Andrej; Racki, Nejc; Steyer, Andrej

    2016-08-15

    Although rotaviruses have been detected in a variety of host species, there are only limited records of their occurrence in deer, where their role is unknown. In this study, group A rotavirus was identified in roe deer during a study of enteric viruses in game animals. 102 samples of intestinal content were collected from roe deer (56), wild boars (29), chamois (10), red deer (6) and mouflon (1), but only one sample from roe deer was positive. Following whole genome sequence analysis, the rotavirus strain D38/14 was characterized by next generation sequencing. The genotype constellation, comprising 11 genome segments, was G6-P[15]-I2-R2-C2-M2-A3-N2-T6-E2-H3. Phylogenetic analysis of the VP7 genome segment showed that the D38/14 rotavirus strain is closely related to the various G6 zoonotic rotavirus strains of bovine-like origin frequently detected in humans. In the VP4 segment, this strain showed high variation compared to that in the P[15] strain found in sheep and in a goat. This finding suggests that rotaviruses from deer are similar to those in other DS-1 rotavirus groups and could constitute a source of zoonotically transmitted rotaviruses. The epidemiological status of group A rotaviruses in deer should be further investigated. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array

    Directory of Open Access Journals (Sweden)

    Sugnet Charles

    2006-12-01

    Full Text Available Abstract Background: Alternative splicing is a mechanism for increasing protein diversity by excluding or including exons during post-transcriptional processing. Alternatively spliced proteins are particularly relevant in oncology since they may contribute to the etiology of cancer, provide selective drug targets, or serve as a marker set for cancer diagnosis. While conventional identification of splice variants generally targets individual genes, we present here a new exon-centric array (GeneChip Human Exon 1.0 ST) that allows genome-wide identification of differential splice variation, and concurrently provides a flexible and inclusive analysis of gene expression. Results: We analyzed 20 paired tumor-normal colon cancer samples using a microarray designed to detect over one million putative exons that can be virtually assembled into potential gene-level transcripts according to various levels of prior supporting evidence. Analysis of high confidence (empirically supported) transcripts identified 160 differentially expressed genes, with 42 genes occupying a network impacting cell proliferation and another 29 genes with unknown functions. A more speculative analysis, including transcripts based solely on computational prediction, produced another 160 differentially expressed genes, three-fourths of which have no previous annotation. We also present a comparison of gene signal estimations from the Exon 1.0 ST and the U133 Plus 2.0 arrays. Novel splicing events were predicted by experimental algorithms that compare the relative contribution of each exon to the cognate transcript intensity in each tissue. The resulting candidate splice variants were validated with RT-PCR. We found nine genes that were differentially spliced between colon tumors and normal colon tissues, several of which have not been previously implicated in cancer. Top scoring candidates from our analysis were also found to substantially overlap with EST-based bioinformatic

  2. Improved genome annotation through untargeted detection of pathway-specific metabolites

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2011-06-01

    Full Text Available Abstract Background: Mass spectrometry-based metabolomics analyses have the potential to complement sequence-based methods of genome annotation, but only if raw mass spectral data can be linked to specific metabolic pathways. In untargeted metabolomics, the measured mass of a detected compound is used to define the location of the compound in chemical space, but uncertainties in mass measurements lead to "degeneracies" in chemical space since multiple chemical formulae correspond to the same measured mass. We compare two methods to eliminate these degeneracies. One method relies on natural isotopic abundances, and the other relies on the use of stable-isotope labeling (SIL) to directly determine C and N atom counts. Both depend on combinatorial explorations of the "chemical space" comprising all possible chemical formulae composed of biologically relevant chemical elements. Results: Of 1532 metabolic pathways curated in the MetaCyc database, 412 contain a metabolite having a chemical formula unique to that metabolic pathway. Thus, chemical formulae alone can suffice to infer the presence of some metabolic pathways. Of 248,928 unique chemical formulae selected from the PubChem database, more than 95% had at least one degeneracy on the basis of accurate mass information alone. Consideration of natural isotopic abundance reduced degeneracy to 64%, but mainly for formulae less than 500 Da in molecular weight, and only if the error in the relative isotopic peak intensity was less than 10%. Knowledge of exact C and N atom counts as determined by SIL enabled reduced degeneracy, allowing for determination of unique chemical formula for 55% of the PubChem formulae. Conclusions: To facilitate the assignment of chemical formulae to unknown mass-spectral features, profiling can be performed on cultures uniformly labeled with stable isotopes of nitrogen (15N) or carbon (13C). This makes it possible to accurately count the number of carbon and nitrogen atoms in
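
    The notion of degeneracy, and how known C and N counts resolve it, can be illustrated with a toy enumeration. This is a sketch under stated assumptions, not the authors' procedure: it considers only C, H, N and O, applies no chemical validity rules, uses a hypothetical 10 ppm tolerance, and the function name `candidate_formulae` is made up for the example. The measured value is the monoisotopic mass of C6H12O6.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def candidate_formulae(mass, ppm=10.0):
    """Enumerate (C, H, N, O) counts whose monoisotopic mass lies within ppm of mass.

    Assumption: no valence or ring/double-bond checks are applied, so some
    returned formulae may not correspond to real molecules.
    """
    tol = mass * ppm / 1e6
    hits = []
    for c in range(int(mass / MONO["C"]) + 1):
        rest_c = mass - c * MONO["C"]
        for n in range(int(rest_c / MONO["N"]) + 1):
            rest_n = rest_c - n * MONO["N"]
            for o in range(int(rest_n / MONO["O"]) + 1):
                rest_o = rest_n - o * MONO["O"]
                # Fill the remaining mass with hydrogens and test the residual error.
                h = round(rest_o / MONO["H"])
                if abs(h * MONO["H"] - rest_o) <= tol:
                    hits.append((c, h, n, o))
    return hits


measured = 180.06339  # monoisotopic mass of C6H12O6
hits = candidate_formulae(measured)

# Mass alone is degenerate: C7H8N4O2 also falls inside the tolerance window.
print((6, 12, 0, 6) in hits, (7, 8, 4, 2) in hits)

# If labeling fixes the carbon count at 6 and the nitrogen count at 0,
# only one candidate remains.
labeled = [f for f in hits if f[0] == 6 and f[2] == 0]
print(labeled)
```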

  3. Inference of transcriptional regulation using gene expression data from the bovine and human genomes

    Directory of Open Access Journals (Sweden)

    McEwan John C

    2007-08-01

    Full Text Available Abstract Background: Gene expression is in part regulated by sequences in promoters that bind transcription factors. Thus, co-expressed genes may have shared sequence motifs representing putative transcription factor binding sites (TFBSs). However, for agriculturally important animals the genomic sequence is often incomplete. The more complete human genome may be usable for this prediction by taking advantage of the expected evolutionary conservation in TFBSs between the species. Results: A method of de novo TFBS prediction based on MEME was implemented, tested, and validated on a muscle-specific dataset. Muscle specific expression data from EST library analysis from cattle were used to predict sets of genes whose expression was enriched in muscle and cardiac tissues. The upstream 1500 bases from calculated orthologous genes were extracted from the human reference set. A set of common motifs was discovered in these promoters. Slightly over one third of these motifs were identified as known TFBSs including known muscle specific binding sites. This analysis also predicted several highly statistically significantly overrepresented sites that may be novel TFBSs. An independent analysis of the equivalent bovine genomic sequences was also done; this gave less detailed results than the human analysis due to both the quality of orthologue prediction and assembly in promoter regions. However, the most common motifs could be detected in both sets. Conclusion: Using promoter sequences from human genes is a useful approach when studying gene expression in species with limited or non-existing genomic sequence. As the bovine genome becomes better annotated it can in turn serve as the reference genome for other agriculturally important ruminants, such as sheep, goat and deer.

  4. MotifNet: a web-server for network motif analysis.

    Science.gov (United States)

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet. The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. Contact: estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
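
    To make the idea of a network motif concrete, the sketch below counts instances of one classic three-node pattern, the feed-forward loop (a→b, b→c, a→c), in a small directed graph. It is only an illustration with hypothetical names and toy data; it does not reproduce MotifNet's algorithm, handles no node or edge labels, and omits the comparison against randomized networks that is needed to call a pattern significantly over-represented.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return sorted (a, b, c) triples with edges a->b, b->c and a->c present."""
    edge_set = set(edges)
    nodes = {node for edge in edges for node in edge}
    return sorted(
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


# Toy directed network (hypothetical): X regulates Y and Z, Y regulates Z, Z regulates W.
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffls = feed_forward_loops(edges)
print(ffls)
```

    The brute-force enumeration over all ordered node triples is cubic in the number of nodes, which is one reason motif searches on large networks are computationally demanding.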

  5. Detection of genomic instability in normal human bronchial epithelial cells exposed to 238Pu

    International Nuclear Information System (INIS)

    Kennedy, C.H.; Fukushima, N.H.; Neft, R.E.; Lechner, J.F.

    1994-01-01

    Alpha particle-emitting radon daughters constitute a risk for development of lung cancer in humans. The development of this disease involves multiple genetic alterations. These changes and the time course they follow are not yet defined despite numerous in vitro endeavors to transform human lung cells with various physical or chemical agents. However, genomic instability, characterized both by structural and numerical chromosomal aberrations and by elevated rates of point mutations, is a common feature of tumor cells. Further, both types of genomic instability have been reported in the noncancerous progeny of normal murine hemopoietic cells exposed in vitro to α-particles. The purpose of this investigation was to determine if genomic instability is also a prominent feature of normal human bronchial epithelial cells exposed to α-particle irradiation from the decay of inhaled radon daughters

  6. Detection of Genomic Structural Variants from Next-Generation Sequencing Data

    Directory of Open Access Journals (Sweden)

    Lorenzo Tattini

    2015-06-01

    Full Text Available Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review we describe and summarise the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.

  7. Clinical Utility of Array Comparative Genomic Hybridization for Detection of Chromosomal Abnormalities in Pediatric Acute Lymphoblastic Leukemia

    Science.gov (United States)

    Rabin, Karen R.; Man, Tsz-Kwong; Yu, Alexander; Folsom, Matthew R.; Zhao, Yi-Jue; Rao, Pulivarthi H.; Plon, Sharon E.; Naeem, Rizwan C.

    2014-01-01

    Background Accurate detection of recurrent chromosomal abnormalities is critical to assign patients to risk-based therapeutic regimens for pediatric acute lymphoblastic leukemia (ALL). Procedure We investigated the utility of array comparative genomic hybridization (aCGH) for detection of chromosomal abnormalities compared to standard clinical evaluation with karyotype and fluorescent in-situ hybridization (FISH). Fifty pediatric ALL diagnostic bone marrows were analyzed by bacterial artificial chromosome (BAC) array, and findings compared to standard clinical evaluation. Results Sensitivity of aCGH was 79% to detect karyotypic findings other than balanced translocations, which cannot be detected by aCGH because they involve no copy number change. aCGH also missed abnormalities occurring in subclones constituting less than 25% of cells. aCGH detected 44 additional abnormalities undetected or misidentified by karyotype, 21 subsequently validated by FISH, including abnormalities in 4 of 10 cases with uninformative cytogenetics. aCGH detected concurrent terminal deletions of both 9p and 20q in three cases, in two of which the 20q deletion was undetected by karyotype. A narrow region of loss at 7p21 was detected in two cases. Conclusions An array with increased BAC density over regions important in ALL, combined with PCR for fusion products of balanced translocations, could minimize labor- and time-intensive cytogenetic assays and provide key prognostic information in the approximately 35% of cases with uninformative cytogenetics. PMID:18253961

  8. MCAST: scanning for cis-regulatory motif clusters.

    Science.gov (United States)

    Grant, Charles E; Johnson, James; Bailey, Timothy L; Noble, William Stafford

    2016-04-15

    Precise regulatory control of genes, particularly in eukaryotes, frequently requires the joint action of multiple sequence-specific transcription factors. A cis-regulatory module (CRM) is a genomic locus that is responsible for gene regulation and that contains multiple transcription factor binding sites in close proximity. Given a collection of known transcription factor binding motifs, many bioinformatics methods have been proposed over the past 15 years for identifying within a genomic sequence candidate CRMs consisting of clusters of those motifs. The MCAST algorithm uses a hidden Markov model with a P-value-based scoring scheme to identify candidate CRMs. Here, we introduce a new version of MCAST that offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs. MCAST is part of the MEME Suite software toolkit. A web server and source code are available at http://meme-suite.org and http://alternate.meme-suite.org. Contact: t.bailey@imb.uq.edu.au or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  9. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest.

    Science.gov (United States)

    Wang, Xin; Lin, Peijie; Ho, Joshua W K

    2018-01-19

    It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.

  10. Comparative genomic hybridization detects novel amplifications in fibroadenomas of the breast

    DEFF Research Database (Denmark)

    Ojopi, E P; Rogatto, S R; Caldeira, J R

    2001-01-01

    Comparative genomic hybridization analysis was performed for identification of chromosomal imbalances in 23 samples of fibroadenomas of the breast. Chromosomal gains rather than losses were a feature of these lesions. Only two cases with a familial and/or previous history of breast lesions had gain...

  11. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large food borne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promise for use as a routine epidemiological typing...

  12. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    NARCIS (Netherlands)

    Chagné, D.; Crowhurst, R.N.; Troggio, M.; Davey, M.W.; Gilmore, B.; Lawley, C.; Vanderzande, S.; Hellens, R.P.; Kumar, S.; Cestaro, A.; Velasco, R.; Main, D.; Rees, J.D.; Iezzoni, A.F.; Mockler, T.; Wilhelm, L.; Weg, van de W.E.; Gardiner, S.E.; Bassil, N.; Peace, C.

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide

  13. Genome-wide detection of signatures of selection in Korean Hanwoo cattle

    Science.gov (United States)

    The Korean Hanwoo cattle have been intensively selected for production traits, especially high intramuscular fat content. It is believed that ancient crossings between different breeds contributed to forming the Hanwoo, but little is known about the genomic differences and similarities between other...

  14. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  15. Detecting low frequent loss-of-function alleles in genome wide association studies with red hair color as example.

    Directory of Open Access Journals (Sweden)

    Fan Liu

    Full Text Available Multiple loss-of-function (LOF) alleles at the same gene may influence a phenotype not only in the homozygote state when alleles are considered individually, but also in the compound heterozygote (CH) state. Such LOF alleles typically have low frequencies and moderate to large effects. Detecting such variants is of interest to the genetics community, and relevant statistical methods for detecting and quantifying their effects are sorely needed. We present a collapsed double heterozygosity (CDH) test to detect the presence of multiple LOF alleles at a gene. When causal SNPs are available, which may be the case in next generation genome sequencing studies, this CDH test has overwhelmingly higher power than single SNP analysis. When causal SNPs are not directly available such as in current GWA settings, we show the CDH test has higher power than standard single SNP analysis if tagging SNPs are in linkage disequilibrium with the underlying causal SNPs to at least a moderate degree (r²>0.1). The test is implemented for genome-wide analysis in the publicly available software package GenABEL which is based on a sliding window approach. We provide the proof of principle by conducting a genome-wide CDH analysis of red hair color, a trait known to be influenced by multiple loss-of-function alleles, in a total of 7,732 Dutch individuals with hair color ascertained. The association signals at the MC1R gene locus from CDH were uniformly more significant than traditional GWA analyses (the most significant P for CDH = 3.11×10⁻¹⁴² vs. P for rs258322 = 1.33×10⁻⁶⁶). The CDH test will contribute towards finding rare LOF variants in GWAS and sequencing studies.

  16. Rapid and sensitive approach to simultaneous detection of genomes of hepatitis A, B, C, D and E viruses.

    Science.gov (United States)

    Kodani, Maja; Mixson-Hayden, Tonya; Drobeniuc, Jan; Kamili, Saleem

    2014-10-01

    Five viruses have been etiologically associated with viral hepatitis. Nucleic acid testing (NAT) remains the gold standard for diagnosis of viremic stages of infection. NAT methodologies have been developed for all hepatitis viruses; however, a NAT-based assay that can simultaneously detect all five viruses is not available. We designed TaqMan card-based assays for detection of HAV RNA, HBV DNA, HCV RNA, HDV RNA and HEV RNA. The performances of individual assays were evaluated on TaqMan Array Cards (TAC) for detecting five viral genomes simultaneously. Sensitivity and specificity were determined by testing 329 NAT-tested clinical specimens. All NAT-positive samples for HCV (n = 32), HDV (n = 28) and HEV (n = 14) were also found positive in TAC (sensitivity, 100%). Forty-three of 46 HAV-NAT positive samples were also positive in TAC (sensitivity, 94%), while 36 of 39 HBV-NAT positive samples were positive (sensitivity, 92%). No false-positives were detected for HBV (n = 32), HCV (n = 36), HDV (n = 30), and HEV (n = 31) NAT-negative samples (specificity 100%), while 38 of 41 HAV-NAT negative samples were negative by TAC (specificity 93%). TAC assay was concordant with corresponding individual NATs for hepatitis A-E viral genomes and can be used for their detection simultaneously. The TAC assay has potential for use in hepatitis surveillance, for screening of donor specimens and in outbreak situations. Wider availability of TAC-ready assays may allow for customized assays, for improving acute jaundice surveillance and for other purposes for which there is need to identify multiple pathogens rapidly. Published by Elsevier B.V.
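
    The sensitivity and specificity figures above can be recomputed from the counts given in the abstract. The following is a minimal sketch, not the authors' analysis: the helper name `percent` and the dictionary layout are illustrative assumptions, and only the counts themselves come from the text.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator

# (TAC-positive, NAT-positive) counts per virus, as stated in the abstract.
TAC_ON_NAT_POSITIVE = {
    "HAV": (43, 46), "HBV": (36, 39), "HCV": (32, 32),
    "HDV": (28, 28), "HEV": (14, 14),
}
# (TAC-negative, NAT-negative) counts per virus, as stated in the abstract.
TAC_ON_NAT_NEGATIVE = {
    "HAV": (38, 41), "HBV": (32, 32), "HCV": (36, 36),
    "HDV": (30, 30), "HEV": (31, 31),
}

sensitivity = {v: percent(*c) for v, c in TAC_ON_NAT_POSITIVE.items()}
specificity = {v: percent(*c) for v, c in TAC_ON_NAT_NEGATIVE.items()}

# Sum of all NAT-positive and NAT-negative denominators.
total_specimens = (sum(n for _, n in TAC_ON_NAT_POSITIVE.values())
                   + sum(n for _, n in TAC_ON_NAT_NEGATIVE.values()))

for virus in sensitivity:
    print(f"{virus}: sensitivity {sensitivity[virus]:.1f}%, "
          f"specificity {specificity[virus]:.1f}%")
```

    The denominators sum to 329, matching the number of specimens reported. Recomputing HAV sensitivity from 43 of 46 gives roughly 93.5%, which the abstract reports as 94%; the small difference may simply reflect rounding.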

  17. Identification, evaluation, and application of the genomic-SSR loci in ramie

    Directory of Open Access Journals (Sweden)

    Ming-Bao Luan

    2016-09-01

    Full Text Available To provide a theoretical and practical foundation for ramie genetic analysis, simple sequence repeats (SSRs) were identified in the ramie genome and employed in this study. From the 115 369 sequences of a specific-locus amplified fragment library, a type of reduced representation library obtained by high-throughput sequencing, we identified 4774 sequences containing 5064 SSR motifs. SSRs of ramie included repeat motifs with lengths of 1 to 6 nucleotides, and the abundance of each motif type varied greatly. We found that mononucleotide, dinucleotide, and trinucleotide repeat motifs were the most prevalent (95.91%). A total of 98 distinct motif types were detected in the genomic-SSRs of ramie. Of them, the A/T mononucleotide motif was the most abundant, accounting for 41.45% of motifs, followed by AT/TA, accounting for 20.30%. The number of alleles per locus in 31 polymorphic microsatellite loci ranged from 2 to 7, and observed and expected heterozygosities ranged from 0.04 to 1.00 and 0.04 to 0.83, respectively. Furthermore, molecular identity cards (IDs) of the germplasms were constructed employing the ID Analysis 3.0 software. In the current study, the 26 germplasms of ramie can be distinguished by a combination of five SSR primers including Ibg5-5, Ibg3-210, Ibg1-11, Ibg6-468, and Ibg6-481. The allele polymorphisms produced by all SSR primers were used to analyze genetic relationships among the germplasms. The similarity coefficients ranged from 0.41 to 0.88. We found that these 26 germplasms were clustered into five categories using UPGMA, with poor correlation between germplasm and geographical distribution. Our study is the first large-scale SSR identification from ramie genomic sequences. We have further studied the SSR distribution pattern in the ramie genome, and proposed that it is possible to develop SSR loci from genomic data for population genetics studies, linkage mapping, quantitative trait locus mapping, cultivar fingerprinting
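
    To illustrate what scanning sequences for SSR motifs of 1 to 6 nucleotides can look like, here is a minimal regular-expression sketch. It is not the tool used in the study: the function names and the minimum repeat counts in `MIN_REPEATS` are hypothetical, since the abstract does not state the thresholds that were applied.

```python
import re

# Hypothetical minimum repeat counts per motif length (not from the abstract).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif: str) -> bool:
    """True if the motif is not a shorter unit repeated (e.g. 'ATAT' = 'AT' x 2)."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))

def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' reported as a dinucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

example = "GG" + "AT" * 6 + "CC" + "A" * 10 + "T" + "CAG" * 5
print(find_ssrs(example))
```

    The sketch only finds perfect repeats and reports whichever rotation of a motif the left-to-right scan meets first; dedicated SSR tools additionally handle compound and imperfect repeats.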

  18. Multiplex PCR for detection of the Vibrio genus and five pathogenic Vibrio species with primer sets designed using comparative genomics.

    Science.gov (United States)

    Kim, Hyun-Joong; Ryu, Ji-Oh; Lee, Shin-Young; Kim, Ei-Seul; Kim, Hae-Yeong

    2015-10-26

    The genus Vibrio is clinically significant and major pathogenic Vibrio species causing human Vibrio infections are V. cholerae, V. parahaemolyticus, V. vulnificus, V. alginolyticus and V. mimicus. In this study, we screened for novel genetic markers using comparative genomics and developed a Vibrio multiplex PCR for the reliable diagnosis of the Vibrio genus and the associated major pathogenic Vibrio species. A total of 30 Vibrio genome sequences were subjected to comparative genomics, and specific genes of the Vibrio genus and five major pathogenic Vibrio species were screened. The designed primer sets from the screened genes were evaluated by single PCR using DNAs from various Vibrio spp. and other non-Vibrio bacterial strains. A sextuplet multiplex PCR using six primer sets was developed to enable detection of the Vibrio genus and five pathogenic Vibrio species. The designed primer sets from the screened genes yielded specific diagnostic results for the target Vibrio genus and Vibrio species. The specificity of the developed multiplex PCR was confirmed with various Vibrio and non-Vibrio strains. This Vibrio multiplex PCR was evaluated using 117 Vibrio strains isolated from the south seashore areas in Korea and Vibrio isolates were identified as Vibrio spp., V. parahaemolyticus, V. vulnificus and V. alginolyticus, demonstrating the specificity and discriminative ability of the assay towards Vibrio species. This novel multiplex PCR method could provide reliable and informative identification of the Vibrio genus and major pathogenic Vibrio species in the food safety industry and in early clinical treatment, thereby protecting humans against Vibrio infection.

  19. Familial Case of Pelizaeus-Merzbacher Disorder Detected by Oligoarray Comparative Genomic Hybridization: Genotype-to-Phenotype Diagnosis

    Directory of Open Access Journals (Sweden)

    Kimia Najafi

    2017-01-01

    Full Text Available Introduction. Pelizaeus-Merzbacher disease (PMD) is an X-linked recessive hypomyelinating leukodystrophy characterized by nystagmus, spastic quadriplegia, ataxia, and developmental delay. It is caused by mutation in the PLP1 gene. Case Description. We report a 9-year-old boy referred for oligoarray comparative genomic hybridization (OA-CGH) because of intellectual delay, seizures, microcephaly, nystagmus, and spastic paraplegia. Similar clinical findings were reported in his older brother and maternal uncle. Both parents had normal phenotypes. OA-CGH was performed and a 436 Kb duplication was detected and the diagnosis of PMD was made. The mother was a carrier of this 436 Kb duplication. Conclusion. Clinical presentation has been accepted as being the mainstay of diagnosis for most conditions. However, recent developments in genetic diagnosis have shown that, in many congenital and sporadic disorders lacking specific phenotypic manifestations, a genotype-to-phenotype approach can be conclusive. In this case, a diagnosis was reached by universal genomic testing, namely, whole genomic array.

  20. Detection and validation of single feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array

    Directory of Open Access Journals (Sweden)

    Wanamaker Steve

    2008-02-01

    Full Text Available Abstract Background Cowpea (Vigna unguiculata L. Walp) is an important food and fodder legume of the semiarid tropics and subtropics worldwide, especially in sub-Saharan Africa. High density genetic linkage maps are needed for marker assisted breeding but are not available for cowpea. A single feature polymorphism (SFP) is a microarray-based marker which can be used for high throughput genotyping and high density mapping. Results Here we report detection and validation of SFPs in cowpea using a readily available soybean (Glycine max) genome array. Robustified projection pursuit (RPP) was used for statistical analysis using RNA as a surrogate for DNA. Using a 15% outlying score cut-off, 1058 potential SFPs were enumerated between two parents of a recombinant inbred line (RIL) population segregating for several important traits including drought tolerance, Fusarium and brown blotch resistance, grain size and photoperiod sensitivity. Sequencing of 25 putative polymorphism-containing amplicons yielded a SFP probe set validation rate of 68%. Conclusion We conclude that the Affymetrix soybean genome array is a satisfactory platform for identification of some 1000's of SFPs for cowpea. This study provides an example of extension of genomic resources from a well supported species to an orphan crop. Presumably, other legume systems are similarly tractable to SFP marker development using existing legume array resources.

  1. Hunting Motifs in Situla Art

    Directory of Open Access Journals (Sweden)

    Andrej Preložnik

    2013-07-01

    Full Text Available Situla art developed as an echo of the toreutic style which had spread from the Near East through the Phoenicians, Greeks and Etruscans as far as the Veneti, Raeti, Histri, and their eastern neighbours in the region of Dolenjska (Lower Carniola). An Early Iron Age phenomenon (c. 600–300 BC), it represents the major and most arresting form of the contemporary visual arts in an area stretching from the foot of the Apennines in the south to the Drava and Sava rivers in the east. Indeed, individual pieces have found their way across the Alpine passes and all the way north to the Danube. In the world and art of the situlae, a prominent role is accorded to animals. They are displayed in numerous representations of human activities on artefacts crafted in the classic situla style – that is, between the late 6th and early 5th centuries BC – as passive participants (e.g. in pageants or in harness) or as an active element of the situla narrative. The most typical example of the latter is the hunting scene. Today we know at least four objects decorated exclusively with hunting themes, and a number of situlae and other larger vessels where hunting scenes are embedded in composite narratives. All this suggests a popularity unparalleled by any other genre. Clearly recognisable are various hunting techniques and weapons, each associated with a particular type of game (Fig. 1). The chase of a stag with javelin, horse and hound is depicted on the long-familiar and repeatedly published fibula of Zagorje (Fig. 2). It displays a hound mauling the stag’s back and a hunter on horseback pursuing a hind, her neck already pierced by the javelin. To judge by the (so far unnoticed) shaft end under the stag’s muzzle, the hunter would have been brandishing a second javelin as well, like the warrior of the Vače fibula or the rider of the Nesactium situla, presumably himself a hunter. Many parallels to his motif are known from Greece, Etruria, and

  2. Combination of native and denaturing PAGE for the detection of protein binding regions in long fragments of genomic DNA

    Directory of Open Access Journals (Sweden)

    Metsis Madis

    2008-06-01

    Full Text Available Abstract Background In a traditional electrophoresis mobility shift assay (EMSA) a 32P-labeled double-stranded DNA oligonucleotide or a restriction fragment bound to a protein is separated from the unbound DNA by polyacrylamide gel electrophoresis (PAGE) in nondenaturing conditions. An extension of this method uses the large population of fragments derived from long genomic regions (approximately 600 kb) for the identification of fragments containing protein binding regions. With this method, genomic DNA is fragmented by restriction enzymes, fragments are amplified by PCR, radiolabeled, incubated with nuclear proteins and the resulting DNA-protein complexes are separated by two-dimensional PAGE. Shifted DNA fragments containing protein binding sites are identified by using additional procedures, i.e. gel elution, PCR amplification, cloning and sequencing. Although the method allows simultaneous analysis of a large population of fragments, it is relatively laborious and can be used to detect only high affinity protein binding sites. Here we propose an alternative and straightforward strategy which is based on a combination of native and denaturing PAGE. This strategy allows the identification of DNA fragments containing low as well as high affinity protein binding regions, derived from genomic DNA ( Results We have combined an EMSA-based selection step with subsequent denaturing PAGE for the localization of protein binding regions in long (up to 10 kb) fragments of genomic DNA. Our strategy consists of the following steps: digestion of genomic DNA with a 4-cutter restriction enzyme (AluI, BsuRI, TruI, etc.), separation of low and high molecular weight fractions of resultant DNA fragments, 32P-labeling with Klenow polymerase, traditional EMSA, gel elution and identification of the shifted bands (or smear) by denaturing PAGE.
The identification of DNA fragments containing protein binding sites is carried out by running the gel-eluted fragments alongside
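
    The first step of the strategy above, digestion with a 4-cutter restriction enzyme, can be mimicked in silico. The sketch below is only an illustration and not part of the published protocol: the function name `digest` is hypothetical, and it assumes the well-known AluI recognition site AGCT with a blunt cut between G and C.

```python
def digest(seq: str, site: str = "AGCT", cut_offset: int = 2):
    """Split seq at every occurrence of site, cutting cut_offset bases into the site.

    Defaults model AluI (AG^CT, blunt cut); other enzymes need their own
    site and offset.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])   # fragment up to the cut position
        start = cut
        pos = seq.find(site, pos + 1)      # look for the next site
    fragments.append(seq[start:])          # trailing fragment after the last cut
    return fragments

fragments = digest("TTAGCTGGGAGCTA")
print(fragments, [len(f) for f in fragments])
```

    Because a 4-base site is expected roughly once every 256 bp in random sequence, such enzymes yield many short fragments, which is consistent with the abstract's separation of low and high molecular weight fractions.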

  3. Detection of shared genes among Asian and European waterfowl reoviruses in the whole genome constellations.

    Science.gov (United States)

    Farkas, Szilvia L; Dandár, Eszter; Marton, Szilvia; Fehér, Enikő; Oldal, Miklós; Jakab, Ferenc; Mató, Tamás; Palya, Vilmos; Bányai, Krisztián

    2014-12-01

    In order to explore the genetic relatedness and evolution of 'classical' and 'novel' waterfowl origin reoviruses (WRV) isolated in different years and continents, and to fill gaps in our knowledge about the European WRV strains, the complete genomic sequences of two French isolates causing the 'classical' type of reovirus infection of Muscovy ducks were determined. Based on the genome organization and the encoded proteins, the two isolates could be referred to as classical type strains. Sequence comparison showed that the two strains were most closely related to each other and belong to the same monophyletic group of European and Asian WRV strains. Phylogeny of the appropriate segments revealed potential reassortment events between strains of waterfowl and chicken origin, between 'classical' and 'novel' strains, and between European and Chinese WRV strains. Our results point out a complex way of viral evolution regarding the origin and biological properties of the WRVs. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Genome-wide copy number profiling to detect gene amplifications in neural progenitor cells

    Directory of Open Access Journals (Sweden)

    U. Fischer

    2014-12-01

    Full Text Available DNA sequence amplification occurs at defined stages during normal development in amphibians and flies and seems to be restricted in humans to drug-resistant and tumor cells only. We used array-CGH to discover copy number changes including gene amplifications and deletions during differentiation of human neural progenitor cells. Here, we describe cell culture features, DNA extraction, and comparative genomic hybridization (CGH) analysis tailored towards the identification of genomic copy number changes. Further detailed analysis of amplified chromosome regions associated with this experiment was published by Fischer and colleagues in PLOS One in 2012 (Fischer et al., 2012). We provide detailed information on deleted chromosome regions during differentiation and give an overview on copy number changes during differentiation induction for two representative chromosome regions.

  5. GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi

    2017-05-24

    As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria of CGI rely on: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content and the expected CpG content is ≥ 0.6, and (c) the general length of CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets.
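
    The three rule-based criteria (a) to (c) quoted above are easy to check for a single candidate sequence. The sketch below illustrates those conventional rules only, not the GaussianCpG model itself. The function names are hypothetical, and the observed/expected ratio uses the commonly cited form (number of CpG × length) / (number of C × number of G), which is an assumption because the abstract does not spell the formula out.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                      # CpG dinucleotides cannot overlap
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count over the count expected from C and G.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def meets_rule_based_criteria(seq: str) -> bool:
    """Apply the thresholds quoted in the abstract: length > 200, GC >= 50%, O/E >= 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) > 200 and gc_fraction >= 0.5 and obs_exp >= 0.6

print(cpg_stats("CG" * 101), meets_rule_based_criteria("CG" * 101))
print(cpg_stats("AT" * 101), meets_rule_based_criteria("AT" * 101))
```

    In practice these statistics are evaluated over sliding windows along a genome rather than on one pre-cut sequence; the hard thresholds are exactly what the abstract argues many experimentally verified islands fail to satisfy.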

  6. Microarray comparative genomic hybridization detection of chromosomal imbalances in uterine cervix carcinoma

    OpenAIRE

    García José; Pérez Carlos; López Ricardo; Lazos Minerva; González José; Hernández Dulce; Vázquez-Ortiz Guelaguetza; Piña Patricia; Arreola Hugo; Petersen Iver; Baudis Michael; Hidalgo Alfredo; Vázquez Karla; Alatorre Brenda; Salcedo Mauricio

    2005-01-01

    Abstract Background Chromosomal Comparative Genomic Hybridization (CGH) has been applied to all stages of cervical carcinoma progression, defining a specific pattern of chromosomal imbalances in this tumor. However, given its limited spatial resolution, chromosomal CGH has offered only general information regarding the possible genetic targets of DNA copy number changes. Methods In order to further define specific DNA copy number changes in cervical cancer, we analyzed 20 cervical samples (3 ...

  7. Genome-wide detection of copy number variations among diverse horse breeds by array CGH.

    Directory of Open Access Journals (Sweden)

    Wei Wang

    Full Text Available Recent studies have found that copy number variations (CNVs) are widespread in human and animal genomes. CNVs are a significant source of genetic variation, and have been shown to be associated with phenotypic diversity. However, the effect of CNVs on genetic variation in horses is not well understood. In the present study, CNVs in 6 different breeds of mare horses, Mongolia horse, Abaga horse, Hequ horse and Kazakh horse (all plateau breeds) and Debao pony and Thoroughbred, were determined using aCGH. In total, seven hundred CNVs were identified ranging in size from 6.1 Kb to 0.57 Mb across all autosomes, with an average size of 43.08 Kb and a median size of 15.11 Kb. By merging overlapping CNVs, we found a total of three hundred and fifty-three CNV regions (CNVRs). The length of the CNVRs ranged from 6.1 Kb to 1.45 Mb with average and median sizes of 38.49 Kb and 13.1 Kb. Collectively, 13.59 Mb of copy number variation was identified among the horses investigated and accounted for approximately 0.61% of the horse genome sequence. Five hundred and eighteen annotated genes were affected by CNVs, which corresponded to about 2.26% of all horse genes. Through gene ontology (GO) and genetic pathway analysis and comparison of CNV genes among different breeds, we found evidence that CNVs involving 7 genes may be related to the adaptation of these plateau horses to a severe environment. This study is the first report of copy number variations in Chinese horses, which indicates that CNVs are ubiquitous in the horse genome and influence many biological processes of the horse. These results will be helpful not only in mapping the horse whole-genome CNVs, but also in furthering research on the adaptation of plateau horses to the severe high-altitude environment.
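
    Merging overlapping CNVs into CNV regions, as described above, is an interval-merging problem. The following is a minimal sketch under stated assumptions: the function name `merge_cnvs`, the tuple layout (chromosome, start, end) and the rule that any shared coordinate counts as overlap are all illustrative, because the abstract does not give the exact merging criterion.

```python
from collections import defaultdict

def merge_cnvs(cnvs):
    """Merge overlapping (chrom, start, end) intervals into CNV regions per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the current region: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions

calls = [("chr1", 100, 200), ("chr1", 150, 300),
         ("chr1", 400, 500), ("chr2", 10, 20)]
cnvrs = merge_cnvs(calls)
print(cnvrs)
print(sum(e - s for _, s, e in cnvrs))  # total length covered by the regions
```

    Summing region lengths in this way corresponds to the kind of total the abstract reports (13.59 Mb across 353 CNVRs), although real pipelines often add a minimum-overlap fraction before merging.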

  8. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...... agreement was observed between phenotypic and predicted antimicrobial susceptibility. Complete agreement was observed between species identification, multi-locus-sequence typing and phylogenetic relationship for the Escherichia coli and Enterococcus faecalis isolates when comparing the results of WGS...

  9. Whole genome detection of signature of positive selection in African cattle reveals selection for thermotolerance.

    Science.gov (United States)

    Taye, Mengistie; Lee, Wonseok; Caetano-Anolles, Kelsey; Dessie, Tadelle; Hanotte, Olivier; Mwai, Okeyo Ally; Kemp, Stephen; Cho, Seoae; Oh, Sung Jong; Lee, Hak-Kyo; Kim, Heebal

    2017-12-01

    As African indigenous cattle evolved in a hot tropical climate, they have developed an inherent thermotolerance; survival mechanisms include a light-colored and shiny coat, increased sweating, and cellular and molecular mechanisms to cope with high environmental temperature. Here, we report the positive selection signature of genes in African cattle breeds that contribute to their heat tolerance mechanisms. We compared the genomes of five indigenous African cattle breeds with the genomes of four commercial cattle breeds using cross-population composite likelihood ratio (XP-CLR) and cross-population extended haplotype homozygosity (XP-EHH) statistical methods. We identified 296 (XP-EHH) and 327 (XP-CLR) positively selected genes. Gene ontology analysis resulted in 41 biological process terms and six Kyoto Encyclopedia of Genes and Genomes pathways. Several genes and pathways were found to be involved in oxidative stress response, osmotic stress response, heat shock response, hair and skin properties, sweat gland development and sweating, feed intake and metabolism, and reproduction functions. The genes and pathways identified directly or indirectly contribute to the superior heat tolerance mechanisms in African cattle populations. These results will improve our understanding of the biological mechanisms of heat tolerance in African cattle breeds and open an avenue for further study. © 2017 Japanese Society of Animal Science.

  10. Genome scans for detecting footprints of local adaptation using a Bayesian factor model.

    Science.gov (United States)

    Duforet-Frebourg, Nicolas; Bazin, Eric; Blum, Michael G B

    2014-09-01

    There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as FST. However, there are important caveats with approaches related to FST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. In order to identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a 2-fold or more reduction of false discovery rate compared with the software BayeScan or with an FST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  12. Surface-enhanced Raman scattering detection of DNA derived from the West Nile virus genome using magnetic capture of Raman-active gold nanoparticles

    Science.gov (United States)

    A model paramagnetic nanoparticle (MNP) assay is demonstrated for surface-enhanced Raman scattering (SERS) detection of DNA oligonucleotides derived from the West Nile virus (WNV) genome. Detection is based on the capture of WNV target sequences by hybridization with complementary oligonucleotide pr...

  13. Using the ubiquitous pH meter combined with a loop mediated isothermal amplification method for facile and sensitive detection of Nosema bombycis genomic DNA PTP1.

    Science.gov (United States)

    Xie, Shunbi; Yuan, Yali; Song, Yue; Zhuo, Ying; Li, Tian; Chai, Yaqin; Yuan, Ruo

    2014-12-28

    Here we show an amplification-coupled detection method for directly measuring released hydrogen ions during the loop mediated isothermal amplification (LAMP) procedure by using a pH meter. The genomic DNA of Nosema bombycis (N. bombycis) was amplified and detected by employing this LAMP-pH meter platform for the first time.

  14. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    Science.gov (United States)

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
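
    For intuition about locating a region of low scores in an ordered sequence without fixing its length in advance, the sketch below uses a classic minimum-sum contiguous subarray scan (a Kadane-style pass) on mean-centred scores. This is a deliberately swapped-in textbook technique, not the method of the paper: it neither reproduces the authors' statistic nor scores the region's significance, and the function name is hypothetical.

```python
def lowest_scoring_region(scores):
    """Return ((start, end), centred_sum) for the contiguous run with the lowest
    sum of mean-centred scores; end is exclusive. Returns (None, 0.0) if no run
    falls below the mean."""
    if not scores:
        raise ValueError("scores must be non-empty")
    mean = sum(scores) / len(scores)

    best_region, best_sum = None, 0.0
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        centred = score - mean
        if cur_sum >= 0:
            # A non-negative running sum cannot help a minimum: restart here.
            cur_sum, cur_start = centred, i
        else:
            cur_sum += centred
        if cur_sum < best_sum:
            best_region, best_sum = (cur_start, i + 1), cur_sum
    return best_region, best_sum

region, centred_sum = lowest_scoring_region([5, 5, 5, 1, 1, 1, 5, 5, 5, 5])
print(region, centred_sum)
```

    The scan runs in linear time and makes no assumption about region length, which mirrors two properties the abstract emphasises; finding multiple signals would require repeating the scan after masking each detected region.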

  15. Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation.

    Science.gov (United States)

    Jackson, Brendan R; Tarr, Cheryl; Strain, Errol; Jackson, Kelly A; Conrad, Amanda; Carleton, Heather; Katz, Lee S; Stroika, Steven; Gould, L Hannah; Mody, Rajal K; Silk, Benjamin J; Beal, Jennifer; Chen, Yi; Timme, Ruth; Doyle, Matthew; Fields, Angela; Wise, Matthew; Tillman, Glenn; Defibaugh-Chavez, Stephanie; Kucerova, Zuzana; Sabol, Ashley; Roache, Katie; Trees, Eija; Simmons, Mustafa; Wasilenko, Jamie; Kubota, Kristy; Pouseele, Hannes; Klimke, William; Besser, John; Brown, Eric; Allard, Marc; Gerner-Smidt, Peter

    2016-08-01

    Listeria monocytogenes (Lm) causes severe foodborne illness (listeriosis). Previous molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), were critical in detecting outbreaks that led to food safety improvements and declining incidence, but PFGE provides limited genetic resolution. A multiagency collaboration began performing real-time, whole-genome sequencing (WGS) on all US Lm isolates from patients, food, and the environment in September 2013, posting sequencing data into a public repository. Compared with the year before the project began, WGS, combined with epidemiologic and product trace-back data, detected more listeriosis clusters and solved more outbreaks (2 outbreaks in pre-WGS year, 5 in WGS year 1, and 9 in year 2). Whole-genome multilocus sequence typing and single nucleotide polymorphism analyses provided equivalent phylogenetic relationships relevant to investigations; results were most useful when interpreted in context of epidemiological data. WGS has transformed listeriosis outbreak surveillance and is being implemented for other foodborne pathogens. Published by Oxford University Press for the Infectious Diseases Society of America 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  16. Detection and quantification of enterovirus 71 genome from cerebrospinal fluid of an encephalitis patient by PCR applications.

    Science.gov (United States)

    Fujimoto, Tsuguto; Yoshida, Shigeru; Munemura, Tetsuya; Taniguchi, Kiyosu; Shinohara, Michiyo; Nishio, Osamu; Chikahira, Masatsugu; Okabe, Nobuhiko

    2008-11-01

    Enterovirus 71 (EV71) is one of the causative agents of hand, foot, and mouth disease (HFMD) and is known to cause encephalitis, but several reports have identified EV71 in cerebrospinal fluid (CSF). We detected EV71 in CSF from a 20-month-old infant. The patient was diagnosed with brainstem encephalitis associated with HFMD. The clinical features of the patient were high fever (39.1 °C) and myoclonic jerks, and magnetic resonance imaging of the brain showed a bright signal area around the 4th ventricle. From a nasopharyngeal swab and rectal swab, EV71 was detected using reverse transcription (RT)-nested polymerase chain reaction (PCR). From CSF, the EV71 genome was identified using pan-enterovirus RT-nested PCR and sequencing. By real-time PCR, the nasopharyngeal swab, rectal swab, and CSF contained 1.8 × 10^4, 9.8 × 10^4, and 1.8 × 10 copies of the EV71 genome/μL, respectively. The enterovirus could only be isolated by cell culture from the rectal swab, and it was identified by a neutralization test using EV71-specific antiserum. RT-nested PCR and real-time PCR are considered to be sensitive tools for EV71 diagnosis in CSF.

  17. SSTRAP: A computational model for genomic motif discovery ...

    African Journals Online (AJOL)

    Journal of Computer Science and Its Application, Vol 21, No 2 (2014).

  18. Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA.

    Science.gov (United States)

    Nagy, Gergely; Czipa, Erik; Steiner, László; Nagy, Tibor; Pongor, Sándor; Nagy, László; Barta, Endre

    2016-08-15

    ChIP-seq provides a wealth of information on the approximate location of DNA-binding proteins genome-wide. It is known that the targeted motifs in most cases can be found at the peak centers. A high resolution mapping of ChIP-seq peaks could in principle allow the fine mapping of the protein constituents within protein complexes, but the current ChIP-seq analysis pipelines do not target the basepair resolution strand specific mapping of peak summits. The approach proposed here is based on i) locating regions that are bound by a sufficient number of proteins constituting a complex; ii) determining the position of the underlying motif using either a direct or a de novo motif search approach; and iii) determining the exact location of the peak summits with respect to the binding motif in a strand specific manner. We applied this method for analyzing the CTCF/cohesin complex, which holds together DNA loops. The relative positions of the constituents of the complex were determined with one-basepair estimated accuracy. Mapping the positions on a 3D model of DNA made it possible to deduce the approximate local topology of the complex that allowed us to predict how the CTCF/cohesin complex locks the DNA loops. As the positioning of the proteins was not compatible with previous models of loop closure, we proposed a plausible "double embrace" model in which the DNA loop is held together by two adjacent cohesin rings in such a way that the ring anchored by CTCF to one DNA duplex encircles the other DNA double helix and vice versa. A motif-centered, strand specific analysis of ChIP-seq data improves the accuracy of determining peak positions. If a genome contains a large number of binding sites for a given protein complex, such as transcription factor heterodimers or transcription factor/cofactor complexes, the relative position of the constituent proteins on the DNA can be established with an accuracy that allows one to deduce the local topology of the protein complex. The
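
    Step iii) of the approach in this record, locating peak summits relative to the binding motif in a strand-specific manner, can be illustrated with a minimal Python sketch. The coordinate convention (0-based, inclusive motif start and end), the function names summit_offset and mean_offset, and the example coordinates are all assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from statistics import mean


def summit_offset(summit: int, motif_start: int, motif_end: int, strand: str) -> float:
    """Signed distance (bp) from the motif centre to a peak summit.

    Positive values lie downstream of the motif centre when read along the
    motif's own strand. Coordinates are assumed 0-based and inclusive.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    centre = (motif_start + motif_end) / 2
    offset = summit - centre
    # Flip the sign for minus-strand motifs so that offsets measured on
    # both strands share a single orientation and can be averaged.
    return offset if strand == "+" else -offset


def mean_offset(sites) -> float:
    """Average oriented offset over (summit, start, end, strand) tuples."""
    return mean(summit_offset(*site) for site in sites)


# Hypothetical summits of one protein at three motif occurrences.
sites = [(108, 100, 110, "+"), (202, 200, 210, "-"), (309, 300, 312, "+")]
print(mean_offset(sites))
```

    Averaging oriented offsets over many sites is one simple way to estimate where a protein sits relative to the motif; without the sign flip, plus- and minus-strand sites would tend to cancel each other out.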

  19. Comparative genome scan detects host-related divergent selection in the grasshopper Hesperotettix viridis.

    Science.gov (United States)

    Apple, Jennifer L; Grace, Tony; Joern, Anthony; St Amand, Paul; Wisely, Samantha M

    2010-09-01

    In this study, we used a comparative genome scan to examine patterns of population differentiation with respect to host plant use in Hesperotettix viridis, a Nearctic oligophagous grasshopper locally specialized on various Asteraceae including Solidago, Gutierrezia, and Ericameria. We identified amplified fragment length polymorphism (AFLP) loci with significantly elevated F(ST) (outlier loci) in multiple different-host and same-host comparisons of populations while controlling for geographic distance. By comparing the number and identities of outlier loci in different-host vs. same-host comparisons, we found evidence of host plant-related divergent selection for some population comparisons (Solidago- vs. Gutierrezia-feeders), while other comparisons (Ericameria- vs. Gutierrezia-feeders) failed to demonstrate a strong role for host association in population differentiation. In comparisons of Solidago- vs. Gutierrezia-feeding populations, a relatively high number of outlier loci observed repeatedly in different-host comparisons (35% of all outliers and 2.7% of all 625 AFLP loci) indicated a significant role for host-related selection in contributing to overall genomic differentiation in this grasshopper. Mitochondrial DNA sequence data revealed a star-shaped phylogeny with no host- or geography-related structure, low nucleotide diversity, and high haplotype diversity, suggesting a recent population expansion. mtDNA data do not suggest a long period of isolation in separate glacial refugia but are instead more compatible with a single glacial refugium and more recent divergence in host use. Our study adds to research documenting heterogeneity in differentiation across the genome as a consequence of divergent natural selection, a phenomenon that may occur as part of the process of ecological speciation. © 2010 Blackwell Publishing Ltd.

  20. Strategic Lean Organizational Design: Towards Lean World-Small World Configurations through Discrete Dynamic Organizational Motifs

    Directory of Open Access Journals (Sweden)

    Javier Villalba-Diez

    2016-01-01

    Organizations face strong international competition in the global market arena in achieving strategic goals such as high quality of product or service at lower cost while increasing their ability to respond quickly to requirements of the market. These challenges concern strategically designing organizations that can meet global challenges and specialize locally to meet performance constraints. After introducing the concept of organizational functional and structural motifs as small organizational building blocks, our findings suggest the hypothesis that a strategic organizational design (SOD) approach to meet these challenges involves maximizing the number and diversity of functional motifs, while minimizing the repertoire of structural motifs. By detecting characteristic structural motifs, we provide organizational leaders with specific Lean SOD solutions with which to meet local and global challenges simultaneously. As a matter of application, we show the implementation of such an SOD approach in nine US hospitals that form one large health care holding.

  1. A novel swarm intelligence algorithm for finding DNA motifs.

    Science.gov (United States)

    Lei, Chengwei; Ruan, Jianhua

    2009-01-01

    Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.

  2. Short Arginine Motifs Drive Protein Stickiness in the Escherichia coli Cytoplasm.

    Science.gov (United States)

    Kyne, Ciara; Crowley, Peter B

    2017-09-19

    Although essential to numerous biotech applications, knowledge of molecular recognition by arginine-rich motifs in live cells remains limited. ¹H, ¹⁵N HSQC and ¹⁹F NMR spectroscopies were used to investigate the effects of C-terminal -GRₙ (n = 1-5) motifs on GB1 interactions in Escherichia coli cells and cell extracts. While the "biologically inert" GB1 yields high-quality in-cell spectra, the -GRₙ fusions with n = 4 or 5 were undetectable. This result suggests that a tetra-arginine motif is sufficient to drive interactions between a test protein and macromolecules in the E. coli cytoplasm. The inclusion of a 12 residue flexible linker between GB1 and the -GR₅ motif did not improve detection of the "inert" domain. In contrast, all of the constructs were detectable in cell lysates and extracts, suggesting that the arginine-mediated complexes were weak. Together these data reveal the significance of weak interactions between short arginine-rich motifs and the E. coli cytoplasm and demonstrate the potential of such motifs to modify protein interactions in living cells. These interactions must be considered in the design of (in vivo) nanoscale assemblies that rely on arginine-rich sequences.

  3. Detecting and correcting the binding-affinity bias in ChIP-seq data using inter-species information.

    Science.gov (United States)

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2016-05-10

    Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. ChIP-seq has become the major technology to uncover genomic regions containing those binding sites, but motifs predicted by traditional computational approaches using these data are distorted by a ubiquitous binding-affinity bias. Here, we present an approach for detecting and correcting this bias using inter-species information. We find that the binding-affinity bias caused by the ChIP-seq experiment in the reference species is stronger than the indirect binding-affinity bias in orthologous regions from phylogenetically related species. We use this difference to develop a phylogenetic footprinting model that is capable of detecting and correcting the binding-affinity bias. We find that this model improves motif prediction and that the corrected motifs are typically softer than those predicted by traditional approaches. These findings indicate that motifs published in databases and in the literature are artificially sharpened compared to the native motifs. These findings also indicate that our current understanding of transcriptional gene regulation might be blurred, but that it is possible to advance this understanding by taking into account inter-species information available today and even more in the future.

  4. Multiplexed, rapid detection of H5N1 using a PCR-free nanoparticle-based genomic microarray assay

    Directory of Open Access Journals (Sweden)

    Ragupathy Viswanath

    2010-10-01

    Background: For more than a decade there has been increasing interest in the use of nanotechnology and microarray platforms for diagnostic applications. In this report, we describe a rapid and simple gold nanoparticle (NP)-based genomic microarray assay for specific identification of avian influenza virus H5N1 and its discrimination from other major influenza A virus strains (H1N1, H3N2). Results: Capture and intermediate oligonucleotides were designed based on the consensus sequences of the matrix (M) gene of H1N1, H3N2 and H5N1 viruses, and sequences specific for the hemagglutinin (HA) and neuraminidase (NA) genes of the H5N1 virus. Viral RNA was detected within 2.5 hours using capture-target-intermediate oligonucleotide hybridization and gold NP-mediated silver staining in the absence of RNA fragmentation, target amplification, and enzymatic reactions. The lower limit of detection (LOD) of the assay was less than 100 fM for purified PCR fragments and 10^3 TCID50 units for H5N1 viral RNA. Conclusions: The NP-based microarray assay was able to detect and distinguish H5N1 sequences from those of major influenza A viruses (H1N1, H3N2). The new method described here may be useful for simultaneous detection and subtyping of major influenza A viruses.

  5. Detection of alien chromatin introgression from Thinopyrum into wheat using S genomic DNA as a probe--a landmark approach for Thinopyrum genome research.

    Science.gov (United States)

    Chen, Q

    2005-01-01

    The introduction of alien genetic variation from the genus Thinopyrum through chromosome engineering into wheat is a valuable and proven technique for wheat improvement. A number of economically important traits have been transferred into wheat as single genes, chromosome arms or entire chromosomes. Successful transfers can be greatly assisted by the precise identification of alien chromatin in the recipient progenies. Chromosome identification and characterization are useful for genetic manipulation and transfer in wheat breeding following chromosome engineering. Genomic in situ hybridization (GISH) using an S genomic DNA probe from the diploid species Pseudoroegneria has proven to be a powerful diagnostic cytogenetic tool for monitoring the transfer of many promising agronomic traits from Thinopyrum. This specific S genomic probe not only allows the direct determination of the chromosome composition in wheat-Thinopyrum hybrids, but also can separate the Th. intermedium chromosomes into the J, J(S) and S genomes. The J(S) genome, which consists of a modified J genome chromosome distinguished by S genomic sequences of Pseudoroegneria near the centromere and telomere, carries many disease and mite resistance genes. Utilization of this S genomic probe leads to a better understanding of genomic affinities between Thinopyrum and wheat, and provides a molecular cytogenetic marker for monitoring the transfer of alien Thinopyrum agronomic traits into wheat recipient lines. Copyright 2005 S. Karger AG, Basel.

  6. Capacitive DNA sensor for rapid and sensitive detection of whole genome human herpesvirus-1 dsDNA in serum.

    Science.gov (United States)

    Cheng, Cheng; Oueslati, Rania; Wu, Jayne; Chen, Jiangang; Eda, Shigetoshi

    2017-06-01

    This work presents a rapid, highly sensitive, low-cost, and specific capacitive DNA sensor for detection of whole genome human herpesvirus-1 DNA. This sensor is capable of direct DNA detection with a response time of 30 s, and it can be used to test standard buffer or serum samples. The sensing approach for DNA detection is based on alternating current (AC) electrokinetics. By applying an inhomogeneous AC electric field on sensor electrodes, positive dielectrophoresis is induced to accelerate DNA hybridization. The same applied AC signal also directly measures the hybridization of target with the probe on the sensor surface. Experiments are conducted to optimize the AC signal, as well as the buffers for probe immobilization and target DNA hybridization. The assay is highly sensitive and specific, with no response to human herpesvirus-2 DNA at 5 ng/mL and a LOD of 1.0 pg/mL (6.5 copies/μL or 10.7 aM) in standard buffer. When testing the double stranded (ds) DNA spiked in human serum samples, the sensor yields a LOD of 20.0 pg/mL (129.5 copies/μL or 0.21 femtomolar (fM)) in neat serum. In this work, the target is whole genome dsDNA, consequently the test can be performed without the use of enzyme or amplification, which considerably simplifies the sensor operation and is highly suitable for point of care disease diagnosis. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
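
    The detection limits in this record are quoted both as copies/μL and as molar concentrations. The conversion between the two can be checked with a short Python sketch; the function name copies_per_ul_to_molar is an illustrative assumption, and only the copy numbers are taken from the abstract.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def copies_per_ul_to_molar(copies_per_ul: float) -> float:
    """Convert a concentration in copies per microlitre to mol/L."""
    copies_per_litre = copies_per_ul * 1e6  # 1 L = 1e6 microlitres
    return copies_per_litre / AVOGADRO


buffer_lod = copies_per_ul_to_molar(6.5)    # LOD quoted for standard buffer
serum_lod = copies_per_ul_to_molar(129.5)   # LOD quoted for neat serum

# Express the results in attomolar (1e-18 M) and femtomolar (1e-15 M).
print(f"buffer: {buffer_lod / 1e-18:.2f} aM")
print(f"serum:  {serum_lod / 1e-15:.3f} fM")
```

    The computed values agree with the quoted 10.7 aM and 0.21 fM to within rounding of the copy numbers.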

  7. Genomic and oncoproteomic advances in detection and treatment of colorectal cancer.

    LENUS (Irish Health Repository)

    McHugh, Seamus M

    2012-02-01

    AIMS: We will examine the latest advances in genomic and proteomic laboratory technology. Through an extensive literature review we aim to critically appraise those studies which have utilized these latest technologies and ascertain their potential to identify clinically useful biomarkers. METHODS: An extensive review of the literature was carried out in both online medical journals and through the Royal College of Surgeons in Ireland library. RESULTS: Laboratory technology has advanced in the fields of genomics and oncoproteomics. Gene expression profiling with DNA microarray technology has allowed us to begin genetic profiling of colorectal cancer tissue. The response to chemotherapy can differ amongst individual tumors. For the first time researchers have begun to isolate and identify the genes responsible. New laboratory techniques allow us to isolate proteins preferentially expressed in colorectal cancer tissue. This could potentially lead to identification of a clinically useful protein biomarker in colorectal cancer screening and treatment. CONCLUSION: If a set of discriminating genes could be used for characterization and prediction of chemotherapeutic response, an individualized tailored therapeutic regime could become the standard of care for those undergoing systemic treatment for colorectal cancer. New laboratory techniques of protein identification may eventually allow identification of a clinically useful biomarker that could be used for screening and treatment. At present however, both expression of different gene signatures and isolation of various protein peaks has been limited by study size. Independent multi-centre correlation of results with larger sample sizes is needed to allow translation into clinical practice.

  8. Fluorescent In Situ Hybridization to Detect Transgene Integration into Plant Genomes

    Science.gov (United States)

    Schwarzacher, Trude

    Fluorescent chromosome analysis technologies have advanced our understanding of genome organization during the last 30 years and have enabled the investigation of DNA organization and structure as well as the evolution of chromosomes. Fluorescent chromosome staining allows even small chromosomes to be visualized, characterized by their composition and morphology, and counted. Aneuploidies and polyploidies can be established for species, breeding lines, and individuals, including changes occurring during hybridization or tissue culture and transformation protocols. Fluorescent in situ hybridization correlates molecular information of a DNA sequence with its physical location on chromosomes and genomes. It thus allows determination of the physical position of sequences and often is the only means to determine the abundance and distribution of DNA sequences that are difficult to map with any other molecular method or would require segregation analysis, in particular multicopy or repetitive DNA. Equally, it is often the best way to establish the incorporation of transgenes, their numbers, and physical organization along chromosomes. This chapter presents protocols for probe and chromosome preparation, fluorescent in situ hybridization, chromosome staining, and the analysis of results.

  9. HYBRIDCHECK: software for the rapid detection, visualization and dating of recombinant regions in genome sequence data.

    Science.gov (United States)

    Ward, Ben J; van Oosterhout, Cock

    2016-03-01

    HYBRIDCHECK is a software package to visualize the recombination signal in large DNA sequence data sets, and it can be used to analyse recombination, genetic introgression, hybridization and horizontal gene transfer. It can scan large (multiple kb) contigs and whole-genome sequences of three or more individuals. HYBRIDCHECK is written in R for OS X, Linux and Windows operating systems, and it has a simple graphical user interface. In addition, the R code can be readily incorporated in scripts and analysis pipelines. HYBRIDCHECK implements several ABBA-BABA tests and visualizes the effects of hybridization and the resulting mosaic-like genome structure in high-density graphics. The package also reports the following: (i) the breakpoint positions, (ii) the number of mutations in each introgressed block, (iii) the probability that the identified region is not caused by recombination and (iv) the estimated age of each recombination event. The divergence times between the donor and recombinant sequence are calculated using a JC, K80, F81, HKY or GTR correction, and the dating algorithm is exceedingly fast. By estimating the coalescence time of introgressed blocks, it is possible to distinguish between hybridization and incomplete lineage sorting. HYBRIDCHECK is libre software and it and its manual are free to download from http://ward9250.github.io/HybridCheck/. © 2015 John Wiley & Sons Ltd.
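
    Of the corrections named in this record, the JC (Jukes-Cantor) correction is the simplest: it converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates only that standard formula. It is not HYBRIDCHECK code (the package itself is written in R), the function names are assumptions, and how the package turns corrected distances into dates is not described in the abstract and is not modelled here.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance in substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
print(p, jc69_distance(p))
```

    The corrected distance is always at least as large as the raw proportion, because the correction accounts for multiple substitutions at the same site.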

  10. Genomic and oncoproteomic advances in detection and treatment of colorectal cancer.

    LENUS (Irish Health Repository)

    McHugh, Seamus M

    2009-01-01

    AIMS: We will examine the latest advances in genomic and proteomic laboratory technology. Through an extensive literature review we aim to critically appraise those studies which have utilized these latest technologies and ascertain their potential to identify clinically useful biomarkers. METHODS: An extensive review of the literature was carried out in both online medical journals and through the Royal College of Surgeons in Ireland library. RESULTS: Laboratory technology has advanced in the fields of genomics and oncoproteomics. Gene expression profiling with DNA microarray technology has allowed us to begin genetic profiling of colorectal cancer tissue. The response to chemotherapy can differ amongst individual tumors. For the first time researchers have begun to isolate and identify the genes responsible. New laboratory techniques allow us to isolate proteins preferentially expressed in colorectal cancer tissue. This could potentially lead to identification of a clinically useful protein biomarker in colorectal cancer screening and treatment. CONCLUSION: If a set of discriminating genes could be used for characterization and prediction of chemotherapeutic response, an individualized tailored therapeutic regime could become the standard of care for those undergoing systemic treatment for colorectal cancer. New laboratory techniques of protein identification may eventually allow identification of a clinically useful biomarker that could be used for screening and treatment. At present however, both expression of different gene signatures and isolation of various protein peaks has been limited by study size. Independent multi-centre correlation of results with larger sample sizes is needed to allow translation into clinical practice.

  11. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
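
    The (l,d)-motif search problem defined in this record can be made concrete with a brute-force Python sketch. This is not PMS8: it enumerates all 4^l candidate strings and is practical only for very small l. The function names and the toy sequences are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate: str, sequence: str, d: int) -> bool:
    """True if some l-mer of `sequence` is within d mismatches of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motif_search(sequences, l: int, d: int, alphabet: str = "ACGT"):
    """Return every length-l string found in all sequences with <= d mismatches."""
    motifs = []
    for letters in product(alphabet, repeat=l):  # all |alphabet|^l candidates
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


# Toy input: "ACGT" occurs exactly in each of the three sequences.
seqs = ["AAACGT", "ACGTTT", "GGACGT"]
print(planted_motif_search(seqs, l=4, d=0))
```

    With d = 0 the search reduces to finding l-mers shared exactly by all sequences; raising d enlarges the result set, which is one reason the challenging instances cited in the abstract, such as (25,10), call for far more refined algorithms than exhaustive enumeration.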

  12. scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes.

    Directory of Open Access Journals (Sweden)

    Patrick K O'Neill

    Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate and fast growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate and fast growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.

  13. Analisis Unsur Matematika pada Motif Sulam Usus

    Directory of Open Access Journals (Sweden)

    Fredi Ganda Putra

    2017-12-01

    According to sources interviewed by the researchers, intestine embroidery (sulam usus) began as a genuine handicraft art. It is called intestine embroidery because the technique joins strands of cloth resembling intestines, shaped according to a pattern and stitched together with thread. The technique was originally used to make a covering for the traditional women's attire of Lampung, often referred to as bebe. However, many people, including residents of Lampung, do not know or recognize intestine embroidery, because most know only tapis as characteristic of Lampung, although intestine embroidery is another cultural product of the region. Many are also unaware that the intestine embroidery motif embodies mathematical knowledge. The research question is whether the intestine embroidery motif contains mathematical elements based on concepts of geometry, and the purpose of the study is to determine whether such elements are present. The subjects of the study were 4 people selected by purposive sampling. Descriptive analysis of the data and discussion led to the following findings: (1) The intestine embroidery motif carries both mathematical and cultural meaning, often called ethnomathematics (Etnomatematika). Culturally, intestine embroidery is linked to earlier cultures, as shown by a connection with Hindu-Buddhist beliefs and by similarities between the motifs and decorative patterns of intestine embroidery and the ornamental varieties found in Indonesia. (2) Regarding the relationship between the intestine embroidery motif and mathematics, the motif contains mathematical elements such as geometric elements in the form of one-dimensional and two-dimensional geometry, and the

  14. Molecular markers detect stable genomic regions underlying tomato fruit shelf life and weight

    Directory of Open Access Journals (Sweden)

    Guillermo Raúl Pratta

    2011-01-01

    Incorporating wild germplasm such as S. pimpinellifolium is an alternative strategy to prolong tomato fruit shelf life (SL) without reducing fruit quality. A set of recombinant inbred lines with discrepant values of SL and weight (FW) was derived by antagonistic-divergent selection from an interspecific cross. The general objective of this research was to evaluate Genotype x Year (GY) and Marker x Year (MY) interaction in these new genetic materials for both traits. Genotype and year principal effects and GY interaction were statistically significant for SL. Genotype and year principal effects were significant for FW but GY interaction was not. The marker principal effect was significant for SL and FW but both year principal effect and MY interaction were not significant. Though SL was highly influenced by year conditions, some genome regions appeared to maintain a stable effect across years of evaluation. Fruit weight, instead, was more independent of year effect.

  15. Comparative genomic hybridization detects novel amplifications in fibroadenomas of the breast

    DEFF Research Database (Denmark)

    Ojopi, E P; Rogatto, S R; Caldeira, J R

    2001-01-01

    Comparative genomic hybridization analysis was performed for identification of chromosomal imbalances in 23 samples of fibroadenomas of the breast. Chromosomal gains rather than losses were a feature of these lesions. Only two cases with a familial and/or previous history of breast lesions had gain...... of 1q or 16q as the sole abnormality. The most frequently overrepresented segments were 5p14 (10/23 cases), 5q34-qter (6/23 cases), 13q32-qter (6/23 cases), 10q25-qter (5/23 cases), and 18q22 (4/23 cases). Some of these regions have previously been associated with breast carcinoma, but this study...... indicates that gain of these regions can also occur in benign breast lesions. Our findings may provide a basis for conducting further investigations to locate and identify genes associated with proliferation that may be involved in the early steps of tumorigenesis of the breast....

  16. Detection of yellow fever virus genomes from four imported cases in China.

    Science.gov (United States)

    Cui, Shujuan; Pan, Yang; Lyu, Yanning; Liang, Zhichao; Li, Jie; Sun, Yulan; Dou, Xiangfeng; Tian, Lili; Huo, Da; Chen, Lijuan; Li, Xinyu; Wang, Quanyi

    2017-07-01

    Yellow fever virus (YFV), as the first proven human-pathogenic virus, is still a major public health problem with a dramatic upsurge in recent years. This is a report on four imported cases of yellow fever virus into China identified by whole genome sequencing. Phylogenetic analysis was performed and the results showed that these four viruses were highly homologous with Angola 71 strains (AY968064). In addition, effective mutations of amino acids were not observed in the E protein domain of the four viruses, thus confirming the effectiveness of the YFV-17D vaccine (X03700). Although there is a low risk of local transmission in most parts of China, the increasing public health risk of YF caused by international exchange should not be ignored. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  17. Genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts.

    Science.gov (United States)

    van der Lee, Robin; Wiel, Laurens; van Dam, Teunis J P; Huynen, Martijn A

    2017-10-13

    Hotspots of rapid genome evolution hold clues about human adaptation. We present a comparative analysis of nine whole-genome sequenced primates to identify high-confidence targets of positive selection. We find strong statistical evidence for positive selection in 331 protein-coding genes (3%), pinpointing 934 adaptively evolving codons (0.014%). Our new procedure is stringent and reveals substantial artefacts (20% of initial predictions) that have inflated previous estimates. The final 331 positively selected genes (PSG) are strongly enriched for innate and adaptive immunity, secreted and cell membrane proteins (e.g. pattern recognition, complement, cytokines, immune receptors, MHC, Siglecs). We also find evidence for positive selection in reproduction and chromosome segregation (e.g. centromere-associated CENPO, CENPT), apolipoproteins, smell/taste receptors and mitochondrial proteins. Focusing on the virus-host interaction, we retrieve most evolutionary conflicts known to influence antiviral activity (e.g. TRIM5, MAVS, SAMHD1, tetherin) and predict 70 novel cases through integration with virus-human interaction data. Protein structure analysis further identifies positive selection in the interaction interfaces between viruses and their cellular receptors (CD4-HIV; CD46-measles, adenoviruses; CD55-picornaviruses). Finally, primate PSG consistently show high sequence variation in human exomes, suggesting ongoing evolution. Our curated dataset of positive selection is a rich source for studying the genetics underlying human (antiviral) phenotypes. Procedures and data are available at https://github.com/robinvanderlee/positive-selection. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. ReCombine: a suite of programs for detection and analysis of meiotic recombination in whole-genome datasets.

    Directory of Open Access Journals (Sweden)

    Carol M Anderson

    Full Text Available In meiosis, the exchange of DNA between chromosomes by homologous recombination is a critical step that ensures proper chromosome segregation and increases genetic diversity. Products of recombination include reciprocal exchanges, known as crossovers, and non-reciprocal gene conversions or non-crossovers. The mechanisms underlying meiotic recombination remain elusive, largely because of the difficulty of analyzing large numbers of recombination events by traditional genetic methods. These traditional methods are increasingly being superseded by high-throughput techniques capable of surveying meiotic recombination on a genome-wide basis. Next-generation sequencing or microarray hybridization is used to genotype thousands of polymorphic markers in the progeny of hybrid yeast strains. New computational tools are needed to perform this genotyping and to find and analyze recombination events. We have developed a suite of programs, ReCombine, for using short sequence reads from next-generation sequencing experiments to genotype yeast meiotic progeny. Upon genotyping, the program CrossOver, a component of ReCombine, then detects recombination products and classifies them into categories based on the features found at each location and their distribution among the various chromatids. CrossOver is also capable of analyzing segregation data from microarray experiments or other sources. This package of programs is designed to allow even researchers without computational expertise to use high-throughput, whole-genome methods to study the molecular mechanisms of meiotic recombination.

  19. FaSD-somatic: a fast and accurate somatic SNV detection algorithm for cancer genome sequencing data.

    Science.gov (United States)

    Wang, Weixin; Wang, Panwen; Xu, Feng; Luo, Ruibang; Wong, Maria Pik; Lam, Tak-Wah; Wang, Junwen

    2014-09-01

    Recent advances in high-throughput sequencing technologies have enabled us to sequence a large number of cancer samples to reveal novel insights into oncogenetic mechanisms. However, the presence of intratumoral heterogeneity, normal cell contamination and insufficient sequencing depth together pose a challenge for detecting somatic mutations. Here we propose a fast and accurate somatic single-nucleotide variation (SNV) detection program, FaSD-somatic. The performance of FaSD-somatic is extensively assessed on various types of cancer against several state-of-the-art somatic SNV detection programs. Benchmarked by somatic SNVs from either existing databases or de novo higher-depth sequencing data, FaSD-somatic has the best overall performance. Furthermore, FaSD-somatic is efficient: it finishes somatic SNV calling within 14 h on 50X whole genome sequencing data in paired samples. The program, datasets and supplementary files are available at http://jjwanglab.org/FaSD-somatic/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. CollapsABEL: an R library for detecting compound heterozygote alleles in genome-wide association studies.

    Science.gov (United States)

    Zhong, Kaiyin; Karssen, Lennart C; Kayser, Manfred; Liu, Fan

    2016-04-08

    Compound Heterozygosity (CH) in classical genetics is the presence of two different recessive mutations at a particular gene locus. A relaxed form of CH alleles may account for an essential proportion of the missing heritability, i.e. heritability of phenotypes so far not accounted for by single genetic variants. Methods to detect CH-like effects in genome-wide association studies (GWAS) may facilitate explaining the missing heritability, but to our knowledge no viable software tools for this purpose are currently available. In this work we present the Generalized Compound Double Heterozygosity (GCDH) test and its implementation in the R package CollapsABEL. Time-consuming procedures are optimized for computational efficiency using Java or C++. Intermediate results are stored either in an SQL database or in a so-called big.matrix file to achieve reasonable memory footprint. Our large scale simulation studies show that GCDH is capable of discovering genetic associations due to CH-like interactions with much higher power than a conventional single-SNP approach under various settings, whether the causal genetic variations are available or not. CollapsABEL provides a user-friendly pipeline for genotype collapsing, statistical testing, power estimation, type I error control and graphics generation in the R language. CollapsABEL provides a computationally efficient solution for screening general forms of CH alleles in densely imputed microarray or whole genome sequencing datasets. The GCDH test provides an improved power over single-SNP based methods in detecting the prevalence of CH in human complex phenotypes, offering an opportunity for tackling the missing heritability problem. Binary and source packages of CollapsABEL are available on CRAN ( https://cran.r-project.org/web/packages/CollapsABEL ) and the website of the GenABEL project ( http://www.genabel.org/packages ).

  1. Detection of Sleeping Beauty transposition in the genome of host cells by non-radioactive Southern blot analysis

    Energy Technology Data Exchange (ETDEWEB)

    Aravalli, Rajagopal N., E-mail: aravalli@umn.edu [Department of Radiology, University of Minnesota Medical School, MMC 292, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Park, Chang W. [Department of Medicine, University of Minnesota Medical School, MMC 36, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Steer, Clifford J., E-mail: steer001@umn.edu [Department of Medicine, University of Minnesota Medical School, MMC 36, 420 Delaware Street SE, Minneapolis, MN 55455 (United States); Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455 (United States)

    2016-08-26

    The Sleeping Beauty transposon (SB-Tn) system is being used widely as a DNA vector for the delivery of therapeutic transgenes, as well as a tool for the insertional mutagenesis in animal models. In order to accurately assess the insertional potential and properties related to the integration of SB it is essential to determine the copy number of SB-Tn in the host genome. Recently developed SB100X transposase has demonstrated an integration rate that was much higher than the original SB10 and that of other versions of hyperactive SB transposases, such as HSB3 or HSB17. In this study, we have constructed a series of SB vectors carrying either a DsRed or a human β-globin transgene that was encompassed by cHS4 insulator elements, and containing the SB100X transposase gene outside the SB-Tn unit within the same vector in cis configuration. These SB-Tn constructs were introduced into the K-562 erythroid cell line, and their presence in the genomes of host cells was analyzed by Southern blot analysis using non-radioactive probes. Many copies of SB-Tn insertions were detected in host cells regardless of transgene sequences or the presence of cHS4 insulator elements. Interestingly, the size difference of 2.4 kb between insulated SB and non-insulated controls did not reflect the proportional difference in copy numbers of inserted SB-Tns. We then attempted methylation-sensitive Southern blots to assess the potential influence of cHS4 insulator elements on the epigenetic modification of SB-Tn. Our results indicated that SB100X was able to integrate at multiple sites with the number of SB-Tn copies larger than 6 kb in size. In addition, the non-radioactive Southern blot protocols developed here will be useful to detect integrated SB-Tn copies in any mammalian cell type.

  2. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have been proven to be promising for harnessing the discovery power of the large accumulated body of high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . dshuang@tongji.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
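
    The AUC criterion named in the abstract above can be illustrated with a short sketch. This is not the CDAUC algorithm; it only computes the empirical AUC of motif scores as the fraction of (bound, unbound) sequence pairs that are ranked correctly. The function name and the example scores are hypothetical. Because the value depends only on the pairwise ordering of scores, it changes in jumps as a single motif parameter is varied, which is consistent with the piece-wise constant sub-problems the abstract describes.

```python
def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the usual Mann-Whitney convention).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical motif scores for bound (positive) and unbound (negative) sequences.
bound = [0.9, 0.8, 0.4]
unbound = [0.5, 0.3]
print(empirical_auc(bound, unbound))
```

    The quadratic pair loop is chosen for readability; for large datasets a rank-based computation would be the usual choice.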

  3. Development of a real-time PCR for detection of Staphylococcus pseudintermedius using a novel automated comparison of whole-genome sequences.

    Directory of Open Access Journals (Sweden)

    Koen M Verstappen

    Full Text Available Staphylococcus pseudintermedius is an opportunistic pathogen in dogs and cats and occasionally causes infections in humans. S. pseudintermedius is often resistant to multiple classes of antimicrobials. It requires a reliable detection so that it is not misidentified as S. aureus. Phenotypic and currently-used molecular-based diagnostic assays lack specificity or are labour-intensive using multiplex PCR or nucleic acid sequencing. The aim of this study was to identify a specific target for real-time PCR by comparing whole genome sequences of S. pseudintermedius and non-pseudintermedius. Genome sequences were downloaded from public repositories and supplemented by isolates that were sequenced in this study. A Perl-script was written that analysed 300-nt fragments from a reference genome sequence of S. pseudintermedius and checked if this sequence was present in other S. pseudintermedius genomes (n = 74) and non-pseudintermedius genomes (n = 138). Six sequences specific for S. pseudintermedius were identified (sequence length between 300-500 nt). One sequence, which was located in the spsJ gene, was used to develop primers and a probe. The real-time PCR showed 100% specificity when testing for S. pseudintermedius isolates (n = 54), and eight other staphylococcal species (n = 43). In conclusion, a novel approach by comparing whole genome sequences identified a sequence that is specific for S. pseudintermedius and provided a real-time PCR target for rapid and reliable detection of S. pseudintermedius.
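
    The fragment screen described above can be sketched as follows. The authors' Perl script is not reproduced in the abstract, so this Python version is only an illustration under stated assumptions: presence is tested by exact substring match on either strand (the original may well have used alignment with mismatches), windows do not overlap, and all function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains(genome, fragment):
    """True if the fragment occurs on either strand (exact match; an assumption)."""
    return fragment in genome or reverse_complement(fragment) in genome


def specific_fragments(reference, targets, non_targets, size=300):
    """Return (offset, fragment) windows of the reference that occur in
    every target genome and in none of the non-target genomes."""
    hits = []
    for start in range(0, len(reference) - size + 1, size):
        frag = reference[start:start + size]
        in_all_targets = all(contains(g, frag) for g in targets)
        in_any_other = any(contains(g, frag) for g in non_targets)
        if in_all_targets and not in_any_other:
            hits.append((start, frag))
    return hits


# Toy example with 6-nt windows instead of 300-nt ones.
reference = "ATGCCAGGATCCTTTAAA"
targets = ["CCATGCCATT", "GGTGGCATGG"]   # second one carries the reverse strand
non_targets = ["TTTAAAGGG"]
print(specific_fragments(reference, targets, non_targets, size=6))
```

    Adjacent windows that pass the screen could then be merged, which would be one way to arrive at candidate regions longer than a single window.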

  4. Detection of phytochrome-like genes from Rhazya stricta (Apocynaceae) using de novo genome assembly.

    Science.gov (United States)

    Sabir, Jamal S M; Baeshen, Nabih A; Shokry, Ahmed M; Gadalla, Nour O; Edris, Sherif; Mutwakil, Mohammed H; Ramadan, Ahmed M; Atef, Ahmed; Al-Kordy, Magdy A; Abuzinadah, Osama A; El-Domyati, Fotouh M; Jansen, Robert K; Bahieldin, Ahmed

    2013-01-01

    Phytochrome-like genes in the wild plant species Rhazya stricta Decne were characterized using a de novo genome assembly of next generation sequence data. Rhazya stricta contains more than 100 alkaloids with multiple pharmacological properties, and leaf extracts have been used to cure chronic rheumatism, to treat tumors, and in the treatment of several other diseases. Phytochromes are known to be involved in the light-regulated biosynthesis of some alkaloids. Phytochromes are soluble chromoproteins that function in the absorption of red and far-red light and the transduction of intracellular signals during light-regulated plant development. De novo assembly of the nuclear genome of R. stricta recovered 45,641 contigs greater than 1000 bp long, which were used in constructing a local database. Five sequences belonging to Arabidopsis thaliana phytochrome gene family (i.e., AtphyABCDE) were used to identify R. stricta contigs with phytochrome-like sequences using BLAST. This led to the identification of three contigs with phytochrome-like sequences covering AtphyA-, AtphyC- and AtphyE-like full-length genes. Annotation of the three sequences showed that each contig consists of one phytochrome-like gene with three exons and two introns. BLASTn and BLASTp results indicated that RsphyA mRNA and protein sequences had homologues in Wrightia coccinea and Solanum tuberosum, respectively. RsphyC-like mRNA and protein sequence were homologous to Vitis vinifera and Vitis riparia. RsphyE-like mRNA coding and protein sequences were homologous to Ipomoea nil. Multiple-sequence alignment of phytochrome proteins indicated a homology with 30 sequences from 23 different species of flowering plants. Phylogenetic analysis confirmed that each R. stricta phytochrome gene is related to the same phytochrome gene of other flowering plants. It is proposed that the absence of phyB gene in R. stricta is due to RsphyA gene taking over the role of phyB. © 2013 Académie des sciences. All

  5. Zepto-molar electrochemical detection of Brucella genome based on gold nanoribbons covered by gold nanoblooms

    Science.gov (United States)

    Rahi, Amid; Sattarahmady, Naghmeh; Heli, Hossein

    2015-12-01

    Gold nanoribbons covered by gold nanoblooms were sonoelectrodeposited on a polycrystalline gold surface at -1800 mV (vs. AgCl) with the assistance of ultrasound and co-occurrence of the hydrogen evolution reaction. The nanostructure, as a transducer, was utilized to immobilize a Brucella-specific probe and fabricate a genosensor, and the process of immobilization and hybridization was detected by electrochemical methods, using methylene blue as a redox marker. The proposed method was assayed for detection of the complementary sequence, base-mismatched sequences (one-, two- and three-base mismatches), and a non-complementary sequence. The fabricated genosensor was evaluated for the assay of the bacteria in the cultured and human samples without polymerase chain reactions (PCR). The genosensor could detect the complementary sequence with a calibration sensitivity of 0.40 μA dm³ mol⁻¹, a linear concentration range of 10 zmol dm⁻³ to 10 pmol dm⁻³, and a detection limit of 1.71 zmol dm⁻³.

  6. Genomics-enabled sensor platform for rapid detection of viruses related to disease outbreak.

    Energy Technology Data Exchange (ETDEWEB)

    Brozik, Susan M; Manginell, Ronald P; Moorman, Matthew W; Xiao, Xiaoyin; Edwards, Thayne L.; Anderson, John Moses; Pfeifer, Kent Bryant; Branch, Darren W.; Wheeler, David Roger; Polsky, Ronen; Lopez, DeAnna M.; Ebel, Gregory D.; Prasad, Abhishek N.; Brozik, James A.; Rudolph, Angela R.; Wong, Lillian P.

    2013-09-01

    Bioweapons and emerging infectious diseases pose growing threats to our national security. Both natural disease outbreak and outbreaks due to a bioterrorist attack are a challenge to detect, taking days after the outbreak to identify since most outbreaks are only recognized through reportable diseases by health departments and reports of unusual diseases by clinicians. In recent decades, arthropod-borne viruses (arboviruses) have emerged as some of the most significant threats to human health. They emerge, often unexpectedly, from cryptic transmission foci causing localized outbreaks that can rapidly spread to multiple continents due to increased human travel and trade. Currently, diagnosis of acute infections requires amplification of viral nucleic acids, which can be costly, highly specific, technically challenging and time consuming. No diagnostic devices suitable for use at the bedside or in an outbreak setting currently exist. The original goals of this project were to 1) develop two highly sensitive and specific diagnostic assays for detecting RNA from a wide range of arboviruses; one based on an electrochemical approach and the other a fluorescent based assay and 2) develop prototype microfluidic diagnostic platforms for preclinical and field testing that utilize the assays developed in goal 1. We generated and characterized suitable primers for West Nile Virus RNA detection. Both optical and electrochemical transduction technologies were developed for DNA-RNA hybridization detection and were implemented in microfluidic diagnostic sensing platforms that were developed in this project.

  7. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Angela Stokes

    Full Text Available The requirement for large amounts of good quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM), and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE). Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH) analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched paired amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis for copy number, LOH, and both combined showed that copy number changes were reduced in amplified DNA, but were 99.5% concordant when detected; amplifications were the changes most likely to be 'missed'; only 30% of non-amplified LOH changes were identified in amplified pairs; and when copy number and LOH are combined, ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA and, within these changes, 86.5% were concordant for both copy number and LOH status. However, there are also changes introduced, as ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used

  8. Subgraph Covers: An Information-Theoretic Approach to Motif Analysis in Networks

    Directory of Open Access Journals (Sweden)

    Anatol E. Wegner

    2014-11-01

    Full Text Available Many real-world networks contain a statistically surprising number of certain subgraphs, called network motifs. In the prevalent approach to motif analysis, network motifs are detected by comparing subgraph frequencies in the original network with a statistical null model. In this paper, we propose an alternative approach to motif analysis where network motifs are defined to be connectivity patterns that occur in a subgraph cover that represents the network using minimal total information. A subgraph cover is defined to be a set of subgraphs such that every edge of the graph is contained in at least one of the subgraphs in the cover. Some recently introduced random graph models that can incorporate significant densities of motifs have natural formulations in terms of subgraph covers, and the presented approach can be used to match networks with such models. To prove the practical value of our approach, we also present a heuristic for the resulting NP-hard optimization problem and give results for several real-world networks.
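
    The definition of a subgraph cover given above translates directly into a small check. The sketch below only tests the covering property for an undirected simple graph given as edge lists; it does not implement the paper's information-theoretic objective or its heuristic, and the function name and toy graph are hypothetical.

```python
def is_subgraph_cover(graph_edges, subgraphs):
    """True if every subgraph uses only edges of the graph and every
    edge of the graph lies in at least one subgraph.

    Edges are undirected pairs, so (1, 3) and (3, 1) are the same edge.
    """
    all_edges = {frozenset(e) for e in graph_edges}
    covered = set()
    for sub in subgraphs:
        sub_edges = {frozenset(e) for e in sub}
        if not sub_edges <= all_edges:
            return False          # not a subgraph of the given graph
        covered |= sub_edges
    return covered == all_edges


# Toy graph: a triangle on nodes 1-2-3 plus a pendant edge 3-4.
graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (3, 1)]
single_edge = [(3, 4)]
print(is_subgraph_cover(graph, [triangle, single_edge]))
print(is_subgraph_cover(graph, [triangle]))
```

    In this toy case the cover consisting of one triangle and one single edge describes the graph with two pieces rather than four separate edges, which is the kind of economy the minimal-information criterion rewards.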

  9. An analysis of multi-type relational interactions in FMA using graph motifs with disjointness constraints.

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, thereby bringing a rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amenable to programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation.

  10. Intronic alternative splicing regulators identified by comparative genomics in nematodes.

    Directory of Open Access Journals (Sweden)

    Jennifer L Kabat

    2006-07-01

    Full Text Available Many alternative splicing events are regulated by pentameric and hexameric intronic sequences that serve as binding sites for splicing regulatory factors. We hypothesized that intronic elements that regulate alternative splicing are under selective pressure for evolutionary conservation. Using a Wobble Aware Bulk Aligner genomic alignment of Caenorhabditis elegans and Caenorhabditis briggsae, we identified 147 alternatively spliced cassette exons that exhibit short regions of high nucleotide conservation in the introns flanking the alternative exon. In vivo experiments on the alternatively spliced let-2 gene confirm that these conserved regions can be important for alternative splicing regulation. Conserved intronic element sequences were collected into a dataset and the occurrence of each pentamer and hexamer motif was counted. We compared the frequency of pentamers and hexamers in the conserved intronic elements to a dataset of all C. elegans intron sequences in order to identify short intronic motifs that are more likely to be associated with alternative splicing. High-scoring motifs were examined for upstream or downstream preferences in introns surrounding alternative exons. Many of the high-scoring nematode pentamer and hexamer motifs correspond to known mammalian splicing regulatory sequences, such as (T)GCATG, indicating that the mechanism of alternative splicing regulation is well conserved in metazoans. A comparison of the analysis of the conserved intronic elements, and analysis of the entire introns flanking these same exons, reveals that focusing on intronic conservation can increase the sensitivity of detecting putative splicing regulatory motifs. This approach also identified novel sequences whose role in splicing is under investigation and has allowed us to take a step forward in defining a catalog of splicing regulatory elements for an organism. In vivo experiments confirm that one novel high-scoring sequence from our analysis
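
    The counting step described in this abstract, comparing pentamer and hexamer frequencies in conserved intronic elements against all intron sequences, can be sketched as below. The abstract does not state the scoring statistic, so the simple frequency ratio with a pseudocount used here is an assumption, as are the function names and the toy sequences.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(conserved, background, k, pseudocount=1.0):
    """Rank k-mers by (frequency in conserved set) / (frequency in background).

    The ratio score and the pseudocount are illustrative assumptions,
    not the statistic used in the study.
    """
    fg = kmer_counts(conserved, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences; real input would be conserved intronic elements and all introns.
conserved = ["TGCATG", "ATGCATGA"]
background = ["AAAAAAAAAA", "CCCCCCCCCC", "TGCATGAAAA"]
for kmer, score in kmer_enrichment(conserved, background, k=6):
    print(kmer, score)
```

    Running the same function with k=5 gives the pentamer ranking; upstream and downstream introns could be scored separately to look for positional preferences.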

  11. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA.

    Science.gov (United States)

    Gibbons, Brian; Datta, Parikkhit; Wu, Ying; Chan, Alan; Al Armour, John

    2006-06-30

    Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH) we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  12. A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants

    Science.gov (United States)

    Scott, Laura J.; Mohlke, Karen L.; Bonnycastle, Lori L.; Willer, Cristen J.; Li, Yun; Duren, William L.; Erdos, Michael R.; Stringham, Heather M.; Chines, Peter S.; Jackson, Anne U.; Prokunina-Olsson, Ludmila; Ding, Chia-Jen; Swift, Amy J.; Narisu, Narisu; Hu, Tianle; Pruim, Randall; Xiao, Rui; Li, Xiao-Yi; Conneely, Karen N.; Riebow, Nancy L.; Sprau, Andrew G.; Tong, Maurine; White, Peggy P.; Hetrick, Kurt N.; Barnhart, Michael W.; Bark, Craig W.; Goldstein, Janet L.; Watkins, Lee; Xiang, Fang; Saramies, Jouko; Buchanan, Thomas A.; Watanabe, Richard M.; Valle, Timo T.; Kinnunen, Leena; Abecasis, Gonçalo R.; Pugh, Elizabeth W.; Doheny, Kimberly F.; Bergman, Richard N.; Tuomilehto, Jaakko; Collins, Francis S.; Boehnke, Michael

    2011-01-01

    Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk. This brings the number of T2D loci now confidently identified to at least 10. PMID:17463248

  13. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    Science.gov (United States)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S; Nielsen, Eva M; Aarestrup, Frank M

    2014-05-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all suspected VTEC isolates. During a 7-week period in the fall of 2012, all incoming isolates were concurrently subjected to WGS using IonTorrent PGM. Real-time bioinformatics analysis was performed using web-tools (www.genomicepidemiology.org) for species determination, multilocus sequence type (MLST) typing, and determination of phylogenetic relationship, and a specific VirulenceFinder for detection of E. coli virulence genes was developed as part of this study. In total, 46 suspected VTEC isolates were characterized in parallel during the study. VirulenceFinder proved successful in detecting virulence genes included in routine typing, explicitly verocytotoxin 1 (vtx1), verocytotoxin 2 (vtx2), and intimin (eae), and also detected additional virulence genes. VirulenceFinder is also a robust method for assigning verocytotoxin (vtx) subtypes. A real-time clustering of isolates in agreement with the epidemiology was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing and surveillance of other pathogens.

  14. Detection and quantification of human adenovirus genomes in Acanthamoeba isolated from swimming pools

    Directory of Open Access Journals (Sweden)

    RODRIGO STAGGEMEIER

    2016-01-01

    Acanthamoeba is the most common free-living environmental amoeba; it may serve as an important vehicle for various microorganisms living in the same environment, such as viruses, being pathogenic to humans. This study aimed to detect and quantify human adenoviruses (HAdV) in Acanthamoeba isolated from water samples collected from swimming pools in the city of Porto Alegre, Southern Brazil. Free-living amoebae of the genus Acanthamoeba were isolated from water samples, and isolates (n=16) were used to investigate the occurrence of HAdVs. HAdV detection was performed by quantitative real-time polymerase chain reaction (qPCR). HAdVs were detected in 62.5% (10/16) of Acanthamoeba isolates, ranging from 3.24x10^3 to 5.14x10^5 DNA copies per milliliter of isolate. HAdV viral loads found in this study are not negligible, especially because HAdV infections are associated with several human diseases, including gastroenteritis, respiratory distress, and ocular diseases. These findings reinforce the concept that Acanthamoeba may act as a reservoir and promote HAdV transmission through water.

  15. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Background: Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH), we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. Results: In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of whom were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Conclusion: Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  16. Airborne Detection of H5N8 Highly Pathogenic Avian Influenza Virus Genome in Poultry Farms, France

    Directory of Open Access Journals (Sweden)

    Axelle Scoizec

    2018-02-01

    In southwestern France, during the winter of 2016–2017, the rapid spread of highly pathogenic avian influenza H5N8 outbreaks, despite the implementation of routine control measures, raised the question of the potential role of airborne transmission in viral spread. As a first step to investigate the plausibility of that transmission, air samples were collected inside, outside and downwind from infected duck and chicken facilities. H5 avian influenza virus RNA was detected in all samples collected inside poultry houses, at external exhaust fans and at 5 m distance from poultry houses. For three of the five flocks studied, viral genomic RNA was detected in the sample collected at 50–110 m distance. The measured viral air concentrations ranged between 4.3 and 6.4 log10 RNA copies per m3, and their geometric mean decreased from external exhaust fans to the downwind measurement point. These findings are in accordance with the possibility of airborne transmission and question the procedures for outbreak depopulation.

  17. Faster exact Markovian probability functions for motif occurrences: a DFA-only approach.

    Science.gov (United States)

    Ribeca, Paolo; Raineri, Emanuele

    2008-12-15

    The computation of the statistical properties of motif occurrences has an obviously relevant application: patterns that are significantly over- or under-represented in genomes or proteins are interesting candidates for biological roles. However, the problem is computationally hard; as a result, virtually all the existing motif finders use fast but approximate scoring functions, in spite of the fact that they have been shown to produce systematically incorrect results. A few interesting exact approaches are known, but they are very slow and hence not practical in the case of realistic sequences. We give an exact solution, solely based on deterministic finite-state automata (DFA), to the problem of finding the whole relevant part of the probability distribution function of a simple-word motif in a homogeneous (biological) sequence. Out of that, the z-value can always be computed, while the P-value can be obtained either when it is not too extreme with respect to the number of floating-point digits available in the implementation, or when the number of pattern occurrences is moderately low. In particular, the time complexity of the algorithms for Markov models of moderate order (0 manage to obtain an algorithm which is both easily interpretable and efficient. This approach can be used for exact statistical studies of very long genomes and protein sequences, as we illustrate with some examples on the scale of the human genome.
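
    The DFA idea in the abstract above can be illustrated with a small Python sketch. This is not the authors' algorithm: it is a minimal textbook construction that assumes an order-0 (independent, identically distributed) sequence model, builds a KMP-style automaton for a single word, and propagates probabilities over (automaton state, occurrence count) pairs. The function names and the `probs` parameter are illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt


def build_kmp_dfa(word, alphabet):
    """Return dfa[state][symbol] -> next state.

    A state is the length of the longest prefix of `word` that is a
    suffix of the text read so far; state len(word) means "match".
    """
    m = len(word)
    fail = [0] * (m + 1)  # fail[k]: longest proper border of word[:k]
    k = 0
    for i in range(1, m):
        while k and word[i] != word[k]:
            k = fail[k]
        if word[i] == word[k]:
            k += 1
        fail[i + 1] = k
    dfa = []
    for state in range(m + 1):
        row = {}
        for c in alphabet:
            if state < m and word[state] == c:
                row[c] = state + 1
            elif state == 0:
                row[c] = 0
            else:
                row[c] = dfa[fail[state]][c]  # fall back to a shorter border
        dfa.append(row)
    return dfa


def occurrence_distribution(word, n, probs):
    """Exact distribution of the number of (overlapping) occurrences of
    `word` in a random sequence of length n with i.i.d. symbols.

    probs maps each symbol to its probability (assumed order-0 model).
    """
    dfa = build_kmp_dfa(word, list(probs))
    m = len(word)
    dist = {(0, 0): 1.0}  # (dfa state, occurrences so far) -> probability
    for _ in range(n):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for c, pc in probs.items():
                s2 = dfa[state][c]
                nxt[(s2, count + (1 if s2 == m else 0))] += p * pc
        dist = nxt
    out = defaultdict(float)
    for (_, count), p in dist.items():
        out[count] += p
    return dict(out)


def z_value(dist, observed):
    """z-value of an observed count under the exact distribution."""
    mean = sum(k * p for k, p in dist.items())
    var = sum((k - mean) ** 2 * p for k, p in dist.items())
    return (observed - mean) / sqrt(var)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
example = occurrence_distribution("AA", 3, uniform)
```

    The table keeps one entry per (state, count) pair, so memory grows with the number of occurrences that are possible; a practical implementation for genome-scale sequences would need to truncate or compress this, which the sketch does not attempt.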

  18. Identifying motifs in folktales using topic models

    NARCIS (Netherlands)

    Karsdorp, F.; Bosch, A.P.J. van den

    2013-01-01

    With the undertaking of various folktale digitalization initiatives, the need for computational aids to explore these collections is increasing. In this paper we compare Labeled LDA (L-LDA) to a simple retrieval model on the task of identifying motifs in folktales. We show that both methods are well

  19. Genome Scan Detects Quantitative Trait Loci Affecting Female Fertility Traits in Danish and Swedish Holstein Cattle

    DEFF Research Database (Denmark)

    Höglund, Johanna Karolina; Guldbrandtsen, B; Su, G

    2009-01-01

    Data from the joint Nordic breeding value prediction for Danish and Swedish Holstein grandsire families were used to locate quantitative trait loci (QTL) for female fertility traits in Danish and Swedish Holstein cattle. Up to 36 Holstein grandsires with over 2,000 sons were genotyped for 416 mic...... for QTL segregating on Bos taurus chromosome (BTA)1, BTA7, BTA10, and BTA26. On each of these chromosomes, several QTL were detected affecting more than one of the fertility traits investigated in this study. Evidence for segregation of additional QTL on BTA2, BTA9, and BTA24 was found...

  20. Genome editing using FACS enrichment of nuclease-expressing cells and indel detection by amplicon analysis

    DEFF Research Database (Denmark)

    Lonowski, Lindsey A; Narimatsu, Yoshiki; Riaz, Anjum

    2017-01-01

    ). First, Indel Detection by Amplicon Analysis (IDAA) determines the size and frequency of insertions and deletions elicited by nucleases in cells, tissues or embryos through analysis of fluorophore-labeled PCR amplicons covering the nuclease target site by capillary electrophoresis in a sequenator. Second...... the testing of new nuclease reagents and the generation of edited cell pools or clonal cell lines, reducing the number of clones that need to be generated and increasing the ease with which they are screened. The pipeline shortens the time line, but it most prominently reduces the workload of cell...

  1. Conserved C-Terminal Motifs Required for Avirulence and Suppression of Cell Death by Phytophthora sojae effector Avr1b

    NARCIS (Netherlands)

    Dou, D.; Kale, S.D.; Wang, X.; Chen, Y.; Wang, Q.; Jiang, R.H.Y.; Arredondo, F.D.; Anderson, R.G.; Thakur, P.B.; McDowell, J.M.; Wang, Y.; Tyler, B.M.

    2008-01-01

    The sequenced genomes of oomycete plant pathogens contain large superfamilies of effector proteins containing the protein translocation motif RXLR-dEER. However, the contributions of these effectors to pathogenicity remain poorly understood. Here, we show that the Phytophthora sojae effector protein

  2. ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

    NARCIS (Netherlands)

    Muino, J.M.; Kaufmann, K.; Ham, van R.C.H.J.; Angenent, G.C.; Krajewski, P.

    2011-01-01

    Background In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally

  3. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm

    NARCIS (Netherlands)

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to

  4. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA.

    Science.gov (United States)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang; Lin, Linhua; Yin, Xuyang; Wang, Jun; Chen, Dayang; Chen, Fang; Jiang, Hui; Ren, Jinghui; Wang, Wei

    2016-01-01

    The aim of this study was to assess the performance of noninvasive prenatal testing (NIPT) for fetal copy number variants (CNVs) in clinical samples, using a whole-genome sequencing method. A total of 919 archived maternal plasma samples with karyotyping/microarray results, including 33 CNV samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis through Maternal Plasma Sequencing (FCAPS) and compared with the karyotyping/microarray results. Sensitivity and specificity were evaluated. 33 samples with deletions/duplications ranging from 1 to 129 Mb were detected with CNV size and location consistent with the karyotyping/microarray results in the study. Ten false positive results and two false negative results were obtained. The sensitivity and specificity of detecting deletions/duplications were 84.21% and 98.42%, respectively. Whole-genome sequencing-based NIPT has high performance in detecting genome-wide CNVs, in particular >10Mb CNVs using the current FCAPS algorithm. It is possible to implement the current method in NIPT to prenatally screen for fetal CNVs.

  5. A Genome Scan to Detect Quantitative Trait Loci for Economically Important Traits in Holstein Cattle Using Two Methods and a Dense Single Nucleotide Polymorphism Map

    NARCIS (Netherlands)

    Daetwyler, H.D.; Schenkel, F.S.; Sargolzaei, M.; Robinson, J.A.B.

    2008-01-01

    Genome scans for detection of bovine quantitative trait loci (QTL) were performed via variance component linkage analysis and linkage disequilibrium single-locus regression (LDRM). Four hundred eighty-four Holstein sires, of which 427 were from 10 grandsire families, were genotyped for 9,919 single

  6. Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database.

    Science.gov (United States)

    Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C

    2013-10-17

    Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium, (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences

  7. Defining the genome structure of 'Tongil' rice, an important cultivar in the Korean "Green Revolution".

    Science.gov (United States)

    Kim, Backki; Kim, Dong-Gwan; Lee, Gileung; Seo, Jeonghwan; Choi, Ik-Young; Choi, Beom-Soon; Yang, Tae-Jin; Kim, Kwang Soo; Lee, Joohyun; Chin, Joong Hyoun; Koh, Hee-Jong

    2014-12-01

    Tongil (IR667-98-1-2) rice, developed in 1972, is a high-yield rice variety derived from a three-way cross between indica and japonica varieties. Tongil contributed to the self-sufficiency of staple food production in Korea during a period known as the 'Korean Green Revolution'. We analyzed the nucleotide-level genome structure of Tongil rice and compared it to those of the parental varieties. A total of 17.3 billion Illumina Hiseq reads, 47× genome coverage, were generated for Tongil rice. Three parental accessions of Tongil rice, two indica types and one japonica type, were also sequenced at approximately 30x genome coverage. A total of 2,149,991 SNPs were detected between Tongil and Nipponbare varieties. The average SNP frequency of Tongil was 5.77 per kb. Genome composition was determined based on SNP data by comparing Tongil with three parental genome sequences using the sliding window approach. Analyses revealed that 91.8% of the Tongil genome originated from the indica parents and 7.9% from the japonica parent. Copy numbers of SSR motifs, ORF gene distribution throughout the whole genome, gene ontology (GO) annotation, and some yield-related QTLs or gene locations were also comparatively analyzed between Tongil and parental varieties using sequence-based tools. Each genetic factor was transferred from the parents into Tongil rice in amounts that were in proportion to the whole genome composition. Tongil was derived from a three-way cross among two indica and one japonica varieties. Defining the genome structure of Tongil rice demonstrates that the Tongil genome is derived primarily from the indica genome with a small proportion of japonica genome introgression. Comparative gene distribution, SSR, GO, and yield-related gene analysis support the finding that the Tongil genome is primarily made up of the indica genome.

  8. Evaluation of Global Genomic DNA Methylation in Human Whole Blood by Capillary Electrophoresis UV Detection

    Directory of Open Access Journals (Sweden)

    Angelo Zinellu

    2017-01-01

    Alterations in global DNA methylation are implicated in various pathophysiological processes. The development of simple and quick, yet robust, methods to assess DNA methylation is required to facilitate its measurement and interpretation in clinical practice. We describe a highly sensitive and reproducible capillary electrophoresis method with UV detection for the separation and detection of cytosine and methylcytosine, after formic acid hydrolysis of DNA extracted from human whole blood. Hydrolysed samples were dried and resuspended with water and directly injected into the capillary without sample derivatization procedures. The use of a run buffer containing 50 mmol/L BIS-TRIS propane (BTP) phosphate buffer at pH 3.25 and 60 mmol/L sodium acetate buffer at pH 3.60 (4 : 1, v/v) allowed full analyte identification within 11 min. Precision tests indicated an elevated reproducibility with an interassay CV of 1.98% when starting from 2 μg of the extracted DNA. The method was successfully tested by measuring the DNA methylation degree both in healthy volunteers and in reference calf thymus DNA.

  9. Accurate quantification of microRNA via single strand displacement reaction on DNA origami motif.

    Directory of Open Access Journals (Sweden)

    Jie Zhu

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-strand DNA into certain nanopattern. It has been widely used in various fields including detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as many other biological processes such as cell growth and differentiation. Alterations of miRNAs' expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on streptavidin and quantum dots binding complex (STV-QDs) labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA as miRNA-133 and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both symmetrical rectangular motif and asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA Origami motif with arbitrary shape can be utilized in this method. Since this DNA origami-based method we developed owns the unique advantages of simple, time-and-material-saving, potentially multi-targets testing in one motif and relatively accurate for certain impurity samples as counted directly by atomic force microscopy rather than fluorescence signal detection, it may be widely used in quantification of miRNAs.

  10. Accurate Quantification of microRNA via Single Strand Displacement Reaction on DNA Origami Motif

    Science.gov (United States)

    Lou, Jingyu; Li, Weidong; Li, Sheng; Zhu, Hongxin; Yang, Lun; Zhang, Aiping; He, Lin; Li, Can

    2013-01-01

    DNA origami is an emerging technology that assembles hundreds of staple strands and one single-strand DNA into certain nanopattern. It has been widely used in various fields including detection of biological molecules such as DNA, RNA and proteins. MicroRNAs (miRNAs) play important roles in post-transcriptional gene repression as well as many other biological processes such as cell growth and differentiation. Alterations of miRNAs' expression contribute to many human diseases. However, it is still a challenge to quantitatively detect miRNAs by origami technology. In this study, we developed a novel approach based on streptavidin and quantum dots binding complex (STV-QDs) labeled single strand displacement reaction on DNA origami to quantitatively detect the concentration of miRNAs. We illustrated a linear relationship between the concentration of an exemplary miRNA as miRNA-133 and the STV-QDs hybridization efficiency; the results demonstrated that it is an accurate nano-scale miRNA quantifier motif. In addition, both symmetrical rectangular motif and asymmetrical China-map motif were tested. With significant linearity in both motifs, our experiments suggested that DNA Origami motif with arbitrary shape can be utilized in this method. Since this DNA origami-based method we developed owns the unique advantages of simple, time-and-material-saving, potentially multi-targets testing in one motif and relatively accurate for certain impurity samples as counted directly by atomic force microscopy rather than fluorescence signal detection, it may be widely used in quantification of miRNAs. PMID:23990889

  11. The Investigation of Promoter Sequences of Marseilleviruses Highlights a Remarkable Abundance of the AAATATTT Motif in Intergenic Regions.

    Science.gov (United States)

    Oliveira, Graziele Pereira; Lima, Maurício Teixeira; Arantes, Thalita Souza; Assis, Felipe Lopes; Rodrigues, Rodrigo Araújo Lima; da Fonseca, Flávio Guimarães; Bonjardim, Cláudio Antônio; Kroon, Erna Geessien; Colson, Philippe; La Scola, Bernard; Abrahão, Jônatas Santos

    2017-11-01

    Viruses display a wide range of genomic profiles and, consequently, a variety of gene expression strategies. Specific sequences associated with transcriptional processes have been described in viruses, and putative promoter motifs have been elucidated for some nucleocytoplasmic large DNA viruses (NCLDV). Among NCLDV, the Marseilleviridae is a well-recognized family because of its genomic mosaicism. The marseilleviruses have an ability to incorporate foreign genes, especially from sympatric organisms inhabiting Acanthamoeba, its main known host. Here, we identified for the first time an eight-nucleotide A/T-rich promoter sequence (AAATATTT) associated with 55% of marseillevirus genes that is conserved in all marseillevirus lineages, a higher level of conservation than that of any giant virus described to date. We investigated our prediction about the promoter motif by biological assays and by evaluating how single mutations in this octamer can impact gene expression. The investigation of sequences that regulate the expression of genes relative to lateral transfer revealed that the promoter motifs do not appear to be incorporated by marseilleviruses from donor organisms. Indeed, analyses of the intergenic regions that regulate lateral gene transfer-related genes have revealed an independent origin of the marseillevirus intergenic regions that does not match gene-donor organisms. About 50% of AAATATTT motifs spread throughout intergenic regions of the marseilleviruses are present as multiple copies. We believe that such multiple motifs are associated with increased expression of a given gene or are related to incorporation of foreign genes into the mosaic genome of marseilleviruses. IMPORTANCE The marseilleviruses draw attention because of the peculiar features of their genomes; however, little is known about their gene expression patterns or the factors that regulate those expression patterns. The limited published research on the expression patterns of the
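
    As a rough illustration of the kind of scan described above, the following Python sketch counts exact copies of the AAATATTT octamer in a set of intergenic sequences and reports how many motif-containing regions carry more than one copy. It is a minimal sketch, not the authors' pipeline: the function names and the example input dictionary are hypothetical, and only exact matches are counted.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def count_motif(seq, motif="AAATATTT"):
    """Count exact occurrences of `motif` in `seq`, overlaps included."""
    seq = seq.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_summary(intergenic, motif="AAATATTT"):
    """Summarise motif copies over a dict of region name -> sequence.

    Returns (per-region counts, regions with >= 1 copy, regions with >= 2).
    """
    counts = {name: count_motif(seq, motif) for name, seq in intergenic.items()}
    with_motif = sum(1 for c in counts.values() if c >= 1)
    multi_copy = sum(1 for c in counts.values() if c >= 2)
    return counts, with_motif, multi_copy


# Hypothetical toy input; real intergenic regions would come from an annotation.
regions = {
    "igr_1": "GGCAAATATTTCCGTAAATATTTGA",
    "igr_2": "ccgaaatatttgg",
    "igr_3": "ACGTACGTACGT",
}
counts, with_motif, multi_copy = motif_summary(regions)
```

    Because AAATATTT is its own reverse complement, scanning one strand already finds every copy on both strands, so the sketch does not search the reverse strand separately.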

  12. Simultaneous detection of a sex-specific sequence and the Ryr1 point mutation in porcine genomic DNA.

    Science.gov (United States)

    Lockley, A K; Bruce, J S; Franklin, S J; Bardsley, R G

    1997-04-01

    The advantages of designing livestock breeding programmes around the detection of specific sequences in genomic DNA using amplification by the polymerase chain reaction (PCR) are becoming increasingly apparent. Furthermore, by subjecting the products of such reactions to restriction enzyme digestion, important information conveyed by single-base substitutions can be retrieved and used in marker-assisted selection. The potential for the rapid diagnosis of several DNA markers simultaneously would seem to offer particular benefits in the field of in vitro fertilisation and embryo transfer, where only a few cells constitute the source of the DNA, and where keeping the duration of the tests to a minimum is imperative. However, where the markers to be detected fall into different categories, different kinds of amplification reactions may need to be combined. The present study with porcine DNA combines a one-step multiplex PCR test for sex determination with a specialised PCR reaction designed to diagnose the Ryr1 or 'halothane' genotype. A total of seven primers have been utilised: firstly, to amplify a control sequence related to the Zfx/y genes present in both sexes; secondly, to amplify a Y chromosome sex-specific sequence related to the Sry gene; and lastly, to detect either allele of the Ryr1 mutation associated with porcine stress syndrome and pale, soft, exudative meat. The presence of PCR products of characteristic size on agarose gel electrophoresis gives a visual read-out of animal sex and halothane genotype. Although primarily a model system, the test may have direct applications in the context of embryo transfer, sperm separation technology and also in the characterisation of pork samples undergoing sensory evaluation by meat scientists.

  13. Evaluation of two molecular methods for the detection of Yellow fever virus genome.

    Science.gov (United States)

    Nunes, Marcio R T; Palacios, Gustavo; Nunes, Keley N B; Casseb, Samir M M; Martins, Lívia C; Quaresma, Juarez A S; Savji, Nazir; Lipkin, W Ian; Vasconcelos, Pedro F C

    2011-06-01

    Yellow fever virus (YFV), a member of the family Flaviviridae, genus Flavivirus, is endemic to tropical areas of Africa and South America and is among the arboviruses that pose a threat to public health. Recent outbreaks in Brazil, Bolivia, and Paraguay and the observation that vectors capable of transmitting YFV are present in urban areas underscore the urgency of improving surveillance and diagnostic methods. Two novel methods (RT-hemi-nested-PCR and SYBR® Green qRT-PCR) for efficient detection of YFV strains circulating in South America have been developed. The methods were validated using samples obtained from golden hamsters infected experimentally with wild-type YFV strains as well as human serum and tissue samples with acute disease. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. DNA sequence analysis of cagA 3' motifs of Helicobacter pylori strains from patients with peptic ulcer diseases.

    Science.gov (United States)

    Salih, Barik A; Bolek, Bora Kazim; Arikan, Soykan

    2010-02-01

    The Helicobacter pylori cagA gene is a major virulence factor that plays an important role in gastric pathologies. DNA sequence data for the cagA 3' region of Western isolates differ markedly in their EPIYA motifs from those of East Asian isolates. An increase in the number of these motifs is known to be associated with gastric cancer. Whether such an association is also the case for peptic ulceration was investigated in this study. Gastric biopsies were collected from 96 patients with duodenal ulcer (DU), gastric ulcer (GU) and gastritis. The types of EPIYA motif detected by PCR among 28 DU strains were 13 ABC, eight ABCC, six ABCCC, and in one patient both ABC and ABCCCCC; among nine GU strains were two ABC, five ABCC and two ABCCC; and among 40 gastritis strains were 35 ABC and five ABCC. DNA sequencing was carried out to confirm the detection of the EPIYA motif types and to analyse their peptide sequences. A significant association was found between the number of the EPIYA-C motifs (≥2) and peptic ulceration (P=0.00001) compared with gastritis. In conclusion, this study shows that our patients harboured cagA-positive H. pylori strains with EPIYA motifs of the Western type and that the increase in the number of EPIYA-C motifs was significantly associated with DU and GU but not with gastritis, indicating predictive association with the severity of the disease.

  15. Evaluation of a new automated homogeneous PCR assay, GenomEra C. difficile, for rapid detection of Toxigenic Clostridium difficile in fecal specimens.

    Science.gov (United States)

    Hirvonen, Jari J; Mentula, Silja; Kaukoranta, Suvi-Sirkku

    2013-09-01

    We evaluated a new automated homogeneous PCR assay to detect toxigenic Clostridium difficile, the GenomEra C. difficile assay (Abacus Diagnostica, Finland), with 310 diarrheal stool specimens and with a collection of 33 known clostridial and nonclostridial isolates. Results were compared with toxigenic culture results, with discrepancies being resolved by the GeneXpert C. difficile PCR assay (Cepheid). Among the 80 toxigenic culture-positive or GeneXpert C. difficile assay-positive fecal specimens, 79 were also positive with the GenomEra C. difficile assay. Additionally, one specimen was positive with the GenomEra assay but negative with the confirmatory methods. Thus, the sensitivity and specificity were 98.8% and 99.6%, respectively. With the culture collection, no false-positive or -negative results were observed. The analytical sensitivity of the GenomEra C. difficile assay was approximately 5 CFU per PCR test. The short hands-on (<5 min for 1 to 4 samples) and total turnaround (<1 h) times, together with the high positive and negative predictive values (98.8% and 99.6%, respectively), make the GenomEra C. difficile assay an excellent option for toxigenic C. difficile detection in fecal specimens.
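
    The percentages reported above follow from a 2x2 confusion table, which the short Python sketch below reconstructs. The counts are an inference from the abstract, not figures stated in it: 79 true positives and 1 false negative among the 80 reference-positive specimens, and 1 false positive and 229 true negatives among the remaining 230 of the 310 specimens. The function name is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among reference-positive
        "specificity": tn / (tn + fp),  # negative among reference-negative
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts inferred from the abstract (assumption): 310 specimens in total,
# 80 reference-positive of which 79 were detected, plus one extra positive.
metrics = diagnostic_metrics(tp=79, fp=1, fn=1, tn=229)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

    With these counts, sensitivity and positive predictive value are both 79/80 (about 98.8%), and specificity and negative predictive value are both 229/230 (about 99.6%), in line with the values quoted in the abstract.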

  16. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Background: With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge, as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results: We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore, we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion: iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  17. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery.

    Directory of Open Access Journals (Sweden)

    Arnoldo J Müller-Molina

    To know the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits never used TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs confirming that our system can reliably predict ab initio motifs with an accuracy of 81%, far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that to control in an independent manner smaller gene sets, supplementary regulatory mechanisms are required. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus, are likely to lock/unlock cellular functional clusters.

  18. POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes.

    Science.gov (United States)

    Hongo, Jorge A; de Castro, Giovanni M; Cintra, Leandro C; Zerlotini, Adhemar; Lobo, Francisco P

    2015-08-01

    Detection of genes evolving under positive Darwinian evolution in genome-scale data is nowadays a prevailing strategy in comparative genomics studies to identify genes potentially involved in adaptation processes. Despite the large number of studies aiming to detect and contextualize such gene sets, there is virtually no software available to perform this task in a general, automatic, large-scale and reliable manner. This certainly occurs due to the computational challenges involved in this task, such as the appropriate modeling of data under analysis, the computation time to perform several of the required steps when dealing with genome-scale data and the highly error-prone nature of the sequence and alignment data structures needed for genome-wide positive selection detection. We present POTION, an open source, modular and end-to-end software for genome-scale detection of positive Darwinian selection in groups of homologous coding sequences. Our software represents a key step towards genome-scale, automated detection of positive selection, from predicted coding sequences and their homology relationships to high-quality groups of positively selected genes. POTION reduces false positives through several sophisticated sequence and group filters based on numeric, phylogenetic, quality and conservation criteria to remove spurious data and through multiple hypothesis corrections, and considerably reduces computation time thanks to a parallelized design. Our software achieved a high classification performance when used to evaluate a curated dataset of Trypanosoma brucei paralogs previously surveyed for positive selection. When used to analyze predicted groups of homologous genes of 19 strains of Mycobacterium tuberculosis as a case study we demonstrated the filters implemented in POTION to remove sources of errors that commonly inflate errors in positive selection detection. 
A thorough literature review found no other software similar to POTION in terms of customization

  19. DGGE with genomic DNA: suitable for detection of numerically important organisms but not for identification of the most abundant organisms.

    Science.gov (United States)

    de Araújo, Juliana Calábria; Schneider, René Peter

    2008-12-01

    Identification of all important community members as well as of the numerically dominant members of a community are key aspects of microbial community analysis of bioreactor samples. A systematic study was conducted with artificial consortia to test whether denaturing gradient gel electrophoresis (DGGE) is a reliable technique to obtain such community data under conditions where results would not be affected by differences in DNA extraction efficiency from cells. A total of 27 consortia were established by mixing DNA extracted from Escherichia coli K12, Burkholderia cepacia and Stenotrophomonas maltophilia in different proportions. Concentrations of DNA of single organisms in the consortia were either 0.04, 0.4 or 4ng/microl. DGGE-PCR of genomic DNA with primer sets targeted at the V3 and V6-V8 regions of the 16S rDNA failed to detect the three community members in only 7% of consortia, but provided incorrect information about dominance or co-dominance for 85% and 89% of consortia with the primer sets for the V6-V8 and V3 regions, respectively. The high failure rate in detection of dominant B. cepacia with the primers for the V6-V8 region was attributable to a single nucleotide primer mismatch in the target sequences of both, the forward and reverse primer. Amplification bias in PCR of E. coli and S. maltophilia for the V6-V8 region and for all three organisms for the V3 region occurred due to interference of genomic DNA in PCR-DGGE, since a nested PCR approach, where PCR-DGGE was started from mixtures of 16S rRNA genes of the organisms, provided correct information about the relative abundance of original DNA in the sample. Multiple bands were not observed in pure culture amplicons produced with the V6-V8 primer pair, but pure culture V3 DGGE profiles of E. coli, S. maltophilia and B. cepacia contained 5, 3 and 3 bands, respectively. 
These results demonstrate DGGE was suitable for identification of all important community members in the three-membered artificial

  20. Genome-wide scans to detect positive selection in Large White and Tongcheng pigs.

    Science.gov (United States)

    Li, Xiuling; Yang, Songbai; Tang, Zhonglin; Li, Kui; Rothschild, Max F; Liu, Bang; Fan, Bin

    2014-06-01

    Due to the direction, intensity, duration and consistency of genetic selection, especially recent artificial selection, the production performance of domestic pigs has been greatly changed. Therefore, we reasoned that there must be footprints or selection signatures that had been left during domestication. In this study, with porcine 60K BeadChip genotyping data from both commercial Large White and local Chinese Tongcheng pigs, we calculated the extended haplotype homozygosity values of the two breeds using the long-range haplotype method to detect selection signatures. We found 34 candidate regions, including 61 known genes, from Large White pigs and 25 regions comprising 57 known genes from Tongcheng pigs. Many selection signatures were found on SSC1, SSC4, SSC7 and SSC14 regions in both populations. According to quantitative trait loci and network pathway analyses, most of the regions and genes were linked to growth, reproduction and immune responses. In addition, the average genetic differentiation coefficient FST was 0.254, which means that there had already been a significant differentiation between the breeds. The findings from this study can contribute to further research on molecular mechanisms of pig evolution and domestication and also provide valuable references for improvement of their breeding and cultivation. © 2014 Stichting International Foundation for Animal Genetics.
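The abstract reports an average FST of 0.254 but does not say which estimator was used. As a rough illustration of what the coefficient measures, the sketch below computes Wright's heterozygosity-based FST for a single biallelic SNP in two equally weighted populations. The function name `fst_biallelic` and the allele frequencies are hypothetical and are not taken from the study.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # Locus is fixed for the same allele in both populations
        return 0.0
    return (h_total - h_within) / h_total


# Invented allele frequencies, for illustration only
print(fst_biallelic(0.8, 0.2))
print(fst_biallelic(0.5, 0.5))
```

Identical frequencies give an FST of zero and opposite fixation gives one; a genome-wide average such as the one in the abstract would be obtained by combining many loci, and the exact averaging scheme depends on the estimator chosen.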

  1. Optimizing a method for detection of hepatitis A virus in shellfish and study the effect of gamma radiation on the viral genome

    International Nuclear Information System (INIS)

    Amri, Islem

    2008-01-01

    Our work was aimed at detecting the hepatitis A virus (HAV) in bivalve molluscs collected from five shellfish harvesting areas and from a coastal region in Tunisia using RT-Nested-PCR, and at studying the effect of gamma radiation on the HAV genome. Two methods used to recover HAV from mollusc flesh and two methods of extraction of virus RNA were compared in order to determine the most sensitive method. Glycine extraction and extraction of virus RNA using proteinase K were more convenient and were then used in this study for detection of HAV in shellfish. Molecular analyses by RT-Nested-PCR using primers targeted at the P1 region revealed that 28% of the samples were positive for HAV. Doses of gamma irradiation ranging from 5 to 30 kGy were used to study the effect of this radiation on the HAV genome after the contamination of mollusc flesh with a suspension of HAV (derived from stool specimens). An HAV-specific genomic band was observed for doses between 5 and 20 kGy. We did not detect the HAV genome at doses of 25 and 30 kGy. (Author)

  2. An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data.

    Science.gov (United States)

    Liu, Bingqiang; Yang, Jinyu; Li, Yang; McDermaid, Adam; Ma, Qin

    2017-03-08

    Transcription factors are proteins that bind to specific DNA sequences and play important roles in controlling the expression levels of their target genes. Hence, prediction of transcription factor binding sites (TFBSs) provides a solid foundation for inferring gene regulatory mechanisms and building regulatory networks for a genome. Chromatin immunoprecipitation sequencing (ChIP-seq) technology can generate large-scale experimental data for such protein-DNA interactions, providing an unprecedented opportunity to identify TFBSs (a.k.a. cis-regulatory motifs). The bottleneck, however, is the lack of robust mathematical models, as well as efficient computational methods for TFBS prediction to make effective use of massive ChIP-seq data sets in the public domain. The purpose of this study is to review existing motif-finding methods for ChIP-seq data from an algorithmic perspective and provide new computational insight into this field. The state-of-the-art methods were shown through summarizing eight representative motif-finding algorithms along with corresponding challenges, and introducing some important relative functions according to specific biological demands, including discriminative motif finding and cofactor motifs analysis. Finally, potential directions and plans for ChIP-seq-based motif-finding tools were showcased in support of future algorithm development. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  3. Detection of genomic instability in α-irradiated and bystander human fibroblasts

    International Nuclear Information System (INIS)

    Ponnaiya, B.; Jenkins-Baker, G.; Bigelow, A.; Marino, S.; Geard, C.R.

    2003-01-01

    well. Currently experiments are ongoing to examine irradiated and bystander cells at immediate and delayed times for the presence of chromosomal aberrations that are not detectable by standard Giemsa staining (e.g. translocations) using mFISH analyses

  4. Prevalence of Congenital Cytomegalovirus Infection Assessed Through Viral Genome Detection in Dried Blood Spots in Children with Autism Spectrum Disorders.

    Science.gov (United States)

    Gentile, Ivan; Zappulo, Emanuela; Riccio, Maria Pia; Binda, Sandro; Bubba, Laura; Pellegrinelli, Laura; Scognamiglio, Domenico; Operto, Francesca; Margari, Lucia; Borgia, Guglielmo; Bravaccio, Carmela

    2017-01-01

    Autism spectrum disorders (ASD) are neurodevelopmental disorders without a definitive etiology in most cases. Environmental factors, such as viral infections, have been linked with anomalies in brain growth, neuronal development, and functional connectivity. Congenital cytomegalovirus (CMV) infection has been associated with the onset of ASD in several case reports. The aim of this study was to evaluate the prevalence of congenital CMV infection in children with ASD and in healthy controls. The CMV genome was tested by polymerase chain reaction (PCR) on dried blood spots collected at birth from 82 children (38 with ASD and 44 controls). The prevalence of congenital CMV infection was 5.3% (2/38) in cases and 0% (0/44) in controls (p=0.212). The infection rate was about 10-fold higher in patients with ASD than in the general Italian population at birth. For this reason, detection of CMV-DNA on dried blood spots could be considered in the work-up that is usually performed at ASD diagnosis to rule-out a secondary form. Given the potential prevention and treatment of CMV infection, this study could have intriguing consequences, at least for a group of patients with ASD. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
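The abstract does not name the statistical test behind p=0.212. Assuming a two-sided Fisher's exact test on the 2x2 table of infection status by group (2/38 cases, 0/44 controls), the value can be reproduced with the standard library alone. The function name `fisher_exact_two_sided` is a hypothetical helper written for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column counts in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: ASD cases, controls; columns: CMV positive, CMV negative
p_value = fisher_exact_two_sided(2, 36, 0, 44)
print(f"p = {p_value:.3f}")  # prints "p = 0.212"
```

Under this assumption the result agrees with the reported p-value; with only two positive samples in total, the only table as extreme as the observed one is the observed table itself.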

  5. A Basic Set of Homeostatic Controller Motifs

    Science.gov (United States)

    Drengstig, T.; Jolma, I.W.; Ni, X.Y.; Thorsen, K.; Xu, X.M.; Ruoff, P.

    2012-01-01

    Adaptation and homeostasis are essential properties of all living systems. However, our knowledge about the reaction kinetic mechanisms leading to robust homeostatic behavior in the presence of environmental perturbations is still poor. Here, we describe, and provide physiological examples of, a set of two-component controller motifs that show robust homeostasis. This basic set of controller motifs, which can be considered as complete, divides into two operational work modes, termed as inflow and outflow control. We show how controller combinations within a cell can integrate uptake and metabolization of a homeostatic controlled species and how pathways can be activated and lead to the formation of alternative products, as observed, for example, in the change of fermentation products by microorganisms when the supply of the carbon source is altered. The antagonistic character of hormonal control systems can be understood by a combination of inflow and outflow controllers. PMID:23199928

  6. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

    Science.gov (United States)

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-01-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464

  7. Novel nucleotide sequence motifs that produce hotspots of meiotic recombination in Schizosaccharomyces pombe.

    Science.gov (United States)

    Steiner, Walter W; Steiner, Estelle M; Girvin, Angela R; Plewik, Lauren E

    2009-06-01

    In many organisms, including yeasts and humans, meiotic recombination is initiated preferentially at a limited number of sites in the genome referred to as recombination hotspots. Predicting precisely the location of most hotspots has remained elusive. In this study, we tested the hypothesis that hotspots can result from multiple different sequence motifs. We devised a method to rapidly screen many short random oligonucleotide sequences for hotspot activity in the fission yeast Schizosaccharomyces pombe and produced a library of approximately 500 unique 15- and 30-bp sequences containing hotspots. The frequency of hotspots found suggests that there may be a relatively large number of different sequence motifs that produce hotspots. Within our sequence library, we found many shorter 6- to 10-bp motifs that occurred multiple times, many of which produced hotspots when reconstructed in vivo. On the basis of sequence similarity, we were able to group those hotspots into five different sequence families. At least one of the novel hotspots we found appears to be a target for a transcription factor, as it requires that factor for its hotspot activity. We propose that many hotspots in S. pombe, and perhaps other organisms, result from simple sequence motifs, some of which are identified here.
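The search for shorter 6- to 10-bp motifs that recur within a sequence library can be pictured as k-mer counting. The sketch below is a simplified illustration and not the authors' procedure: the function name `recurring_kmers`, its parameters, and the toy sequences are all invented here, and it ignores reverse complements and statistical significance.

```python
from collections import Counter


def recurring_kmers(sequences, k_min=6, k_max=10, min_sequences=2):
    """Return k-mers (k_min..k_max bp) found in at least `min_sequences` sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        # Count each k-mer once per sequence so repeats within one
        # sequence do not masquerade as recurrence across the library
        counts.update(seen)
    return {kmer: n for kmer, n in counts.items() if n >= min_sequences}


# Toy 15-bp sequences invented for illustration; not data from the study
library = [
    "ACGTTGACGTCATTA",
    "GGCTGACGTCAGGCA",
    "TTTTTTTTTTTTTTT",
]
shared = recurring_kmers(library)
for kmer in sorted(shared, key=len, reverse=True):
    print(kmer, shared[kmer])
```

In this toy library the longest shared word is the 8-mer TGACGTCA, and its 6- and 7-bp substrings are reported as well; a real analysis would typically collapse such nested hits and then test candidates experimentally, as the abstract describes.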

  8. Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS.

    Science.gov (United States)

    Ding, Jun; Dhillon, Vikram; Li, Xiaoman; Hu, Haiyan

    2015-06-01

    Understanding transcriptional regulatory elements and particularly the transcription factor binding sites represents a significant challenge in computational biology. The chromatin immunoprecipitation followed by massive parallel sequencing (ChIP-seq) experiments provide an unprecedented opportunity to study transcription factor binding sites on the genome-wide scale. Here we describe a recently developed tool, SIOMICS, to systematically discover motifs and binding sites of transcription factors and their cofactors from ChIP-seq data. Unlike other tools, SIOMICS explores the co-binding properties of multiple transcription factors in short regions to predict motifs and binding sites. We have previously shown that the original SIOMICS method predicts motifs and binding sites of more cofactors in more accurate and time-effective ways than two popular methods. In this paper, we present the extended SIOMICS method, SIOMICS_Extension, and demonstrate its usage for systematic discovery of cofactor motifs and binding sites. The SIOMICS tool, including SIOMICS and SIOMICS_Extension, are available at http://hulab.ucf.edu/research/projects/SIOMICS/SIOMICS.html. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Dynamic motifs in socio-economic networks

    Science.gov (United States)

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bi-partite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently only has business relationship with one of these two companies, will become customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  10. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase.

    Directory of Open Access Journals (Sweden)

    Hao Ding

    Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of nine amino acids (GFXXPXPIQ) with an invariant glutamine (Q) residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11) gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase.
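A nine-residue consensus such as GFXXPXPIQ can be located in a protein sequence with a regular expression. The sketch below assumes that X stands for any amino acid, which is the usual convention but is not spelled out in the abstract. The function name `find_q_motif` and the toy sequences are invented for illustration and are not the ChlR1 sequence.

```python
import re

# GFXXPXPIQ with X read as "any residue" (an assumption about the notation)
Q_MOTIF = re.compile(r"GF..P.PIQ")


def find_q_motif(protein):
    """Return (1-based start position, matched text) for each Q-motif hit."""
    return [(m.start() + 1, m.group()) for m in Q_MOTIF.finditer(protein.upper())]


# Toy sequences invented for illustration
wild_type = "MSTAGFAAPAPIQLLK"
q_to_a = "MSTAGFAAPAPIALLK"  # invariant glutamine replaced by alanine

print(find_q_motif(wild_type))
print(find_q_motif(q_to_a))
```

The second toy sequence mimics a glutamine-to-alanine substitution of the kind described in the abstract, so the pattern no longer matches; this only reflects the sequence consensus and says nothing about the functional consequences measured in the study.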

  12. Poly(A) motif prediction using spectral latent features from human DNA sequences

    KAUST Repository

    Xie, Bo

    2013-06-21

    Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge. Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance. We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. 
Meanwhile, our method makes ~30% fewer error predictions relative to the other

  13. Two are better than one: combining landscape genomics and common gardens for detecting local adaptation in forest trees.

    Science.gov (United States)

    Lepais, Olivier; Bacles, Cecile F

    2014-10-01

    Predicting likely species responses to an alteration of their local environment is key to decision-making in resource management, ecosystem restoration and biodiversity conservation practice in the face of global human-induced habitat disturbance. This is especially true for forest trees which are a dominant life form on Earth and play a central role in supporting diverse communities and structuring a wide range of ecosystems. In Europe, it is expected that most forest tree species will not be able to migrate North fast enough to follow the estimated temperature isocline shift given current predictions for rapid climate warming. In this context, a topical question for forest genetics research is to quantify the ability for tree species to adapt locally to strongly altered environmental conditions (Kremer et al. ). Identifying environmental factors driving local adaptation is, however, a major challenge for evolutionary biology and ecology in general but is particularly difficult in trees given their large individual and population size and long generation time. Empirical evaluation of local adaptation in trees has traditionally relied on fastidious long-term common garden experiments (provenance trials) now supplemented by reference genome sequence analysis for a handful of economically valuable species. However, such resources have been lacking for most tree species despite their ecological importance in supporting whole ecosystems. In this issue of Molecular Ecology, De Kort et al. () provide original and convincing empirical evidence of local adaptation to temperature in black alder, Alnus glutinosa L. Gaertn, a surprisingly understudied keystone species supporting riparian ecosystems. Here, De Kort et al. () use an innovative empirical approach complementing state-of-the-art landscape genomics analysis of A. glutinosa populations sampled in natura across a regional climate gradient with phenotypic trait assessment in a common garden experiment (Fig. ). By

  14. Isolation of "Caenorhabditis elegans" Genomic DNA and Detection of Deletions in the "unc-93" Gene Using PCR

    Science.gov (United States)

    Lissemore, James L.; Lackner, Laura L.; Fedoriw, George D.; De Stasio, Elizabeth A.

    2005-01-01

    PCR, genomic DNA isolation, and agarose gel electrophoresis are common molecular biology techniques with a wide range of applications. Therefore, we have developed a series of exercises employing these techniques for an intermediate level undergraduate molecular biology laboratory course. In these exercises, students isolate genomic DNA from the…

  15. Next-generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus Austrolebias (Cyprinodontiformes, Rivulidae).

    Science.gov (United States)

    García, G; Ríos, N; Gutiérrez, V

    2015-06-01

    Among Neotropical fish fauna, the South American killifish genus Austrolebias (Cyprinodontiformes: Rivulidae) constitutes an excellent model to study the genomic evolutionary processes underlying speciation events. Recently, unusually large genome size has been described in 16 species of this genus, with an average DNA content of about 5.95 ± 0.45 pg per diploid cell (mean C-value of about 2.98 pg). In the present paper we explore the possible origin of this unparalleled genomic increase by means of comparative analysis of the repetitive components using NGS (454-Roche) technology in the lowest and highest Rivulidae genomes. Here, we provide the first annotated Rivulidae-repeated sequences composition and their relative repetitive fraction in both genomes. Remarkably, the genomic proportion of the moderately repetitive DNA in the Austrolebias charrua genome (45%) is approximately twice that of the closely related rivuline taxon Cynopoecilus melanotaenia (25%). The present work provides evidence about the impact of the repeat families that could be distinctly proliferated among sublineages within the Rivulidae fish group, explaining the great genome size differences encompassing the differentiation and speciation events in this family.

  16. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing.

    Science.gov (United States)

    Fang, Gang; Munera, Diana; Friedman, David I; Mandlik, Anjali; Chao, Michael C; Banerjee, Onureena; Feng, Zhixing; Losic, Bojan; Mahajan, Milind C; Jabado, Omar J; Deikus, Gintaras; Clark, Tyson A; Luong, Khai; Murray, Iain A; Davis, Brigid M; Keren-Paz, Alona; Chess, Andrew; Roberts, Richard J; Korlach, Jonas; Turner, Steve W; Kumar, Vipin; Waldor, Matthew K; Schadt, Eric E

    2012-12-01

    Single-molecule real-time (SMRT) DNA sequencing allows the systematic detection of chemical modifications such as methylation but has not previously been applied on a genome-wide scale. We used this approach to detect 49,311 putative 6-methyladenine (m6A) residues and 1,407 putative 5-methylcytosine (m5C) residues in the genome of a pathogenic Escherichia coli strain. We obtained strand-specific information for methylation sites and a quantitative assessment of the frequency of methylation at each modified position. We deduced the sequence motifs recognized by the methyltransferase enzymes present in this strain without prior knowledge of their specificity. Furthermore, we found that deletion of a phage-encoded methyltransferase-endonuclease (restriction-modification; RM) system induced global transcriptional changes and led to gene amplification, suggesting that the role of RM systems extends beyond protecting host genomes from foreign DNA.

  17. The EH1 motif in metazoan transcription factors

    Directory of Open Access Journals (Sweden)

    Copley Richard R

    2005-11-01

    Background: The Engrailed Homology 1 (EH1) motif is a small region, believed to have evolved convergently in homeobox and forkhead containing proteins, that interacts with the Drosophila protein groucho (C. elegans unc-37, Human Transducin-like Enhancers of Split). The small size of the motif makes its reliable identification by computational means difficult. I have systematically searched the predicted proteomes of Drosophila, C. elegans and human for further instances of the motif. Results: Using motif identification methods and database searching techniques, I delimit which homeobox and forkhead domain containing proteins also have likely EH1 motifs. I show that despite low database search scores, there is a significant association of the motif with transcription factor function. I further show that likely EH1 motifs are found in combination with T-Box, Zinc Finger and Doublesex domains as well as discussing other plausible candidate associations. I identify strong candidate EH1 motifs in basal metazoan phyla. Conclusion: Candidate EH1 motifs exist in combination with a variety of transcription factor domains, suggesting that these proteins have repressor functions. The distribution of the EH1 motif is suggestive of convergent evolution, although in many cases, the motif has been conserved throughout bilaterian orthologs. Groucho mediated repression was established prior to the evolution of bilateria.

  18. RNA structural motif recognition based on least-squares distance.

    Science.gov (United States)

    Shen, Ying; Wong, Hau-San; Zhang, Shaohong; Zhang, Lin

    2013-09-01

    RNA structural motifs are recurrent structural elements occurring in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. In view of this, we propose a new method known as RNA Structural Motif Recognition based on Least-Squares distance (LS-RSMR) to effectively recognize RNA structural motifs. A test set consisting of five types of RNA structural motifs occurring in Escherichia coli ribosomal RNA is compiled by us. Experiments are conducted for recognizing these five types of motifs. The experimental results fully reveal the superiority of the proposed LS-RSMR compared with four other state-of-the-art methods.

  19. CONTEMPORARY USAGE OF TRADITIONAL TURKISH MOTIFS IN PRODUCT DESIGNS

    Directory of Open Access Journals (Sweden)

    Tulay Gumuser

    2012-12-01

    The aim of this study is to identify traditional Turkish motifs and their relation to present-day industrial designs. Traditional Turkish motifs played a very important role from the 16th century onwards. The arts of the Ottoman Empire were used because of their symbolic meanings and unique styles. When we examine these motifs we encounter the Tiger Stripe, Three Spot (Çintemani), Rumi, Hatayi, Penç, Cloud, Crescent, Star, Crown, Hyacinth, Tulip and Carnation motifs. Nowadays, Turkish designers have begun to use these traditional Turkish motifs in their designs so as to create differences and awareness in world design. The examples of these industrial designs, using the Turkish motifs, have survived and have Ottoman heritage and historical value. In this study, the Turkish motifs will be examined along with their focus on contemporary Turkish industrial designs used today.

  20. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.)

    Directory of Open Access Journals (Sweden)

    Simon Philipp W

    2010-10-01

    Background: Cucumber, Cucumis sativus L., is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes have shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced, including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber. Results: A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although
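    The survey of perfect microsatellites described in this record can be illustrated with a small regular-expression scan. This is only a sketch: the minimum repeat counts per motif length are hypothetical thresholds chosen for the example, and the function names are not taken from the study.

```python
import re


def find_perfect_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of repeat units.
        min_repeats = {2: 6, 3: 4, 4: 3}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One unit captured, then at least (min_rep - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT", "AAA").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


def ssr_density_per_mbp(n_ssrs, assembled_bp):
    """Number of SSRs per million base pairs of assembled sequence."""
    return n_ssrs / (assembled_bp / 1e6)


if __name__ == "__main__":
    toy = "GG" + "AT" * 7 + "CC" + "AAG" * 5 + "TT"
    print(find_perfect_ssrs(toy))
    print(ssr_density_per_mbp(112073, 203_000_000))
```

    With the rounded figures quoted in the abstract (112,073 repeats in 203 Mbp), the density helper gives roughly 552 SSRs/Mbp, in line with the reported 551.9.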

  1. Multiple novel promoter-architectures revealed by decoding the hidden heterogeneity within the genome

    Science.gov (United States)

    Narlikar, Leelavati

    2014-01-01

    An important question in biology is how different promoter-architectures contribute to the diversity in regulation of transcription initiation. A step forward has been the production of genome-wide maps of transcription start sites (TSSs) using high-throughput sequencing. However, the subsequent step of characterizing promoters and their functions is still largely done on the basis of previously established promoter-elements like the TATA-box in eukaryotes or the -10 box in bacteria. Unfortunately, a majority of promoters and their activities cannot be explained by these few elements. Traditional motif discovery methods that identify novel elements also fail here, because TSS neighborhoods are often highly heterogeneous, containing no overrepresented motif. We present a new, organism-independent method that explicitly models this heterogeneity while unraveling different promoter-architectures. For example, in five bacteria, we detect the presence of a pyrimidine preceding the TSS under very specific circumstances. In tuberculosis, we show for the first time that the spacing between the bacterial -10 motif and TSS is utilized by the pathogen for dynamic gene-regulation. In eukaryotes, we identify several new elements that are important for development. Identified promoter-architectures show differential patterns of evolution, chromatin structure and TSS spread, suggesting distinct regulatory functions. This work highlights the importance of characterizing heterogeneity within high-throughput genomic data rather than analyzing average patterns of nucleotide composition. PMID:25326324

  2. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    DEFF Research Database (Denmark)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang

    2016-01-01

    Objectives The aim of this study was to assess the performance of noninvasive prenatal testing (NIPT) for fetal copy number variants (CNVs) in clinical samples, using a whole-genome sequencing method. Method A total of 919 archived maternal plasma samples with karyotyping/microarray results...... through Maternal Plasma Sequencing (FCAPS) to compare to the karyotyping/microarray results. Sensitivity, specificity and were evaluated. Results 33 samples with deletions/duplications ranging from 1 to 129 Mb were detected with the consistent CNV size and location to karyotyping/microarray results......, including 33 CNVs samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis...

  3. GenomEra MRSA/SA, a fully automated homogeneous PCR assay for rapid detection of Staphylococcus aureus and the marker of methicillin resistance in various sample matrixes.

    Science.gov (United States)

    Hirvonen, Jari J; Kaukoranta, Suvi-Sirkku

    2013-09-01

    The GenomEra MRSA/SA assay (Abacus Diagnostica, Turku, Finland) is the first commercial homogeneous PCR assay using thermally stable, intrinsically fluorescent time-resolved fluorometric (TRF) labels resistant to autofluorescence and other background effects. This fully automated closed tube PCR assay simultaneously detects Staphylococcus aureus specific DNA and the mecA gene within 50 min. It can be used for both screening and confirmation of methicillin-resistant and -sensitive S. aureus (MRSA and MSSA) directly in different specimen types or from preceding cultures. The assay has shown excellent performance in comparisons with other diagnostic methods in all the sample types tested. The GenomEra MRSA/SA assay provides rapid assistance for the detection of MRSA as well as invasive staphylococcal infections and helps the early targeting of antimicrobial therapy to patients with potential MRSA infection.

  4. Assessing the Exceptionality of Coloured Motifs in Networks

    Directory of Open Access Journals (Sweden)

    Lacroix Vincent

    2009-01-01

    Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put in the last thirty years to derive p-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results make it possible to derive a p-value for a coloured motif without spending time on simulations.

  5. A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states.

    Science.gov (United States)

    Ichikawa, Kazuki; Morishita, Shinichi

    2015-01-01

    Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus a biologically important issue to develop an effective optimization algorithm for detecting long DNA regions (e.g., >4 kbp in size) that harbor a specific combination of epigenetic modifications (e.g., K27HMD regions). However, to date, optimization algorithms for these purposes have received little attention, and available methods are still heuristic and ad hoc. In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any of the human chromosomes was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method. We confirmed the advantages of our method over the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is made available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/
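    The idea of choosing non-overlapping regions that maximize a summed per-position score in linear time can be sketched with a prefix-sum dynamic program. The sketch below is not the CSMinfinder implementation: it assumes the similarity at each position has already been reduced to a single signed score, and the minimum region length `min_len` and the function name `best_regions` are hypothetical.

```python
def best_regions(scores, min_len):
    """Pick non-overlapping regions (each >= min_len long) maximizing total score.

    Runs in O(n): best[i] is the optimum over the first i positions, and the
    best region start is tracked incrementally through prefix sums.
    """
    n = len(scores)
    prefix = [0.0] * (n + 1)
    for i, s in enumerate(scores):
        prefix[i + 1] = prefix[i] + s

    best = [0.0] * (n + 1)
    start_of = [None] * (n + 1)  # start_of[i] = j if region [j, i) ends the optimum at i
    best_offset = float("-inf")  # max over allowed j of best[j] - prefix[j]
    best_start = None
    for i in range(1, n + 1):
        j = i - min_len
        if j >= 0 and best[j] - prefix[j] > best_offset:
            best_offset, best_start = best[j] - prefix[j], j
        best[i] = best[i - 1]  # option 1: position i-1 is outside every region
        if best_start is not None and best_offset + prefix[i] > best[i]:
            best[i] = best_offset + prefix[i]  # option 2: a region ends at i
            start_of[i] = best_start

    # Walk back through the table to recover the chosen regions.
    regions, i = [], n
    while i > 0:
        if start_of[i] is None:
            i -= 1
        else:
            regions.append((start_of[i], i))
            i = start_of[i]
    regions.reverse()
    return best[n], regions


if __name__ == "__main__":
    toy_scores = [-1, 2, 3, -1, 4, -5, -5, 1, 1, 1, -2]
    print(best_regions(toy_scores, min_len=3))
```

    Regions are returned as half-open intervals `[start, end)`. The single pass works because the best admissible start for a region ending at position i can be updated in constant time as i advances.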

  6. Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection.

    Science.gov (United States)

    Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz

    2016-02-24

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data as well as fully simulated data revealed a complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

  7. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that are recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently-transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to investigate the generality of this pattern, we have used position weight matrices describing the -35 and -10 promoter boxes of E. coli to search for these motifs in 43 additional genomes belonging to most established bacterial phyla, after specific calibration of the matrices according to the base composition of the noncoding regions of each genome. We have found that all bacterial species analyzed contain similar promoter-like motifs, and that, in most cases, these motifs follow the same genomic distribution observed in E. coli. Differential densities between regulatory and nonregulatory regions are detectable in most bacterial genomes, with the exception of those that have experienced evolutionary extreme genome reduction. Thus, the phylogenetic distribution of this pattern mirrors that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is the outcome of a process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential
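    Searching a genome with position weight matrices calibrated to its base composition, as this record describes, can be sketched as a log-odds scan. The count matrix, the background frequencies, and the score threshold below are made-up illustrative values, not the matrices used in the study.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2 odds against a background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return (position, window, score) for windows scoring at or above threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(c not in BASES for c in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(col[c] for col, c in zip(matrix, window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


if __name__ == "__main__":
    # Hypothetical counts favouring the -10 box consensus TATAAT.
    toy_counts = [{b: 7 if b == c else 1 for b in BASES} for c in "TATAAT"]
    # Background would be estimated from the noncoding regions of each genome.
    uniform = {b: 0.25 for b in BASES}
    pwm = log_odds_matrix(toy_counts, uniform)
    print(scan("GGCTATAATGCC", pwm, threshold=6.0))
```

    Changing `background` to the base frequencies of a genome's noncoding regions is the step that corresponds to the per-genome calibration mentioned in the abstract. Promoter-like signal density can then be estimated as hits per kilobase in each region class.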

  8. Aceh Gayo Kerawang Carvings as Inspiration for Creating Distinctive Gayo Batik Motifs

    Directory of Open Access Journals (Sweden)

    Irfa ina Rohana Salma

    2016-12-01

    Abstract: The batik industry has begun to develop in Gayo, but the region does not yet have its own distinctive batik motifs. It is therefore necessary to create distinctive Gayo batik motifs, taking inspiration from the carvings found on traditional houses, commonly called Gayo kerawang carvings. The purpose of this art creation is to produce batik motifs with a Gayo character. The method comprises idea exploration, design, and realization as batik motifs. This activity produced six distinctive Gayo batik motifs: (1) Motif Ceplok Gayo; (2) Motif Gayo Tegak; (3) Motif Gayo Lurus; (4) Motif Parang Gayo; (5) Motif Gayo Lembut; and (6) Motif Geometris Gayo. A preference test of the motifs with fifty respondents showed that Motif Ceplok Gayo was chosen most often (19%), followed by Motif Parang Gayo (18%), Motif Gayo Lembut (17%), Motif Geometris Gayo (17%), Motif Gayo Lurus (15%) and Motif Gayo Tegak (14%). On average the motifs were well appreciated by respondents, so all of them are suitable for production as distinctive Gayo batik. Keywords: batik Gayo, Motif Ceplok Gayo, Motif Parang Gayo.

  9. Dynamics of Fibril Growth and Feedback Motifs

    DEFF Research Database (Denmark)

    Cordsen, Pia

    lumped and long, straight fibrils. Previous results on real time observation of fibrils were successfully reproduced using mixed conditions of both sodium dodecyl sulfate and seeds but not when using only one of the two. The dynamics of a three-species network motif, consisting of a predator and two...... which of the two competitors is better and if one of them will become extinct. Further it is found that in the range of coexistence between the two preys, the better one peaks first....

  10. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing
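    A degenerate consensus such as the one quoted in this record can be located in a sequence with a character-class pattern. The sketch below assumes that "T/AGC/GAGGA/TG" should be read as (T or A)-G-(C or G)-A-G-G-(A or T)-G. That reading is an interpretation of the flattened notation, not something the abstract states explicitly, and only the forward strand is searched.

```python
import re

# Assumed reading of the consensus: [T/A] G [C/G] A G G [A/T] G.
# The lookahead wrapper lets overlapping occurrences be reported as well.
CONSENSUS = re.compile(r"(?=([TA]G[CG]AGG[AT]G))")


def find_consensus(seq):
    """Return (start, matched_sequence) for every forward-strand occurrence."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_consensus("ccTGCAGGAGttAGGAGGTGaa"))
```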

  11. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    Science.gov (United States)

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
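    Ordered-dipeptide frequency and bias can be computed with a simple counting pass. The sketch below is not the AAPAIR.TAB tool: it assumes "bias" means the ratio of the observed dipeptide count to the count expected from single-residue frequencies, which is one common definition and may differ from the one used in the report.

```python
from collections import Counter


def dipeptide_stats(sequences):
    """Return {dipeptide: (observed_count, observed/expected ratio)}."""
    residue_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        residue_counts.update(seq)
        # Ordered, overlapping pairs: "GYG" yields "GY" and "YG".
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))

    n_residues = sum(residue_counts.values())
    n_pairs = sum(pair_counts.values())
    stats = {}
    for pair, observed in pair_counts.items():
        # Expected count if the two residues occurred independently.
        expected = (n_pairs
                    * residue_counts[pair[0]] / n_residues
                    * residue_counts[pair[1]] / n_residues)
        stats[pair] = (observed, observed / expected)
    return stats


if __name__ == "__main__":
    toy_family = ["GYGY", "CGFC"]  # made-up sequences for illustration
    for pair, (obs, bias) in sorted(dipeptide_stats(toy_family).items()):
        print(pair, obs, round(bias, 2))
```

    A ratio well above 1 flags a dipeptide used more often than residue composition alone would predict, which is the kind of signal the abstract reports for GY, GF, NG and NT.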

  12. Multiplex PCR for detection of the Vibrio genus and five pathogenic Vibrio species with primer sets designed using comparative genomics

    OpenAIRE

    Kim, Hyun-Joong; Ryu, Ji-Oh; Lee, Shin-Young; Kim, Ei-Seul; Kim, Hae-Yeong

    2015-01-01

    Background The genus Vibrio is clinically significant and major pathogenic Vibrio species causing human Vibrio infections are V. cholerae, V. parahaemolyticus, V. vulnificus, V. alginolyticus and V. mimicus. In this study, we screened for novel genetic markers using comparative genomics and developed a Vibrio multiplex PCR for the reliable diagnosis of the Vibrio genus and the associated major pathogenic Vibrio species. Methods A total of 30 Vibrio genome sequences were subjected to comparati...

  13. Development and validation of cross-transferable and polymorphic DNA markers for detecting alien genome introgression in Oryza sativa from Oryza brachyantha.

    Science.gov (United States)

    Ray, Soham; Bose, Lotan K; Ray, Joshitha; Ngangkham, Umakanta; Katara, Jawahar L; Samantaray, Sanghamitra; Behera, Lambodar; Anumalla, Mahender; Singh, Onkar N; Chen, Meingsheng; Wing, Rod A; Mohapatra, Trilochan

    2016-08-01

    African wild rice Oryza brachyantha (FF), a distant relative of cultivated rice Oryza sativa (AA), carries genes for pest and disease resistance. Molecular marker assisted alien gene introgression from this wild species to its domesticated counterpart is largely impeded by the scarce availability of cross-transferable and polymorphic molecular markers that can clearly distinguish these two species. Availability of the whole genome sequence (WGS) of both species provides a unique opportunity to develop markers that are cross-transferable. We observed poor cross-transferability (~0.75 %) of O. sativa specific sequence tagged microsatellite (STMS) markers to O. brachyantha. By utilizing the genome sequence information, we developed a set of 45 low cost PCR based co-dominant polymorphic markers (STS and CAPS). These markers were found to be cross-transferable (84.78 %) between the two species, could distinguish them from each other, and thus allowed tracing of alien genome introgression. Finally, we validated a Monosomic Alien Addition Line (MAAL) carrying chromosome 1 of O. brachyantha in O. sativa background using these markers, as a proof of concept. Hence, in this study, we have identified a set of molecular markers (comprising STMS, STS and CAPS) that are capable of detecting alien genome introgression from O. brachyantha to O. sativa.

  14. A comparative genomics approach revealed evolutionary dynamics of microsatellite imperfection and conservation in genus Gossypium.

    Science.gov (United States)

    Ahmed, Muhammad Mahmood; Shen, Chao; Khan, Anam Qadir; Wahid, Muhammad Atif; Shaban, Muhammad; Lin, Zhongxu

    2017-01-01

    Ongoing molecular processes in a cell could target microsatellites, a kind of repetitive DNA, owing to length variations and motif imperfection. The mutational mechanisms underlying such genetic variations have been extensively investigated in diverse organisms. However, the impact on non-coding DNA of ploidization, an evolutionary process of genome content duplication that prevails mostly in plants, is poorly understood. Genome sequences of plant species of diverse origin were examined for genome-wide motif imperfection patterns, and various analytical tools were employed to canvass characteristic relationships among repeat density, imperfection and length of microsatellites. Moreover, a comparative genomics approach aided the exploration of microsatellite conservation footprints in Gossypium evolution. Based on our results, motif imperfection in repeat length was found to be intricately related to the genomic abundance of imperfect microsatellites among 13 genomes. Microsatellite decay estimation depicted slower decay of long motif repeats, which led to a predominant abundance of the 5-nt repeat motif in Gossypium species. Short motif repeats exhibited rapid decay through the evolution of the Gossypium lineage, resulting in a drastic decrease of 2-nt repeats, of which the "AT" motif type dilapidated in cultivated tetraploids of cotton. The outcome could be a directive to explore comparative evolutionary footprints of simple non-coding genetic elements, i.e., repeat elements, through the evolution of genus-specific characteristics in cotton genomes.

  15. GNG Motifs Can Replace a GGG Stretch during G-Quadruplex Formation in a Context Dependent Manner.

    Science.gov (United States)

    Das, Kohal; Srivastava, Mrinal; Raghavan, Sathees C

    2016-01-01

    G-quadruplexes are one of the most commonly studied non-B DNA structures. Generally, these structures are formed using a minimum of 4, three guanine tracts, with connecting loops ranging from one to seven. Recent studies have reported deviation from this general convention. One such deviation is the involvement of bulges in the guanine tracts. In this study, guanines along with bulges, also referred to as GNG motifs, have been extensively studied using the recently reported HOX11 breakpoint fragile region I as a model template. By a strategic mutagenesis approach we show that the contribution from continuous G-tracts may be dispensable during G-quadruplex formation when such motifs are flanked by GNGs. Importantly, the positioning and number of GNG/GNGNG can also influence the formation of G-quadruplexes. Further, we assessed three genomic regions from the HIF1 alpha, VEGF and SHOX genes for G-quadruplex formation using GNG motifs. We show that the HIF1 alpha sequence harbouring GNG motifs can fold into an intramolecular G-quadruplex. In contrast, GNG motifs in the mutant VEGF sequence could not participate in structure formation, suggesting that the usage of GNG is context dependent. Importantly, we show that when two continuous stretches of guanines are flanked by two independent GNG motifs in a naturally occurring sequence (SHOX), it can fold into an intramolecular G-quadruplex. Finally, we show the specific binding of the G-quadruplex binding protein Nucleolin and the G-quadruplex antibody BG4 to the SHOX G-quadruplex. Overall, our study provides novel insights into the role of GNG motifs in G-quadruplex structure formation, which may have both physiological and pathological implications.
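    The general convention summarized in this record (at least four tracts of three guanines, joined by loops of one to seven nucleotides) maps directly onto a regular expression. The second pattern below is a hypothetical relaxation that lets any tract be a GNG motif. It is only a crude screen, since the abstract stresses that GNG usage is context dependent, and neither pattern is taken from the study.

```python
import re

LOOP = r"[ACGT]{1,7}"

# Canonical convention: four runs of >= 3 guanines separated by 1-7 nt loops.
CANONICAL_G4 = re.compile(r"G{3,}" + (LOOP + r"G{3,}") * 3)

# Hypothetical relaxed screen: each tract may be GGG+ or a bulged G-N-G.
TRACT = r"(?:G{3,}|G[ACT]G)"
RELAXED_G4 = re.compile(TRACT + (LOOP + TRACT) * 3)


def classify(seq):
    """Label a sequence as 'canonical', 'GNG-candidate', or 'none'."""
    seq = seq.upper()
    if CANONICAL_G4.search(seq):
        return "canonical"
    if RELAXED_G4.search(seq):
        return "GNG-candidate"
    return "none"


if __name__ == "__main__":
    print(classify("GGGTTAGGGTTAGGGTTAGGG"))  # four GGG tracts
    print(classify("GGGTGAGTTGGGTTGGG"))      # one tract replaced by GAG
    print(classify("ATATATATATAT"))
```

    A "GNG-candidate" label only says the sequence is worth testing experimentally. It does not predict folding.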

  16. GNG Motifs Can Replace a GGG Stretch during G-Quadruplex Formation in a Context Dependent Manner.

    Directory of Open Access Journals (Sweden)

    Kohal Das

    G-quadruplexes are one of the most commonly studied non-B DNA structures. Generally, these structures are formed from a minimum of four tracts of three guanines each, with connecting loops ranging from one to seven nucleotides. Recent studies have reported deviation from this general convention. One such deviation is the involvement of bulges in the guanine tracts. In this study, guanines along with bulges, also referred to as GNG motifs, have been extensively studied using the recently reported HOX11 breakpoint fragile region I as a model template. Using a strategic mutagenesis approach, we show that the contribution from continuous G-tracts may be dispensable during G-quadruplex formation when such motifs are flanked by GNGs. Importantly, the positioning and number of GNG/GNGNG can also influence the formation of G-quadruplexes. Further, we assessed three genomic regions from the HIF1 alpha, VEGF and SHOX genes for G-quadruplex formation using GNG motifs. We show that the HIF1 alpha sequence harbouring GNG motifs can fold into an intramolecular G-quadruplex. In contrast, GNG motifs in a mutant VEGF sequence could not participate in structure formation, suggesting that the usage of GNG is context dependent. Importantly, we show that when two continuous stretches of guanines are flanked by two independent GNG motifs in a naturally occurring sequence (SHOX), it can fold into an intramolecular G-quadruplex. Finally, we show the specific binding of the G-quadruplex binding protein Nucleolin and the G-quadruplex antibody BG4 to the SHOX G-quadruplex. Overall, our study provides novel insights into the role of GNG motifs in G-quadruplex structure formation, which may have both physiological and pathological implications.

  17. An Affinity Propagation-Based DNA Motif Discovery Algorithm

    Directory of Open Access Journals (Sweden)

    Chunxiao Sun

    2015-01-01

    The planted (l,d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
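
    For readers unfamiliar with the planted (l,d) problem named above, the following brute-force baseline shows what is being searched for: every l-mer that occurs in each input sequence with at most d mismatches. It is a sketch for small l only and is not APMotif, which the abstract describes as Affinity Propagation clustering followed by EM refinement; the function names are assumptions introduced here.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d):
    """Exhaustively test all 4**l candidate l-mers (exponential in l)."""
    candidates = ("".join(p) for p in product("ACGT", repeat=l))
    return [m for m in candidates if all(occurs_within(m, s, d) for s in seqs)]


example = ["TTACGTAA", "GGACCTCC", "ACGAGGGG"]
with_one_mismatch = planted_motifs(example, l=4, d=1)
exact_only = planted_motifs(example, l=4, d=0)
```

    The exhaustive loop over 4**l candidates is what makes the problem hard at realistic motif lengths, and is the cost that clustering-based and other heuristic methods try to avoid.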

  18. Rapid detection of Salmonella in raw chicken breast using real-time PCR combined with immunomagnetic separation and whole genome amplification.

    Science.gov (United States)

    Hyeon, Ji-Yeon; Deng, Xiangyu

    2017-05-01

    We presented the first attempt to combine immunomagnetic separation (IMS), whole genome amplification by multiple displacement amplification (MDA) and real-time PCR for detecting a bacterial pathogen in a food sample. This method was effective in enabling real-time PCR detection of low levels of Salmonella enterica Serotype Enteritidis (SE) (∼10 CFU/g) in raw chicken breast without culture enrichment. In addition, it was able to detect refrigeration-stressed SE cells at lower concentrations (∼0.1 CFU/g) in raw chicken breast after a 4-h culture enrichment, shortening the detection process from days to hours and displaying no statistical difference in detection rate in comparison with a culture-based detection method. By substantially improving performance in SE detection over conventional real-time PCR, we demonstrated the potential of IMS-MDA real-time PCR as a rapid, sensitive and affordable method for detecting Salmonella in food. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Crystal structure of bacterial cell-surface alginate-binding protein with an M75 peptidase motif.

    Science.gov (United States)

    Maruyama, Yukie; Ochiai, Akihito; Mikami, Bunzo; Hashimoto, Wataru; Murata, Kousaku

    2011-02-18

    A gram-negative Sphingomonas sp. A1 directly incorporates alginate polysaccharide into the cytoplasm via the cell-surface pit and ABC transporter. A cell-surface alginate-binding protein, Algp7, functions as a concentrator of the polysaccharide in the pit. Based on the primary structure and genetic organization in the bacterial genome, Algp7 was found to be homologous to an M75 peptidase motif-containing EfeO, a component of a ferrous ion transporter. Despite the presence of an M75 peptidase motif with high similarity, the Algp7 protein purified from recombinant Escherichia coli cells was inert on insulin B chain and N-benzoyl-Phe-Val-Arg-p-nitroanilide, both of which are substrates for a typical M75 peptidase, imelysin, from Pseudomonas aeruginosa. The X-ray crystallographic structure of Algp7 was determined at 2.10Å resolution by single-wavelength anomalous diffraction. Although a metal-binding motif, HxxE, conserved in zinc ion-dependent M75 peptidases is also found in Algp7, the crystal structure of Algp7 contains no metal even at the motif. The protein consists of two structurally similar up-and-down helical bundles as the basic scaffold. A deep cleft between the bundles is sufficiently large to accommodate macromolecules such as alginate polysaccharide. This is the first structural report on a bacterial cell-surface alginate-binding protein with an M75 peptidase motif. Copyright © 2011 Elsevier Inc. All rights reserved.

  20. Periodic Distribution of a Putative Nucleosome Positioning Motif in Human, Nonhuman Primates, and Archaea: Mutual Information Analysis

    Science.gov (United States)

    Sosa, Daniela; Miramontes, Pedro; Li, Wentian; Mireles, Víctor; Bobadilla, Juan R.; José, Marco V.

    2013-01-01

    Recently, Trifonov's group proposed a 10-mer DNA motif YYYYYRRRRR as a solution of the long-standing problem of sequence-based nucleosome positioning. To test whether this generic decamer represents a biological meaningful signal, we compare the distribution of this motif in primates and Archaea, which are known to contain nucleosomes, and in Eubacteria, which do not possess nucleosomes. The distribution of the motif is analyzed by the mutual information function (MIF) with a shifted version of itself (MIF profile). We found common features in the patterns of this generic decamer on MIF profiles among primate species, and interestingly we found conspicuous but dissimilar MIF profiles for each Archaea tested. The overall MIF profiles for each chromosome in each primate species also follow a similar pattern. Trifonov's generic decamer may be a highly conserved motif for the nucleosome positioning, but we argue that this is not the only motif. The distribution of this generic decamer exhibits previously unidentified periodicities, which are associated to highly repetitive sequences in the genome. Alu repetitive elements contribute to the most fundamental structure of nucleosome positioning in higher Eukaryotes. In some regions of primate chromosomes, the distribution of the decamer shows symmetrical patterns including inverted repeats. PMID:23841049
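
    The mutual information analysis mentioned in this abstract can be illustrated with a small sketch. The encoding below is an assumption made here, not necessarily the authors' procedure: motif start positions are marked in a 0/1 indicator sequence, and the mutual information between that sequence and a copy of itself shifted by k positions is computed; scanning k gives a profile in which periodicities appear as peaks.

```python
import math
import re
from collections import Counter

# YYYYYRRRRR: five pyrimidines (C/T) then five purines (A/G).
# The lookahead lets overlapping occurrences be counted.
DECAMER = re.compile(r"(?=[CT]{5}[AG]{5})")


def motif_indicator(seq):
    """Binary sequence with 1 at every position where the decamer starts."""
    x = [0] * len(seq)
    for m in DECAMER.finditer(seq.upper()):
        x[m.start()] = 1
    return x


def mutual_information(x, k):
    """Mutual information (bits) between x and x shifted by k, for 0 < k < len(x)."""
    pairs = list(zip(x, x[k:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


# Toy sequence: one decamer every 20 bases.
toy = ("CCCCCAAAAA" + "G" * 10) * 5
x = motif_indicator(toy)
mi_at_period = mutual_information(x, 20)
mi_off_period = mutual_information(x, 7)
```

    On the toy sequence the value at the true period (k = 20) is much larger than at an unrelated shift, which is the kind of signal a MIF profile is meant to expose.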

  1. Novel Structural and Functional Motifs in cellulose synthase (CesA) Genes of Bread Wheat (Triticum aestivum, L.).

    Directory of Open Access Journals (Sweden)

    Simerjeet Kaur

    Cellulose is the primary determinant of mechanical strength in plant tissues. Late-season lodging is inversely related to the amount of cellulose in a unit length of the stem. Wheat is the most widely grown of all the crops globally, yet information on its CesA gene family is limited. We have identified 22 CesA genes from bread wheat, which include homoeologs from each of the three genomes, and named them as TaCesAXA, TaCesAXB or TaCesAXD, where X denotes the gene number and the last suffix stands for the respective genome. Sequence analyses of the CESA proteins from wheat and their orthologs from barley, maize, rice, and several dicot species (Arabidopsis, beet, cotton, poplar, potato, rose gum and soybean) revealed motifs unique to monocots (Poales) or dicots. Novel structural motifs CQIC and SVICEXWFA were identified, which distinguished the CESAs involved in the formation of primary and secondary cell wall (PCW and SCW) in all the species. We also identified several new motifs specific to monocots or dicots. The conserved motifs identified in this study possibly play functional roles specific to PCW or SCW formation. The new insights from this study advance our knowledge about the structure, function and evolution of the CesA family in plants in general and wheat in particular. This information will be useful in improving culm strength to reduce lodging or alter wall composition to improve biofuel production.

  2. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters.

    Science.gov (United States)

    Erb, Ionas; van Nimwegen, Erik

    2011-01-01

    The availability of sequence specificities for a substantial fraction of yeast's transcription factors and comparative genomic algorithms for binding site prediction has made it possible to comprehensively annotate transcription factor binding sites genome-wide. Here we use such a genome-wide annotation for comprehensively studying promoter architecture in yeast, focusing on the distribution of transcription factor binding sites relative to transcription start sites, and the architecture of TATA and TATA-less promoters. For most transcription factors, binding sites are positioned further upstream and vary over a wider range in TATA promoters than in TATA-less promoters. In contrast, a group of 6 'proximal promoter motifs' (GAT1/GLN3/DAL80, FKH1/2, PBF1/2, RPN4, NDT80, and ROX1) occur preferentially in TATA-less promoters and show a strong preference for binding close to the transcription start site in these promoters. We provide evidence that suggests that pre-initiation complexes are recruited at TATA sites in TATA promoters and at the sites of the other proximal promoter motifs in TATA-less promoters. TATA-less promoters can generally be classified by the proximal promoter motif they contain, with different classes of TATA-less promoters showing different patterns of transcription factor binding site positioning and nucleosome coverage. These observations suggest that different modes of regulation of transcription initiation may be operating in the different promoter classes. In addition we show that, across all promoter classes, there is a close match between nucleosome free regions and regions of highest transcription factor binding site density. This close agreement between transcription factor binding site density and nucleosome depletion suggests a direct and general competition between transcription factors and nucleosomes for binding to promoters.

  3. Detection of selection signatures of population-specific genomic regions selected during domestication process in Jinhua pigs.

    Science.gov (United States)

    Li, Zhengcao; Chen, Jiucheng; Wang, Zhen; Pan, Yuchun; Wang, Qishan; Xu, Ningying; Wang, Zhengguang

    2016-12-01

    Chinese pigs have been undergoing both natural and artificial selection for thousands of years. Jinhua pigs are of great importance, as they can be a valuable model for exploring the genetic mechanisms linked to meat quality and other traits such as disease resistance, reproduction and production. The purpose of this study was to identify distinctive footprints of selection between Jinhua pigs and other breeds utilizing genome-wide SNP data. Genotyping by genome reducing and sequencing was implemented in order to perform cross-population extended haplotype homozygosity to reveal strong signatures of selection for those economically important traits. This work was performed at a 2% genome level, which comprised 152 006 SNPs genotyped in a total of 517 individuals. Population-specific footprints of selective sweeps were searched for in the genome of Jinhua pigs using six native breeds and three European breeds as reference groups. Several candidate genes associated with meat quality, health and reproduction, such as GH1, CRHR2, TRAF4 and CCK, were found to be overlapping with the significantly positive outliers. Additionally, the results revealed that some genomic regions associated with meat quality, immune response and reproduction in Jinhua pigs have evolved directionally under domestication and subsequent selections. The identified genes and biological pathways in Jinhua pigs showed different selection patterns in comparison with the Chinese and European breeds. © 2016 Stichting International Foundation for Animal Genetics.

  4. STEME: a robust, accurate motif finder for large data sets.

    Directory of Open Access Journals (Sweden)

    John E Reid

    Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.

  5. Flow Cytometric DNA index, G-band Karyotyping, and Comparative Genomic Hybridization in Detection of High Hyperdiploidy in Childhood Acute Lymphoblastic Leukemia

    DEFF Research Database (Denmark)

    Nygaard, Ulrikka; Larsen, Jacob; Kristensen, Tim D

    2006-01-01

    High hyperdiploid acute lymphoblastic leukemia in children is related to a good outcome. Because these patients may be stratified to a low-intensity treatment, we have investigated the sensitivity of flow cytometry (FCM), G-band karyotyping (GBK), and high-resolution comparative genomic hybridization (HR-CGH) in detecting high hyperdiploid leukemic clones. Twenty-six girls and 34 boys with acute lymphoblastic leukemia diagnosed in 1998 to 1999 were analyzed by FCM, GBK, and HR-CGH. The correlations between DNA indices obtained by FCM, GBK, and HR-CGH were significant (rs=0.61 to 0.77; P

  6. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    Science.gov (United States)

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  7. Use of conserved genomic regions and degenerate primers in a PCR-based assay for the detection of members of the genus Caulimovirus.

    Science.gov (United States)

    Pappu, H R; Druffel, K L

    2009-04-01

    The genus Caulimovirus consists of several distinct virus species with a double-stranded DNA genome that infect diverse plant species. A comparative analysis of the sequences of known Caulimovirus species revealed two regions that are conserved in all Caulimovirus species with the exception of Strawberry vein banding virus. Degenerate primers based on these two regions were designed and tested in a polymerase chain reaction-based assay for broad spectrum detection of members of this genus. Cauliflower mosaic virus, Figwort mosaic virus and three distinct caulimoviruses associated with dahlia (Dahlia variabilis) were used to show the utility of this test in detecting diverse caulimoviruses. The primer pair gave an amplicon of the expected size (840 bp). Amplicons from each virus were cloned and sequenced to verify their identity. The primer pair and the PCR assay provide an approach for the broad spectrum detection of several members of the genus Caulimovirus.

  8. Genome variability in European and American bison detected using the BovineSNP50 BeadChip

    DEFF Research Database (Denmark)

    Pertoldi, C.; Wójcik, Jan M; Tokarska, Małgorzata

    2010-01-01

    The remaining wild populations of bison have all been through severe bottlenecks. The genomic consequences of these bottlenecks present an interesting area to study. Using a very large panel of SNPs developed in Bos taurus we have carried out a genome-wide screening on the European bison (Bison bonasus; EB) and on two subspecies of American bison: the plains bison (B. bison bison; PB) and the wood bison (B. bison athabascae; WB). One hundred bison samples were genotyped for 52,978 SNPs along with seven breeds of domestic bovine Bos taurus. Only 2,209 of the SNPs were polymorphic in the bison...

  9. Rekayasa Pengembangan Desain Motif Batik Khas Melayu [Engineering the Development of Distinctively Malay Batik Motif Designs]

    Directory of Open Access Journals (Sweden)

    Eustasia Sri Murwati

    2016-04-01

    Developing batik designs through design engineering based on Malay ornamentation covers the development of motifs and processes, including the choice of colour composition. The processes commonly used are dyeing, wax removal and overdyeing; or hand-painting (colet), dyeing and wax removal; or dyeing followed by wax removal, which is known as Batik Kelengan. Each island in Indonesia has its own cultural and artistic character, known through regional patterns and ornamentation, as well as ornaments favoured by people from that region or from other regions. These conditions encourage the growth of craft industries that make use of artistic elements. The motifs obtained are: Ayam Berlaga, Bungo Matahari, Kuntum Bersanding, Lancang Kuning, Encong Kerinci, Durian Pecah, Bungo Bintang, Bungo Pauh Kecil, Riang-riang, Bungo Nagaro. From these designs, the three best products were selected by five assessors who are experts in batik design: the Durian Pecah, Ayam Berlaga, and Bungo Matahari motifs. Design diversification draws on the artistic elements and skills of the Malay ethnic group, namely selecting Malay ornamentation and batik motifs to apply to clothing fabrics with attractive colour compositions, so that the products meet consumer tastes. Batik diversity is improved by enhancing product design, among other things by rendering Malay ornamentation in a batik process that uses a variety of colours so that the colour composition is adequate. The result is high-quality batik products with Malay ornamentation and colour compositions that suit the character of that ornamentation. Product design engineering to obtain a design formulation and establish the feasibility of its process, with an emphasis on environmentally friendly technology, was carried out through an alternative approach, namely the creation of new design forms. Keywords: design, batik, design engineering, ornamentation, Malay. ABSTRACT Development of batik design through

  10. CEGA--a catalog of conserved elements from genomic alignments.

    Science.gov (United States)

    Dousse, Aline; Junier, Thomas; Zdobnov, Evgeny M

    2016-01-04

    By identifying genomic sequence regions conserved among several species, comparative genomics offers opportunities to discover putatively functional elements without any prior knowledge of what these functions might be. Comparative analyses across mammals estimated 4-5% of the human genome to be functionally constrained, a much larger fraction than the 1-2% occupied by annotated protein-coding or RNA genes. Such functionally constrained yet unannotated regions have been referred to as conserved non-coding sequences (CNCs) or ultra-conserved elements (UCEs), which remain largely uncharacterized but probably form a highly heterogeneous group of elements including enhancers, promoters, motifs, and others. To facilitate the study of such CNCs/UCEs, we present our resource of Conserved Elements from Genomic Alignments (CEGA), accessible from http://cega.ezlab.org. Harnessing the power of multiple species comparisons to detect genomic elements under purifying selection, CEGA provides a comprehensive set of CNCs identified at different radiations along the vertebrate lineage. Evolutionary constraint is identified using threshold-free phylogenetic modeling of unbiased and sensitive global alignments of genomic synteny blocks identified using protein orthology. We identified CNCs independently for five vertebrate clades, each referring to a different last common ancestor and therefore to an overlapping but varying set of CNCs with 24 488 in vertebrates, 241 575 in amniotes, 709 743 in Eutheria, 642 701 in Boreoeutheria and 612 364 in Euarchontoglires, spanning from 6 Mbp in vertebrates to 119 Mbp in Euarchontoglires. The dynamic CEGA web interface displays alignments, genomic locations, as well as biologically relevant data to help prioritize and select CNCs of interest for further functional investigations. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Encoded expansion: an efficient algorithm to discover identical string motifs.

    Science.gov (United States)

    Azmi, Aqil M; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently, Karci (2009; Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol, throwing out those that occur fewer than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text]. Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm indeed has a linear time complexity and that it is more scalable in terms of sequence length than the existing algorithms.
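
    The expand-and-prune loop described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: it keeps occurrence lists in a dictionary rather than the array and data encoding the authors use, so it does not reproduce their complexity bounds, and the names repeated_motifs, min_count and max_len are hypothetical.

```python
from collections import defaultdict


def repeated_motifs(seq, min_count, max_len):
    """Map each exact substring of length 2..max_len to its occurrence count,
    keeping only substrings that occur at least min_count times."""
    # Seed with every substring of size 2 and the positions where it starts.
    seeds = defaultdict(list)
    for i in range(len(seq) - 1):
        seeds[seq[i:i + 2]].append(i)
    current = {m: p for m, p in seeds.items() if len(p) >= min_count}

    found = {}
    length = 2
    while current and length <= max_len:
        found.update({m: len(p) for m, p in current.items()})
        # Expand each surviving candidate by one symbol to the right.
        grown = defaultdict(list)
        for motif, starts in current.items():
            for i in starts:
                j = i + length
                if j < len(seq):
                    grown[motif + seq[j]].append(i)
        # Throw out expansions that are too rare before the next round.
        current = {m: p for m, p in grown.items() if len(p) >= min_count}
        length += 1
    return found


counts = repeated_motifs("ACGTACGTAC", min_count=2, max_len=4)
```

    Pruning before expansion is safe because a substring cannot occur more often than its own prefix, so a rare candidate can never grow into a frequent one.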

  12. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery: searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary.
An implementation of

  13. The IQ Motif is Crucial for Cav1.1 Function

    Directory of Open Access Journals (Sweden)

    Katarina Stroffekova

    2011-01-01

    Ca2+-dependent modulation via calmodulin, with consensus CaM-binding IQ motif playing a key role, has been documented for most high-voltage-activated Ca2+ channels. The skeletal muscle Cav1.1 also exhibits Ca2+-/CaM-dependent modulation. Here, whole-cell Ca2+ current, Ca2+ transient, and maximal, immobilization-resistant charge movement (Qmax) recordings were obtained from cultured mouse myotubes, to test a role of IQ motif in function of Cav1.1. The effect of introducing mutation (IQ to AA) of IQ motif into Cav1.1 was examined. In dysgenic myotubes expressing YFP-Cav1.1AA, neither Ca2+ currents nor evoked Ca2+ transients were detectable. The loss of Ca2+ current and excitation-contraction coupling did not appear to be a consequence of defective trafficking to the sarcolemma. The Qmax in dysgenic myotubes expressing YFP-Cav1.1AA was similar to that of normal myotubes. These findings suggest that the IQ motif of the Cav1.1 may be an unrecognized site of structural and functional coupling between DHPR and RyR.

  14. The IQ motif is crucial for Cav1.1 function.

    Science.gov (United States)

    Stroffekova, Katarina

    2011-01-01

    Ca(2+)-dependent modulation via calmodulin, with consensus CaM-binding IQ motif playing a key role, has been documented for most high-voltage-activated Ca(2+) channels. The skeletal muscle Ca(v)1.1 also exhibits Ca(2+)-/CaM-dependent modulation. Here, whole-cell Ca(2+) current, Ca(2+) transient, and maximal, immobilization-resistant charge movement (Q(max)) recordings were obtained from cultured mouse myotubes, to test a role of IQ motif in function of Ca(v)1.1. The effect of introducing mutation (IQ to AA) of IQ motif into Ca(v)1.1 was examined. In dysgenic myotubes expressing YFP-Ca(v)1.1(AA), neither Ca(2+) currents nor evoked Ca(2+) transients were detectable. The loss of Ca(2+) current and excitation-contraction coupling did not appear to be a consequence of defective trafficking to the sarcolemma. The Q(max) in dysgenic myotubes expressing YFP-Ca(v)1.1(AA) was similar to that of normal myotubes. These findings suggest that the IQ motif of the Ca(v)1.1 may be an unrecognized site of structural and functional coupling between DHPR and RyR.

  15. Biclustering sparse binary genomic data.

    Science.gov (United States)

    van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk

    2008-12-01

    Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.

  16. Nuclear importation of Mariner transposases among eukaryotes: motif requirements and homo-protein interactions.

    Directory of Open Access Journals (Sweden)

    Marie-Véronique Demattei

    Mariner-like elements (MLEs) are widespread transposable elements in animal genomes. They have been divided into at least five sub-families with differing host ranges. We investigated whether the ability of transposases encoded by Mos1, Himar1 and Mcmar1 to be actively imported into nuclei varies between hosts belonging to different eukaryotic taxa. Our findings demonstrate that nuclear importation could restrict the host range of some MLEs in certain eukaryotic lineages, depending on their expression level. We then focused on the nuclear localization signal (NLS) in these proteins, and showed that the first 175 N-terminal residues in the three transposases were required for nuclear importation. We found that two components are involved in the nuclear importation of the Mos1 transposase: an SV40 NLS-like motif (position: aa 168 to 174), and a dimerization sub-domain located within the first 80 residues. Sequence analyses revealed that the dimerization moiety is conserved among MLE transposases, but the Himar1 and Mcmar1 transposases do not contain any conserved NLS motif. This suggests that other NLS-like motifs must intervene in these proteins. Finally, we showed that the over-expression of the Mos1 transposase prevents its nuclear importation in HeLa cells, due to the assembly of transposase aggregates in the cytoplasm.

  17. Complete motif analysis of sequence requirements for translation initiation at non-AUG start codons.

    Science.gov (United States)

    Diaz de Arce, Alexander J; Noderer, William L; Wang, Clifford L

    2018-01-25

    The initiation of mRNA translation from start codons other than AUG was previously believed to be rare and of relatively low impact. More recently, evidence has suggested that as much as half of all translation initiation utilizes non-AUG start codons, codons that deviate from AUG by a single base. Furthermore, non-AUG start codons have been shown to be involved in regulation of expression and disease etiology. Yet the ability to gauge expression based on the sequence of a translation initiation site (start codon and its flanking bases) has been limited. Here we have performed a comprehensive analysis of translation initiation sites that utilize non-AUG start codons. By combining genetic-reporter, cell-sorting, and high-throughput sequencing technologies, we have analyzed the expression associated with all possible variants of the -4 to +4 positions of non-AUG translation initiation site motifs. This complete motif analysis revealed that 1) with the right sequence context, certain non-AUG start codons can generate expression comparable to that of AUG start codons, 2) sequence context affects each non-AUG start codon differently, and 3) initiation at non-AUG start codons is highly sensitive to changes in the flanking sequences. Complete motif analysis has the potential to be a key tool for experimental and diagnostic genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
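
    The search space described in this abstract can be illustrated with a short sketch. It enumerates the codons that differ from AUG by a single base and, for one of them, every combination of flanking bases. The assumption here (not stated in the abstract) is the conventional numbering in which the start codon occupies +1 to +3, so that "-4 to +4" leaves four upstream positions and one downstream position free. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"

def near_cognate_codons(start="AUG"):
    """Return all codons that differ from `start` at exactly one position."""
    codons = []
    for i, original in enumerate(start):
        for base in BASES:
            if base != original:
                codons.append(start[:i] + base + start[i + 1:])
    return codons

def tis_variants(codon, upstream=4, downstream=1):
    """Yield every upstream-flank + codon + downstream-flank sequence.

    Assumes the codon sits at +1..+3, so -4..-1 and +4 are the free positions.
    """
    for flank in product(BASES, repeat=upstream + downstream):
        up = "".join(flank[:upstream])
        down = "".join(flank[upstream:])
        yield up + codon + down

codons = near_cognate_codons()
variants = list(tis_variants(codons[0]))
print(len(codons), "single-base variants of AUG")
print(len(variants), "flanking contexts per start codon")
```

    Under these assumptions the library size grows as 4 to the power of the number of free flanking positions, which is why exhaustive measurement is only practical for a short window around the start codon.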

  18. Full-Genome Sequence of Porcine Circovirus type 3 recovered from serum of sows with stillbirths in Brazil.

    Science.gov (United States)

    Tochetto, C; Lima, D A; Varela, A P M; Loiko, M R; Paim, W P; Scheffer, C M; Herpich, J I; Cerva, C; Schmitd, C; Cibulski, S P; Santos, A C; Mayer, F Q; Roehe, P M

    2018-02-01

    Two full-genome sequences of porcine circovirus type 3 (PCV3) are reported. The genomes were recovered from pooled serum samples from sows who had just delivered litters with variable numbers of stillbirths. The two circular genomes (PCV3-BR/RS/6 and PCV3-BR/RS/8) are 2,000 nucleotides long and contain two open reading frames (ORFs) oriented in opposite directions that encode the putative capsid (Cap) and replicase (Rep) proteins. The intergenic region contains a stem-loop motif, as reported for other circoviruses. Rolling circle replication motifs and putative helicase domains were identified in the Rep coding region. The degree of overall nucleotide similarity between the genomes reported here and those available at GenBank was higher than 97%. No PCV3 sequence was detected in pooled serum samples from sows which had no stillbirths on the same farms. However, further studies are necessary to confirm the association between PCV3 and the occurrence of stillbirths. © 2017 Blackwell Verlag GmbH.

  19. Detecting Staphylococcus aureus Virulence and Resistance Genes: a Comparison of Whole-Genome Sequencing and DNA Microarray Technology

    NARCIS (Netherlands)

    Strauß, Lena; Ruffing, Ulla; Abdulla, Salim; Alabi, Abraham; Akulenko, Ruslan; Garrine, Marcelino; Germann, Anja; Grobusch, Martin Peter; Helms, Volkhard; Herrmann, Mathias; Kazimoto, Theckla; Kern, Winfried; Mandomando, Inácio; Peters, Georg; Schaumburg, Frieder; von Müller, Lutz; Mellmann, Alexander

    2016-01-01

    Staphylococcus aureus is a major bacterial pathogen causing a variety of diseases ranging from wound infections to severe bacteremia or intoxications. Besides host factors, the course and severity of disease is also widely dependent on the genotype of the bacterium. Whole-genome sequencing (WGS),

  20. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    In vertebrates, the onset of cellular immune reactions is controlled by presentation of peptides in complex with major histocompatibility complex (MHC) molecules to T cell receptors. In humans, MHCs are called human leukocyte antigens (HLAs). Different MHC molecules present different subsets...... of peptides, and knowledge of their binding specificities is important for understanding differences in the immune response between individuals. Algorithms predicting which peptides bind a given MHC molecule have recently been developed with high prediction accuracy. The utility of these algorithms...... is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  1. Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

    Directory of Open Access Journals (Sweden)

    Nils E. R. Zimmermann

    2017-11-01

    Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body and face-centered cubic as well as hexagonal close packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.

  2. Genome-wide detection of CNVs in Chinese indigenous sheep with different types of tails using ovine high-density 600K SNP arrays

    OpenAIRE

    Zhu, Caiye; Fan, Hongying; Yuan, Zehu; Hu, Shijin; Ma, Xiaomeng; Xuan, Junli; Wang, Hongwei; Zhang, Li; Wei, Caihong; Zhang, Qin; Zhao, Fuping; Du, Lixin

    2016-01-01

    Chinese indigenous sheep can be classified into three types based on tail morphology: fat-tailed, fat-rumped, and thin-tailed sheep, of which the typical breeds are large-tailed Han sheep, Altay sheep, and Tibetan sheep, respectively. To unravel the genetic mechanisms underlying the phenotypic differences among Chinese indigenous sheep with tails of three different types, we used ovine high-density 600K SNP arrays to detect genome-wide copy number variation (CNV). In large-tailed Han sheep, A...

  3. Detection and precise mapping of germline rearrangements in BRCA1, BRCA2, MSH2, and MLH1 using zoom-in array comparative genomic hybridization (aCGH)

    DEFF Research Database (Denmark)

    Staaf, Johan; Törngren, Therese; Rambech, Eva

    2008-01-01

    hybridization (CGH) platform of 60mer oligonucleotides. The 4 x 44 K array format provides high-resolution coverage (200-300 bp) of 400-700 kb genomic regions surrounding six cancer susceptibility genes. We evaluate its performance to accurately detect and precisely map earlier described or novel large germline...... of primers for sequence determination of the breakpoints. The array platform can be streamlined for a particular application, e.g., focusing on breast cancer susceptibility genes, with increased capacity using multiformat design, and represents a valuable new tool and complement for genetic screening...

  4. Functional diversity of CTCFs is encoded in their binding motifs.

    Science.gov (United States)

    Fang, Rongxin; Wang, Chengqi; Skogerbo, Geir; Zhang, Zhihua

    2015-08-28

    The CCCTC-binding factor (CTCF) has diverse regulatory functions. However, the definitive characteristics of the CTCF binding motif required for its functional diversity remain elusive. Here, we describe a new motif discovery workflow by which we have identified three CTCF binding motif variations with highly divergent functionalities. Supported by transcriptomic, epigenomic and chromatin-interactomic data, we show that the functional diversity of the CTCF binding motifs is strongly associated with their GC content, CpG dinucleotide coverage and relative DNA methylation level at the 12th position of the motifs. Further analysis suggested that the co-localization of cohesin, the key factor in cohesion of sister chromatids, is negatively correlated with the CpG coverage and the relative DNA methylation level at the 12th position. Finally, we present evidence for a hypothetical model in which chromatin interactions between promoters and distal regulatory regions are likely mediated by CTCFs binding to sequences with high CpG. These results demonstrate the existence of definitive CTCF binding motifs corresponding to CTCF's diverse functions, and that the functional diversity of the motifs is strongly associated with genetic and epigenetic features at the 12th position of the motifs.
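
    Two of the sequence features named in this abstract, GC content and CpG dinucleotide coverage, are simple to compute for a single motif instance. The sketch below is illustrative only: the abstract does not define "CpG coverage", so it is assumed here to mean the fraction of bases that fall inside a CG dinucleotide, and the example sequence is made up rather than taken from the study.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_count(seq):
    """Number of CG dinucleotides (read 5' to 3' on the given strand)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")

def cpg_coverage(seq):
    """Assumed definition: share of bases lying inside a CG dinucleotide.

    Two CG dinucleotides can never overlap, so each one covers two bases.
    """
    return 2 * cpg_count(seq) / len(seq)

# Hypothetical 14-bp motif instance, for illustration only.
site = "CCGCGAGGTGGCAG"
print(f"GC content   : {gc_content(site):.3f}")
print(f"CpG count    : {cpg_count(site)}")
print(f"CpG coverage : {cpg_coverage(site):.3f}")
```

    In practice such features would be computed over many aligned binding sites and compared between motif classes; the methylation level at a given motif position requires bisulfite or similar data and is not covered by this sketch.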

  5. An Examination of the Festival Motif in Femi Osofisan's Morountodun ...

    African Journals Online (AJOL)

    It is in this context that we closely look at how Femi Osofisan assertively leans on the aesthetic apparatus of the African traditional theatre to create Morountodun. In Morountodun, the rich elements of the traditional theatre are used as motif(s) to create a vintage and delightful play, which is very aesthetic and scintillating, yet ...

  6. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  7. Perceptions of Seshoeshoe fabric, naming and meanings of motifs ...

    African Journals Online (AJOL)

    It was further found that the choice of the fabric has increased in the market due to the wide variety of motifs and colours although the quality of fabric has not improved. There are still problems encountered by dressmakers when handling the fabric. Most participants in the study had a good knowledge of the names of motifs.

  8. Motif Participation by Genes in E. coli Transcriptional Networks

    Directory of Open Access Journals (Sweden)

    Michael eMayo

    2012-09-01

    Motifs are patterns of recurring connections among the genes of genetic networks that occur more frequently than would be expected from randomized networks with the same degree sequence. Although the abundance of certain three-node motifs, such as the feed-forward loop, is positively correlated with a network's ability to tolerate moderate disruptions to gene expression, little is known regarding the connectivity of individual genes participating in multiple motifs. Using the transcriptional network of the bacterium Escherichia coli, we investigate this feature by reconstructing the distribution of genes participating in feed-forward loop motifs from its largest connected network component. We contrast these motif participation distributions with those obtained from model networks built using the preferential attachment mechanism employed by many biological and man-made networks. We report that, although some of these model networks support a motif participation distribution that appears qualitatively similar to that obtained from the bacterium Escherichia coli, the probability for a node to support a feed-forward loop motif may instead be strongly influenced by only a few master transcriptional regulators within the network. From these analyses we conclude that such master regulators may be a crucial ingredient to describe coupling among feed-forward loop motifs in transcriptional regulatory networks.
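
    The notion of motif participation can be made concrete with a small sketch that counts, for each gene, the number of feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z) it takes part in. This is a minimal illustration on a toy edge list with hypothetical gene names, not the procedure used in the study; it counts every matching triple and does not check whether additional edges exist among the three genes.

```python
from collections import Counter, defaultdict

def ffl_participation(edges):
    """Count feed-forward loops per node in a directed graph.

    A feed-forward loop is a triple (x, y, z) of distinct nodes
    with edges x->y, y->z and x->z. Self-loops are ignored.
    """
    successors = defaultdict(set)
    for source, target in edges:
        if source != target:
            successors[source].add(target)

    counts = Counter()
    for x in list(successors):
        for y in successors[x]:
            for z in successors.get(y, ()):
                if z != x and z in successors[x]:
                    counts.update((x, y, z))
    return counts

# Toy regulatory network (hypothetical genes A-D).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]
participation = ffl_participation(toy_edges)
for gene in sorted(participation):
    print(gene, participation[gene])
```

    Even in this toy network the regulators A and B sit in every loop while the targets each sit in one, which mirrors the kind of skew toward a few master regulators that the abstract describes.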

  9. Ancient Writers’ Motifs in Spanish Golden Age Drama

    Directory of Open Access Journals (Sweden)

    Bojana Tomc

    2016-12-01

    In Spanish Golden Age drama we come across all forms of the reception of ancient writers’ motifs: explicit (direct quotation of an ancient author, where the quotation may be more or less complete, or a clear allusion to it), implicit (where there is no explicit mention of the ancient source, but certain ancient elements are mentioned, such as persons, places, or historical circumstances), hidden (where there is no clear hint about a literary intervention in Antiquity or an imitation of the literary excerpt or motif), as well as direct imitation (aemulatio) or adaptation (variatio). In the Renaissance and Baroque there are almost no motifs which could not be taken over from Antiquity without a transformation or innovation. If there is a close correspondence to the ancient motif, it is generally sufficient simply to mention it or employ a side motif as an illustration of a similar situation without elaborating the motif further or weaving it more deeply into the supporting fabric of the dramatic work. The ancient authors who contribute the motifs are numerous and diverse: Vergil, the Roman elegists Propertius and Tibullus, the lyric poet Horace, the comedian Plautus, the Stoic philosopher Seneca, the historian Tacitus, the novelist Apuleius, as well as the Greek dramatist Aeschylus and the Stoic philosopher Epictetus. The genres which are a source for the surviving ancient motifs in the Golden Age in the selected authors include literary as well as non-literary forms: epic poetry, lyric, drama, philosophy and historiography.

  10. Genomics-based non-invasive prenatal testing for detection of fetal chromosomal aneuploidy in pregnant women.

    Science.gov (United States)

    Badeau, Mylène; Lindsay, Carmen; Blais, Jonatan; Nshimyumukiza, Leon; Takwoingi, Yemisi; Langlois, Sylvie; Légaré, France; Giguère, Yves; Turgeon, Alexis F; Witteman, William; Rousseau, François

    2017-11-10

    pooled analyses (246 T21 cases, 112 T18 cases, 20 T13 cases and 4282 unaffected pregnancies), the clinical sensitivity (95% CI) of TMPS was 99.2% (96.8% to 99.8%), 98.2% (93.1% to 99.6%), 100% (83.9% to 100%) and 92.4% (84.1% to 96.5%) for T21, T18, T13 and 45,X respectively. The clinical specificities were above 99.9% for T21, T18 and T13 and 99.8% (98.3% to 100%) for 45,X. Indirect comparisons of MPSS and TMPS for T21, T18 and 45,X showed no statistical difference in clinical sensitivity, clinical specificity or both. Due to limited data, comparative meta-analysis of MPSS and TMPS was not possible for T13. We were unable to perform meta-analyses of gNIPT for 47,XXX, 47,XXY and 47,XYY because there were very few or no studies in one or more risk groups. These results show that MPSS and TMPS perform similarly in terms of clinical sensitivity and specificity for the detection of fetal T21, T18, T13 and sex chromosome aneuploidy (SCA). However, no study compared the two approaches head-to-head in the same cohort of patients. The accuracy of gNIPT as a prenatal screening test has been mainly evaluated as a second-tier screening test to identify pregnancies at very low risk of fetal aneuploidies (T21, T18 and T13), thus avoiding invasive procedures. Genomics-based non-invasive prenatal testing methods appear to be sensitive and highly specific for detection of fetal trisomies 21, 18 and 13 in high-risk populations. There is paucity of data on the accuracy of gNIPT as a first-tier aneuploidy screening test in a population of unselected pregnant women. With respect to the replacement of invasive tests, the performance of gNIPT observed in this review is not sufficient to replace current invasive diagnostic tests. We conclude that given the current data on the performance of gNIPT, invasive fetal karyotyping is still the required diagnostic approach to confirm the presence of a chromosomal abnormality prior to making irreversible decisions relative to the pregnancy outcome.
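
    To see how a small case count limits the precision of a sensitivity estimate, a confidence interval can be computed directly from the counts. The sketch below uses the Wilson score interval, which is an assumption made here for illustration; the abstract does not say which method the review used, and pooled meta-analytic estimates are generally obtained differently. The function name is hypothetical.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# 20 affected cases, all detected (the T13 case count quoted above).
low, high = wilson_interval(20, 20)
print(f"sensitivity 100%, 95% CI {low * 100:.1f}% to {high * 100:.1f}%")
```

    With 20 of 20 cases detected, the lower bound comes out near 83.9%, which is consistent with the interval quoted for T13 and shows that a point estimate of 100% on few cases still leaves considerable uncertainty.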

  11. A dictionary based informational genome analysis

    Directory of Open Access Journals (Sweden)

    Castellini Alberto

    2012-09-01

    Background: In the post-genomic era several methods of computational genomics are emerging to understand how the whole information is structured within genomes. The literature of the last five years accounts for several alignment-free methods, which have arisen as alternative metrics for dissimilarity of biological sequences. Among others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out some computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of vivid interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported to other contexts of computational genomics, as a basis for discrimination of genomic pathologies.
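
    The idea of a genomic dictionary can be sketched in a few lines: collect every word of length k that occurs in a sequence, together with its frequency, and summarize the frequency distribution with an index. The abstract does not specify which informational indexes were used, so the empirical Shannon entropy below is only one plausible choice, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter
from math import log2

def kmer_dictionary(genome, k):
    """Map each k-mer occurring in `genome` to its number of occurrences."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def empirical_entropy(counts):
    """Shannon entropy (in bits) of the empirical k-mer distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

toy_genome = "ACGTACGTAC"
dictionary = kmer_dictionary(toy_genome, k=2)
print(dict(dictionary))
print(f"dictionary size: {len(dictionary)} of {4 ** 2} possible 2-mers")
print(f"entropy: {empirical_entropy(dictionary):.3f} bits")
```

    Comparing the dictionary size and entropy against their maxima (4^k words, 2k bits) gives a rough, alignment-free measure of how repetitive a sequence is at word length k.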

  12. Novel Strategy for Discrimination of Transcription Factor Binding Motifs Employing Mathematical Neural Network

    Science.gov (United States)

    Sugimoto, Asuka; Sumi, Takuya; Kang, Jiyoung; Tateno, Masaru

    2017-07-01

    Recognition in biological macromolecular systems, such as DNA-protein recognition, is one of the most crucial problems to solve toward understanding the fundamental mechanisms of various biological processes. Since specific base sequences of genome DNA are discriminated by proteins, such as transcription factors (TFs), finding TF binding motifs (TFBMs) in whole genome DNA sequences is currently a central issue in interdisciplinary biophysical and information sciences. In the present study, a novel strategy to create a discriminant function for discrimination of TFBMs by constituting mathematical neural networks (NNs) is proposed, together with a method to determine the boundary of signals (TFBMs) and noise in the NN-score (output) space. This analysis also leads to the mathematical limitation of discrimination in the recognition of features representing TFBMs, in an information geometrical manifold. Thus, the present strategy enables the identification of the whole space of TFBMs, right up to the noise boundary.

  13. IQCJ-SCHIP1, a novel fusion transcript encoding a calmodulin-binding IQ motif protein

    International Nuclear Information System (INIS)

    Kwasnicka-Crawford, Dorota A.; Carson, Andrew R.; Scherer, Stephen W.

    2006-01-01

    The existence of transcripts that span two adjacent, independent genes is considered rare in the human genome. This study characterizes a novel human fusion gene named IQCJ-SCHIP1. IQCJ-SCHIP1 is the longest isoform of a complex transcriptional unit that bridges two separate genes that encode distinct proteins: IQCJ, a novel IQ motif-containing protein, and SCHIP1, a schwannomin-interacting protein that has previously been shown to interact with the Neurofibromatosis type 2 (NF2) protein. IQCJ-SCHIP1 is located on chromosome 3q25 and comprises a 1692-bp transcript encompassing 11 exons spanning 828 kb of genomic DNA. We show that IQCJ-SCHIP1 mRNA is highly expressed in the brain. Protein encoded by the IQCJ-SCHIP1 gene was localized to the cytoplasm and actin-rich regions, and in differentiated PC12 cells was also seen in neurite extensions.

  14. Detection and genome analysis of a lineage III peste des petits ruminants virus in Kenya in 2011

    International Nuclear Information System (INIS)

    Dundon, W.G.; Kihu, S.M.; Gitao, G.C.; Bebora, L.C.; John, N.M.; Ogugi, J.O.; Loitsch, A.; Diallo, A.

    2016-01-01

    In May 2011 in Turkana County, north-western Kenya, tissue samples were collected from goats suspected of having died of peste des petits ruminants (PPR), an acute viral disease of small ruminants. The samples were processed and tested by reverse transcriptase PCR for the presence of PPR viral RNA. The positive samples were sequenced and identified as belonging to peste des petits ruminants virus (PPRV) lineage III. Full-genome analysis of one of the positive samples revealed that the virus causing disease in Kenya in 2011 was 95.7% identical to the full genome of a virus isolated in Uganda in 2012 and that a segment of the viral fusion gene was 100% identical to that of a virus circulating in Tanzania in 2013. These data strongly indicate transboundary movement of lineage III viruses between Eastern African countries and have significant implications for surveillance and control of this important disease as it moves southwards in Africa. (author)

  15. Detection of Multiple Parallel Transmission Outbreak of Streptococcus suis Human Infection by Use of Genome Epidemiology, China, 2005.

    Science.gov (United States)

    Du, Pengcheng; Zheng, Han; Zhou, Jieping; Lan, Ruiting; Ye, Changyun; Jing, Huaiqi; Jin, Dong; Cui, Zhigang; Bai, Xuemei; Liang, Jianming; Liu, Jiantao; Xu, Lei; Zhang, Wen; Chen, Chen; Xu, Jianguo

    2017-02-01

    Streptococcus suis sequence type 7 emerged and caused 2 of the largest human infection outbreaks in China in 1998 and 2005. To determine the major risk factors and source of the infections, we analyzed whole genomes of 95 outbreak-associated isolates, identified 160 single nucleotide polymorphisms, and classified them into 6 clades. Molecular clock analysis revealed that clade 1 (responsible for the 1998 outbreak) emerged in October 1997. Clades 2-6 (responsible for the 2005 outbreak) emerged separately during February 2002-August 2004. A total of 41 lineages of S. suis emerged by the end of 2004 and rapidly expanded to 68 genome types through single base mutations when the outbreak occurred in June 2005. We identified 32 identical isolates and classified them into 8 groups, which were distributed in a large geographic area with no transmission link. These findings suggest that persons were infected in parallel in respective geographic sites.

  16. Third International E. coli genome meeting

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    Proceedings of the Third E. coli Genome Meeting are provided. Presentations were divided into sessions entitled (1) Large Scale Sequencing, Sequence Analysis; (2) Databases; (3) Sequence Analysis; (4) Sequence Divergence in E. coli Strains; (5) Repeated Sequences and Regulatory Motifs; (6) Mutations, Rearrangements and Stress Responses; and (7) Origins of New Genes. The document provides a collection of abstracts of oral and poster presentations.

  17. Whole-genome sequencing of eight goat populations for the detection of selection signatures underlying production and adaptive traits

    Science.gov (United States)

    Wang, Xiaolong; Liu, Jing; Zhou, Guangxian; Guo, Jiazhong; Yan, Hailong; Niu, Yiyuan; Li, Yan; Yuan, Chao; Geng, Rongqing; Lan, Xianyong; An, Xiaopeng; Tian, Xingui; Zhou, Huangkai; Song, Jiuzhou; Jiang, Yu; Chen, Yulin

    2016-01-01

    The goat (Capra hircus) is one of the first farm animals that have undergone domestication and extensive natural and artificial selection by adapting to various environments, which in turn has resulted in its high level of phenotypic diversity. Here, we generated medium-coverage (9–13×) sequences from eight domesticated goat breeds, representing morphologically or geographically specific populations, to identify genomic regions representing selection signatures. We discovered ~10 million single nucleotide polymorphisms (SNPs) for each breed. By combining two approaches, ZHp and di values, we identified 22 genomic regions that may have contributed to the phenotypes in coat color patterns, body size, cashmere traits, as well as high altitude adaptation in goat populations. Candidate genes underlying strong selection signatures including coloration (ASIP, KITLG, HTT, GNA11, and OSTM1), body size (TBX15, DGCR8, CDC25A, and RDH16), cashmere traits (LHX2, FGF9, and WNT2), and hypoxia adaptation (CDK2, SOCS2, NOXA1, and ENPEP) were identified. We also identified candidate functional SNPs within selected genes that may be important for each trait. Our results demonstrated the potential of using sequence data in identifying genomic regions that are responsible for agriculturally significant phenotypes in goats, which in turn can be used in the selection of goat breeds for environmental adaptation and domestication. PMID:27941843
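
    The abstract names ZHp as one of the two selection statistics but does not define it. A commonly used definition, assumed here, computes a pooled heterozygosity Hp per genomic window from summed major and minor allele counts, Hp = 2 * n_MAJ * n_MIN / (n_MAJ + n_MIN)^2, and then Z-transforms Hp across all windows; strongly negative ZHp values flag candidate sweeps. The sketch below follows that assumption with made-up allele counts, and the choice of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev

def pooled_heterozygosity(major_counts, minor_counts):
    """Hp for one window from per-SNP major and minor allele counts."""
    n_maj = sum(major_counts)
    n_min = sum(minor_counts)
    return 2 * n_maj * n_min / (n_maj + n_min) ** 2

def z_transform(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical windows: (major allele counts, minor allele counts) per SNP.
windows = [
    ([12, 10, 11], [8, 10, 9]),
    ([15, 14, 13], [5, 6, 7]),
    ([20, 19, 20], [0, 1, 0]),   # nearly fixed: low heterozygosity
    ([11, 12, 10], [9, 8, 10]),
]
hp = [pooled_heterozygosity(maj, mino) for maj, mino in windows]
zhp = z_transform(hp)
for h, z in zip(hp, zhp):
    print(f"Hp = {h:.3f}  ZHp = {z:+.2f}")
```

    The nearly fixed third window receives the most negative ZHp in this toy data, which is the pattern such scans look for; real analyses use many thousands of windows and an empirical threshold.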

  18. Crystal structure of bacterial cell-surface alginate-binding protein with an M75 peptidase motif

    International Nuclear Information System (INIS)

    Maruyama, Yukie; Ochiai, Akihito; Mikami, Bunzo; Hashimoto, Wataru; Murata, Kousaku

    2011-01-01

    Research highlights: → Bacterial alginate-binding Algp7 is similar to component EfeO of Fe2+ transporter. → We determined the crystal structure of Algp7 with a metal-binding motif. → Algp7 consists of two helical bundles formed through duplication of a single bundle. → A deep cleft involved in alginate binding locates around the metal-binding site. → Algp7 may function as a Fe2+-chelated alginate-binding protein. -- Abstract: A gram-negative Sphingomonas sp. A1 directly incorporates alginate polysaccharide into the cytoplasm via the cell-surface pit and ABC transporter. A cell-surface alginate-binding protein, Algp7, functions as a concentrator of the polysaccharide in the pit. Based on the primary structure and genetic organization in the bacterial genome, Algp7 was found to be homologous to an M75 peptidase motif-containing EfeO, a component of a ferrous ion transporter. Despite the presence of an M75 peptidase motif with high similarity, the Algp7 protein purified from recombinant Escherichia coli cells was inert on insulin B chain and N-benzoyl-Phe-Val-Arg-p-nitroanilide, both of which are substrates for a typical M75 peptidase, imelysin, from Pseudomonas aeruginosa. The X-ray crystallographic structure of Algp7 was determined at 2.10 Å resolution by single-wavelength anomalous diffraction. Although a metal-binding motif, HxxE, conserved in zinc ion-dependent M75 peptidases is also found in Algp7, the crystal structure of Algp7 contains no metal even at the motif. The protein consists of two structurally similar up-and-down helical bundles as the basic scaffold. A deep cleft between the bundles is sufficiently large to accommodate macromolecules such as alginate polysaccharide. This is the first structural report on a bacterial cell-surface alginate-binding protein with an M75 peptidase motif.

  19. Crystal structure of bacterial cell-surface alginate-binding protein with an M75 peptidase motif

    Energy Technology Data Exchange (ETDEWEB)

    Maruyama, Yukie; Ochiai, Akihito [Laboratory of Basic and Applied Molecular Biotechnology, Graduate School of Agriculture, Kyoto University, Uji, Kyoto 611-0011 (Japan); Mikami, Bunzo [Laboratory of Applied Structural Biology, Graduate School of Agriculture, Kyoto University, Uji, Kyoto 611-0011 (Japan); Hashimoto, Wataru [Laboratory of Basic and Applied Molecular Biotechnology, Graduate School of Agriculture, Kyoto University, Uji, Kyoto 611-0011 (Japan); Murata, Kousaku, E-mail: kmurata@kais.kyoto-u.ac.jp [Laboratory of Basic and Applied Molecular Biotechnology, Graduate School of Agriculture, Kyoto University, Uji, Kyoto 611-0011 (Japan)

    2011-02-18

    Research highlights: → Bacterial alginate-binding Algp7 is similar to component EfeO of Fe2+ transporter. → We determined the crystal structure of Algp7 with a metal-binding motif. → Algp7 consists of two helical bundles formed through duplication of a single bundle. → A deep cleft involved in alginate binding locates around the metal-binding site. → Algp7 may function as a Fe2+-chelated alginate-binding protein. -- Abstract: A gram-negative Sphingomonas sp. A1 directly incorporates alginate polysaccharide into the cytoplasm via the cell-surface pit and ABC transporter. A cell-surface alginate-binding protein, Algp7, functions as a concentrator of the polysaccharide in the pit. Based on the primary structure and genetic organization in the bacterial genome, Algp7 was found to be homologous to an M75 peptidase motif-containing EfeO, a component of a ferrous ion transporter. Despite the presence of an M75 peptidase motif with high similarity, the Algp7 protein purified from recombinant Escherichia coli cells was inert on insulin B chain and N-benzoyl-Phe-Val-Arg-p-nitroanilide, both of which are substrates for a typical M75 peptidase, imelysin, from Pseudomonas aeruginosa. The X-ray crystallographic structure of Algp7 was determined at 2.10 Å resolution by single-wavelength anomalous diffraction. Although a metal-binding motif, HxxE, conserved in zinc ion-dependent M75 peptidases is also found in Algp7, the crystal structure of Algp7 contains no metal even at the motif. The protein consists of two structurally similar up-and-down helical bundles as the basic scaffold. A deep cleft between the bundles is sufficiently large to accommodate macromolecules such as alginate polysaccharide. This is the first structural report on a bacterial cell-surface alginate-binding protein with an M75 peptidase motif.

  20. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers

    Energy Technology Data Exchange (ETDEWEB)

    Labbe, Jessy L [ORNL; Murat, Claude [INRA, Nancy, France; Morin, Emmanuelle [INRA, Nancy, France; Le Tacon, F [UMR, France; Martin, Francis [INRA, Nancy, France

    2011-01-01

    It is becoming clear that simple sequence repeats (SSRs) play a significant role in fungal genome organization, and they are a large source of genetic markers for population genetics and meiotic maps. We identified SSRs in the Laccaria bicolor genome by in silico survey and analyzed their distribution in the different genomic regions. We also compared the abundance and distribution of SSRs in L. bicolor with those of the following fungal genomes: Phanerochaete chrysosporium, Coprinopsis cinerea, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Magnaporthe grisea, Neurospora crassa and Saccharomyces cerevisiae. Using the MISA computer program, we detected 277,062 SSRs in the L. bicolor genome, representing 8% of the assembled genomic sequence. Among the analyzed basidiomycetes, L. bicolor exhibited the highest SSR density, although no correlation between relative abundance and genome size was observed. In most genomes the short motifs (mono- to trinucleotides) were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. In the L. bicolor genome, most of the SSRs were located in intergenic regions (73.3%) and the highest SSR density was observed in transposable elements (TEs; 6,706 SSRs/Mb). However, 81% of the protein-coding genes contained SSRs in their exons, suggesting that SSR polymorphism may alter gene phenotypes. Within an L. bicolor offspring, sequence polymorphism of 78 SSRs was mainly detected in non-TE intergenic regions. Unlike previously developed microsatellite markers, these new ones are spread throughout the genome; these markers could have immediate applications in population genetics.
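
    An in silico SSR survey of the kind described here can be approximated with back-referencing regular expressions. The sketch below is not MISA and does not reproduce its configuration: the minimum repeat numbers (10 for mono-, 6 for di- and 5 for trinucleotides) and the function name are assumptions chosen for illustration, and only perfect repeats are reported.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, unit, repeat_count) for perfect SSRs in `seq`.

    `min_repeats` maps repeat-unit length to the minimum number of
    tandem copies; the defaults are illustrative thresholds only.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, minimum in min_repeats.items():
        # One captured unit followed by at least (minimum - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, minimum - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units such as "AA" that are really mononucleotide runs.
            if unit_len > 1 and len(set(unit)) == 1:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence with one (AT)7 repeat and one (A)12 run.
toy = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTG"
for start, unit, count in find_ssrs(toy):
    print(f"position {start}: ({unit}){count}")
```

    Dividing the number of hits by the sequence length in megabases gives an SSR density comparable in spirit to the SSRs/Mb figures quoted above, although dedicated tools also handle compound and interrupted repeats.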

  1. Postmortem detection of hepatitis B, C, and human immunodeficiency virus genomes in blood samples from drug-related deaths in Denmark

    DEFF Research Database (Denmark)

    Eriksen, Mette Brandt; Jakobsen, Marianne Antonius; Kringsholm, Birgitte

    2009-01-01

    Blood-borne viral infections are widespread among injecting drug users; however, it is difficult to include these patients in serological surveys. Therefore, we developed a national surveillance program based on postmortem testing of persons whose deaths were drug related. Blood collected at autopsy was tested for anti-HBc, anti-HBs, anti-hepatitis C virus (HCV), or anti-human immunodeficiency virus (HIV) antibodies using commercial kits. Subsets of seropositive samples were screened for viral genomes using sensitive in-house and commercial polymerase chain reaction (PCR) assays. Hepatitis B virus (HBV) DNA was detected in 20% (3/15) of anti-HBc-positive/anti-HBs-negative samples, HCV RNA was found in 64% (16/25) of anti-HCV-positive samples, and HIV RNA was detected in 40% (6/15) of anti-HIV-positive samples. The postmortem and antemortem prevalences of HBV DNA and HCV RNA were similar...

  2. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients.

    Science.gov (United States)

    Palomo, Laura; Fuster-Tormo, Francisco; Alvira, Daniel; Ademà, Vera; Armengol, María Pilar; Gómez-Marzo, Paula; de Haro, Nuri; Mallo, Mar; Xicoy, Blanca; Zamora, Lurdes; Solé, Francesc

    2017-08-01

    Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used during the past years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by presenting somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing has been performed on four paired fresh DNA and WGA DNA samples from bone marrow of MDS patients, to assess the feasibility of using WGA DNA for detecting somatic mutations. The results of this study highlighted that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, a higher number of variants was detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.

  3. Powdery mildew fungal effector candidates share N-terminal Y/F/WxC-motif

    Directory of Open Access Journals (Sweden)

    Emmersen Jeppe

    2010-05-01

    Background: Powdery mildew and rust fungi are widespread, serious pathogens that depend on developing haustoria in the living plant cells. Haustoria are separated from the host cytoplasm by a plant cell-derived extrahaustorial membrane. They secrete effector proteins, some of which are subsequently transferred across this membrane to the plant cell to suppress defense. Results: In a cDNA library from barley epidermis containing powdery mildew haustoria, two-thirds of the sequenced ESTs were fungal and represented ~3,000 genes. Many of the most highly expressed genes encoded small proteins with N-terminal signal peptides. While these proteins are novel and poorly related, they do share a three-amino acid motif, which we named "Y/F/WxC", in the N-terminal region of the mature proteins. The first amino acid of this motif is aromatic (tyrosine, phenylalanine or tryptophan), and the last is always cysteine. In total, we identified 107 such proteins, for which the ESTs represent 19% of the fungal clones in our library, suggesting fundamental roles in haustoria function. While overall sequence similarity between the powdery mildew Y/F/WxC-proteins is low, they do have a highly similar exon-intron structure, suggesting they have a common origin. Interestingly, searches of public fungal genome and EST databases revealed that haustoria-producing rust fungi also encode large numbers of novel, short proteins with signal peptides and the Y/F/WxC-motif. No significant numbers of such proteins were identified in genome and EST sequences from either fungi that do not produce haustoria or haustoria-producing Oomycetes. Conclusion: In total, we identified 107, 178 and 57 such Y/F/WxC-proteins from the barley powdery mildew, the wheat stem rust and the wheat leaf rust fungi, respectively. Altogether, our findings suggest the Y/F/WxC-proteins to be a new class of effectors from haustoria-producing pathogenic fungi.
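
    The motif as described (an aromatic residue, any residue, then cysteine, near the N-terminus of the mature protein) maps directly onto a short regular expression. The sketch below is illustrative only: it assumes the signal peptide has already been removed from the input sequence, and the 30-residue search window and the function name `find_yfwxc` are hypothetical choices, not values taken from the study.

```python
import re

# Y, F or W, then any residue, then C.
YFWXC = re.compile(r"[YFW].C")


def find_yfwxc(mature_seq, window=30):
    """Return the 0-based position of the first Y/F/WxC match that lies
    entirely within the first `window` residues, or None if there is none.

    `mature_seq` is assumed to be the protein with its signal peptide removed.
    """
    match = YFWXC.search(mature_seq.upper()[:window])
    return match.start() if match else None
```

    A screen of a predicted secretome could apply `find_yfwxc` to every mature sequence and keep those returning a position, although a hit on such a short pattern is expected by chance fairly often and would need the supporting evidence the abstract mentions (signal peptide, size, gene structure).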

  4. Development of simple sequence repeat (SSR) markers of sesame (Sesamum indicum) from a genome survey.

    Science.gov (United States)

    Wei, Xin; Wang, Linhai; Zhang, Yanxin; Qi, Xiaoqiong; Wang, Xiaoling; Ding, Xia; Zhang, Jing; Zhang, Xiurong

    2014-04-22

    Sesame (Sesamum indicum), an important oil crop, is widely grown in tropical and subtropical regions. It provides part of the daily edible oil allowance for almost half of the world's population. A limited number of co-dominant markers has been developed and applied in sesame genetic diversity and germplasm identity studies. Here we report for the first time a whole genome survey used to develop simple sequence repeat (SSR) markers and to detect the genetic diversity of sesame germplasm. From the initial assembled sesame genome, 23,438 SSRs (≥5 repeats) were identified. The most common repeat motif was dinucleotide with a frequency of 84.24%, followed by 13.53% trinucleotide, 1.65% tetranucleotide, 0.3% pentanucleotide and 0.28% hexanucleotide motifs. From 1500 designed and synthesised primer pairs, 218 polymorphic SSRs were developed and used to screen 31 sesame accessions from 12 countries. STRUCTURE and phylogenetic analyses indicated that all sesame accessions could be divided into two groups: one mainly from China and another from other countries. Cluster analysis classified Chinese major sesame varieties into three groups. These novel SSR markers are a useful tool for genetic linkage map construction, genetic diversity detection, and marker-assisted selective sesame breeding.

  5. Flow Cytometric DNA index, G-band Karyotyping, and Comparative Genomic Hybridization in Detection of High Hyperdiploidy in Childhood Acute Lymphoblastic Leukemia

    DEFF Research Database (Denmark)

    Nygaard, Ulrikka; Larsen, Jacob; Kristensen, Tim D

    2006-01-01

    High hyperdiploid acute lymphoblastic leukemia in children is related to a good outcome. Because these patients may be stratified to a low-intensity treatment, we have investigated the sensitivity of flow cytometry (FCM), G-band karyotyping (GBK), and high-resolution comparative genomic hybridization ... .001 for all comparisons). However, in 4 of 18 patients, high hyperdiploidy was overlooked by GBK or HR-CGH, and even when FCM was applied, 2 of 18 patients with high hyperdiploidy by GBK and/or HR-CGH were classified as nonhigh hyperdiploid. If high hyperdiploid subclones were included, FCM could detect all high hyperdiploid patients found by either GBK or HR-CGH, but would then in addition classify 15% to 20% of the remaining patients as high hyperdiploid. Thus, both GBK and HR-CGH overlook patients with high hyperdiploidy, and FCM only detects all high hyperdiploid patients if small high hyperdiploid...

  6. ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions.

    Science.gov (United States)

    Muiño, Jose M; Kaufmann, Kerstin; van Ham, Roeland Chj; Angenent, Gerco C; Krajewski, Pawel

    2011-05-09

    In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias of this methodology. We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR-artefacts and its adequate control of the error rate. The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR-artefacts and better control of the error rate.
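
    Two ingredients named in the abstract, a Poisson-based comparison of sample against control and a score threshold chosen to control the false discovery rate from permutation-derived null scores, can be sketched in a few lines. This is not the CSAR implementation: the function names, the pseudocount, and the simple "expected false positives divided by observed positives" FDR estimate are all assumptions made for illustration, and the way the null scores are generated by permutation is deliberately left to the caller.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment_score(sample_count, control_count, pseudocount=1.0):
    """-log10 Poisson tail probability of the sample count, using the
    (pseudocount-adjusted) control count as the expected value."""
    p = poisson_sf(sample_count, control_count + pseudocount)
    return -math.log10(max(p, 1e-300))  # floor avoids log10(0)


def fdr_threshold(observed, null, n_perm, fdr=0.05):
    """Smallest observed score t whose estimated FDR is at most `fdr`.

    `null` pools the scores from `n_perm` permuted data sets, so the expected
    number of false positives at t is (null scores >= t) / n_perm.
    """
    for t in sorted(set(observed)):
        n_obs = sum(x >= t for x in observed)
        expected_false = sum(x >= t for x in null) / n_perm
        if expected_false / n_obs <= fdr:
            return t
    return None
```

    In use, `enrichment_score` would be evaluated per position (or per window) on normalized counts, once for the real data and once for each permuted data set, and `fdr_threshold` would then pick the cut-off applied to the real scores.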

  7. Genome-wide analysis of tandem repeats in plants and green algae

    Science.gov (United States)

    Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang

    2014-01-01

    Tandem repeats (TRs) are widespread in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs on a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...

  8. The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

    Science.gov (United States)

    A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

    2000-01-01

    Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...

  9. Single promoters as regulatory network motifs

    Science.gov (United States)

    Zopf, Christopher; Maheshri, Narendra

    2012-02-01

    At eukaryotic promoters, chromatin can influence the relationship between a gene's expression and transcription factor (TF) activity. This additional complexity might allow single promoters to exhibit dynamical behavior commonly attributed to regulatory motifs involving multiple genes. We investigate the role of promoter chromatin architecture in the kinetics of gene activation using a previously described set of promoter variants based on the phosphate-regulated PHO5 promoter in S. cerevisiae. Accurate quantitative measurement of transcription activation kinetics is facilitated by a controllable and observable TF input to a promoter of interest leading to an observable expression output in single cells. We find the particular architecture of these promoters can result in a significant delay in activation, filtering of noisy TF signals, and a memory of previous activation -- dynamical behaviors reminiscent of a feed-forward loop but only requiring a single promoter. We suggest this is a consequence of chromatin transactions at the promoter, likely passing through a long-lived "primed" state between its inactive and competent states. Finally, we show our experimental setup can be generalized as a "gene oscilloscope" to probe the kinetics of heterologous promoter architectures.

  10. Profile-based short linear protein motif discovery

    Directory of Open Access Journals (Sweden)

    Haslam Niall J

    2012-05-01

    Background: Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3–10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. Results: The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets within both a benchmark dataset and a more realistic discovery dataset. Conclusions: Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.

  11. Recurrent Structural Motifs in Non-Homologous Protein Structures

    Directory of Open Access Journals (Sweden)

    Nicolas Guex

    2013-04-01

    We have extracted an extensive collection of recurrent structural motifs (RSMs), which consist of sequentially non-contiguous structural motifs (4–6 residues), each of which appears with very similar conformation in three or more mutually unrelated protein structures. We find that the proteins in our set are covered to a substantial extent by the recurrent non-contiguous structural motifs, especially the helix and strand regions. Computational alanine scanning calculations indicate that the average folding free energy changes upon alanine mutation for most types of non-alanine residues are higher for amino acids that are present in recurrent structural motifs than for amino acids that are not. The non-alanine amino acids that are most common in the recurrent structural motifs, i.e., phenylalanine, isoleucine, leucine, valine and tyrosine and the less abundant methionine and tryptophan, have the largest folding free energy changes. This indicates that the recurrent structural motifs, as we define them, describe recurrent structural patterns that are important for protein stability. In view of their properties, such structural motifs are potentially useful for inter-residue contact prediction and protein structure refinement.

  12. Parameterized algorithmics for finding connected motifs in biological networks.

    Science.gov (United States)

    Betzler, Nadja; van Bevern, René; Fellows, Michael R; Komusiewicz, Christian; Niedermeier, Rolf

    2011-01-01

    We study the NP-hard LIST-COLORED GRAPH MOTIF problem which, given an undirected list-colored graph G = (V, E) and a multiset M of colors, asks for maximum-cardinality sets S ⊆ V and M' ⊆ M such that G[S] is connected and contains exactly (with respect to multiplicity) the colors in M'. LIST-COLORED GRAPH MOTIF has applications in the analysis of biological networks. We study LIST-COLORED GRAPH MOTIF with respect to three different parameterizations. For the parameters motif size |M| and solution size |S|, we present fixed-parameter algorithms, whereas for the parameter |V| - |M|, we show W[1]-hardness for general instances and achieve fixed-parameter tractability for a special case of LIST-COLORED GRAPH MOTIF. We implemented the fixed-parameter algorithms for parameters |M| and |S|, developed further speed-up heuristics for these algorithms, and applied them in the context of querying protein-interaction networks, demonstrating their usefulness for realistic instances. Furthermore, we show that extending the request for motif connectedness to stronger demands, such as biconnectedness or bridge-connectedness leads to W[1]-hard problems when the parameter is the motif size |M|.
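
    To make the problem statement concrete, the sketch below solves tiny instances by exhaustive search. It is a plain brute-force illustration with exponential running time, not one of the fixed-parameter algorithms the abstract refers to. The graph encoding (an adjacency dict with every vertex as a key), the `color_lists` mapping, and all function names are assumptions introduced here; a vertex subset is accepted when it is connected and its vertices can be matched one-to-one to elements of the color multiset, each vertex taking a color from its own list.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Depth-first check that `vertices` induce a connected subgraph."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def color_assignment(vertices, color_lists, motif):
    """Match every vertex to a distinct element (slot) of the multiset `motif`
    whose color is in the vertex's list; return {vertex: color} or None."""
    slot_owner = {}  # slot index in `motif` -> vertex currently using it

    def augment(v, visited):
        # Standard augmenting-path step of bipartite matching.
        for i, color in enumerate(motif):
            if color in color_lists[v] and i not in visited:
                visited.add(i)
                if i not in slot_owner or augment(slot_owner[i], visited):
                    slot_owner[i] = v
                    return True
        return False

    for v in vertices:
        if not augment(v, set()):
            return None
    return {v: motif[i] for i, v in slot_owner.items()}


def max_list_colored_motif(adj, color_lists, motif):
    """Largest connected vertex set that can be colored from `motif`,
    returned as {vertex: color}; empty dict if no vertex qualifies."""
    nodes = sorted(adj)
    for size in range(min(len(nodes), len(motif)), 0, -1):
        for subset in combinations(nodes, size):
            if is_connected(subset, adj):
                assignment = color_assignment(subset, color_lists, motif)
                if assignment is not None:
                    return assignment
    return {}
```

    On a four-vertex path a-b-c-d with lists {red}, {red, blue}, {blue}, {green} and the multiset [red, red, blue], the search returns the three-vertex solution a, b, c; in a protein-interaction setting the colors would stand for protein families and the lists for candidate homology assignments.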

  13. Computational analyses of synergism in small molecular network motifs.

    Directory of Open Access Journals (Sweden)

    Yili Zhang

    2014-03-01

    Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, reoccurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs, and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically) to alter the responses of the motifs to stimuli. Synergism (or antagonism) was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism, and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.

  14. Array-based assay detects genome-wide 5-mC and 5-hmC in the brains of humans, non-human primates, and mice

    Science.gov (United States)

    2014-01-01

    Background: Methylation on the fifth position of cytosine (5-mC) is an essential epigenetic mark that is linked to both normal neurodevelopment and neurological diseases. The recent identification of another modified form of cytosine, 5-hydroxymethylcytosine (5-hmC), in both stem cells and post-mitotic neurons, raises new questions as to the role of this base in mediating epigenetic effects. Genomic studies of these marks using model systems are limited, particularly with array-based tools, because the standard method of detecting DNA methylation cannot distinguish between 5-mC and 5-hmC and most methods have been developed to only survey the human genome. Results: We show that non-human data generated using the optimization of a widely used human DNA methylation array, designed only to detect 5-mC, reproducibly distinguishes tissue types within and between chimpanzee, rhesus, and mouse, with correlations near the human DNA level (R² > 0.99). Genome-wide methylation analysis, using this approach, reveals 6,102 differentially methylated loci between rhesus placental and fetal tissues with pathways analysis significantly overrepresented for developmental processes. Restricting the analysis to oncogenes and tumor suppressor genes finds 76 differentially methylated loci, suggesting that rhesus placental tissue carries a cancer epigenetic signature. Similarly, adapting the assay to detect 5-hmC finds highly reproducible 5-hmC levels within human, rhesus, and mouse brain tissue that is species-specific with a hierarchical abundance among the three species (human > rhesus >> mouse). Annotation of 5-hmC with respect to gene structure reveals a significant prevalence in the 3'UTR and an association with chromatin-related ontological terms, suggesting an epigenetic feedback loop mechanism for 5-hmC. Conclusions: Together, these data show that this array-based methylation assay is generalizable to all mammals for the detection of both 5-mC and 5-hmC, greatly improving the

  15. Array-based assay detects genome-wide 5-mC and 5-hmC in the brains of humans, non-human primates, and mice.

    Science.gov (United States)

    Chopra, Pankaj; Papale, Ligia A; White, Andrew T J; Hatch, Andrea; Brown, Ryan M; Garthwaite, Mark A; Roseboom, Patrick H; Golos, Thaddeus G; Warren, Stephen T; Alisch, Reid S

    2014-02-13

    Methylation on the fifth position of cytosine (5-mC) is an essential epigenetic mark that is linked to both normal neurodevelopment and neurological diseases. The recent identification of another modified form of cytosine, 5-hydroxymethylcytosine (5-hmC), in both stem cells and post-mitotic neurons, raises new questions as to the role of this base in mediating epigenetic effects. Genomic studies of these marks using model systems are limited, particularly with array-based tools, because the standard method of detecting DNA methylation cannot distinguish between 5-mC and 5-hmC and most methods have been developed to only survey the human genome. We show that non-human data generated using the optimization of a widely used human DNA methylation array, designed only to detect 5-mC, reproducibly distinguishes tissue types within and between chimpanzee, rhesus, and mouse, with correlations near the human DNA level (R² > 0.99). Genome-wide methylation analysis, using this approach, reveals 6,102 differentially methylated loci between rhesus placental and fetal tissues with pathways analysis significantly overrepresented for developmental processes. Restricting the analysis to oncogenes and tumor suppressor genes finds 76 differentially methylated loci, suggesting that rhesus placental tissue carries a cancer epigenetic signature. Similarly, adapting the assay to detect 5-hmC finds highly reproducible 5-hmC levels within human, rhesus, and mouse brain tissue that is species-specific with a hierarchical abundance among the three species (human > rhesus > mouse). Annotation of 5-hmC with respect to gene structure reveals a significant prevalence in the 3'UTR and an association with chromatin-related ontological terms, suggesting an epigenetic feedback loop mechanism for 5-hmC. Together, these data show that this array-based methylation assay is generalizable to all mammals for the detection of both 5-mC and 5-hmC, greatly improving the utility of mammalian model systems

  16. Genome-wide association study of Hirschsprung disease detects a novel low-frequency variant at the RET locus.

    Science.gov (United States)

    Fadista, João; Lund, Marie; Skotte, Line; Geller, Frank; Nandakumar, Priyanka; Chatterjee, Sumantra; Matsson, Hans; Granström, Anna Löf; Wester, Tomas; Salo, Perttu; Virtanen, Valtter; Carstensen, Lisbeth; Bybjerg-Grauholm, Jonas; Hougaard, David Michael; Pakarinen, Mikko; Perola, Markus; Nordenskjöld, Agneta; Chakravarti, Aravinda; Melbye, Mads; Feenstra, Bjarke

    2018-01-29

    Hirschsprung disease (HSCR) is a congenital disorder with a population incidence of ~1/5000 live births, defined by an absence of enteric ganglia along variable lengths of the colon. HSCR genome-wide association studies (GWAS) have found common associated variants at RET, SEMA3, and NRG1, but they still fail to explain all of its heritability. To enhance gene discovery, we performed a GWAS of 170 cases identified from the Danish nationwide pathology registry with 4717 controls, based on 6.2 million variants imputed from the Haplotype Reference Consortium panel. We found a novel low-frequency variant (rs144432435), which, when conditioning on the lead RET single-nucleotide polymorphism (SNP), was of genome-wide significance in the discovery analysis. This conditional association signal was replicated in a Swedish HSCR cohort with discovery plus replication meta-analysis conditional odds ratio of 6.6 (P = 7.7 × 10⁻¹⁰; 322 cases and 4893 controls). The conditional signal was, however, not replicated in two HSCR cohorts from the USA and Finland, leading to the hypothesis that rs144432435 tags a rare haplotype present in Denmark and Sweden. Using the genome-wide complex trait analysis method, we estimated the SNP heritability of HSCR to be 88%, close to estimates based on classical family studies. Moreover, by using Lasso (least absolute shrinkage and selection operator) regression we were able to construct a genetic HSCR predictor with an area under the receiver operating characteristic curve of 76% in an independent validation set. In conclusion, we combined the largest collection of sporadic Hirschsprung cases to date (586 cases) to further elucidate HSCR's genetic architecture.

  17. Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server.

    Science.gov (United States)

    Yesselman, Joseph D; Das, Rhiju

    2016-01-01

    Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) (http://rosie.rosettacommons.org/rna_denovo/submit) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts.

  18. Genome wide DNA methylation analysis of Haloferax volcanii H26 and identification of DNA methyltransferase related PD-(D/E)XK nuclease family protein HVO_A0006

    Directory of Open Access Journals (Sweden)

    Matthew Ouellette

    2015-04-01

    Restriction-modification (RM) systems have evolved to protect the cell from invading DNAs and are composed of two enzymes: a DNA methyltransferase and a restriction endonuclease. Although RM systems are present in both archaeal and bacterial genomes, DNA methylation in archaea has not been well defined. In order to characterize the function of RM systems in archaeal species, we have made use of the model haloarchaeon Haloferax volcanii. A genomic DNA methylation analysis of H. volcanii strain H26 was performed using PacBio single molecule real-time (SMRT) sequencing. This analysis was also performed on a strain of H. volcanii in which an annotated DNA methyltransferase gene HVO_A0006 was deleted from the genome. Sequence analysis of H26 revealed two motifs which are modified in the genome: Cm4TAG and GCAm6BN6VTGC. Analysis of the ∆HVO_A0006 strain indicated that it exhibited reduced adenine methylation compared to the parental strain and altered the detected adenine motif. However, protein domain architecture analysis and amino acid alignments revealed that HVO_A0006 is homologous only to the N-terminal endonuclease region of Type IIG RM proteins and contains a PD-(D/E)XK nuclease motif, suggesting that HVO_A0006 is a PD-(D/E)XK nuclease family protein. Further bioinformatic analysis of the HVO_A0006 gene demonstrated that the gene is rare among the Halobacteria. It is surrounded by two transposition genes, suggesting that HVO_A0006 is a fragment of a Type IIG RM gene, which has likely been acquired through gene transfer, and affects restriction-modification activity by interacting with another RM system component(s). Here, we present the first genome-wide characterization of DNA methylation in an archaeal species and examine the function of a DNA methyltransferase related gene HVO_A0006.
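
    Motifs written with IUPAC ambiguity codes, such as the two reported above, can be located in a sequence by translating each code into a regular-expression character class. The sketch below is an illustration under stated assumptions: the motif strings are written with the modification marks dropped and N6 expanded to six N characters (`CTAG` and `GCABNNNNNNVTGC`), the function names are hypothetical, and only the given strand is scanned. It says nothing about methylation status, which in the study came from SMRT sequencing kinetics rather than from sequence alone.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_sites(seq, motif):
    """0-based start positions of non-overlapping motif matches in `seq`
    (forward strand only)."""
    return [m.start() for m in iupac_to_regex(motif).finditer(seq.upper())]
```

    Scanning the reverse complement as well would be needed for a motif that is not its own reverse complement, which is the case for the longer of the two motifs as written here.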

  19. PROSITE: a documented database using patterns and profiles as motif descriptors.

    Science.gov (United States)

    Sigrist, Christian J A; Cerutti, Lorenzo; Hulo, Nicolas; Gattiker, Alexandre; Falquet, Laurent; Pagni, Marco; Bairoch, Amos; Bucher, Philipp

    2002-09-01

    Among the various databases dedicated to the identification of protein families and domains, PROSITE is the first one created and has continuously evolved since. PROSITE currently consists of a large collection of biologically meaningful motifs that are described as patterns or profiles, and linked to documentation briefly describing the protein family or domain they are designed to detect. The close relationship of PROSITE with the SWISS-PROT protein database allows the evaluation