WorldWideScience

Sample records for infer cis-regulatory motifs

  1. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria.

    Science.gov (United States)

    Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A

    2013-09-02

    In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels. An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis.
The obtained genome

  2. Using hexamers to predict cis-regulatory motifs in Drosophila

    Directory of Open Access Journals (Sweden)

    Kibler Dennis

    2005-10-01

    Background: Cis-regulatory modules (CRMs) are short stretches of DNA that help regulate gene expression in higher eukaryotes. They have been found up to 1 megabase away from the genes they regulate and can be located upstream, downstream, and even within their target genes. Due to the difficulty of finding CRMs using biological and computational techniques, even well-studied regulatory systems may contain CRMs that have not yet been discovered. Results: We present a simple, efficient method (HexDiff) based only on hexamer frequencies of known CRMs and non-CRM sequence to predict novel CRMs in regulatory systems. On a data set of 16 gap and pair-rule genes containing 52 known CRMs, predictions made by HexDiff had a higher correlation with the known CRMs than several existing CRM prediction algorithms: Ahab, Cluster Buster, MSCAN, MCAST, and LWF. After combining the results of the different algorithms, 10 putative CRMs were identified and are strong candidates for future study. The hexamers used by HexDiff to distinguish between CRMs and non-CRM sequence were also analyzed and were shown to be enriched in regulatory elements. Conclusion: HexDiff provides an efficient and effective means for finding new CRMs based on known CRMs, rather than known binding sites.
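
The idea of separating CRM from non-CRM sequence by hexamer frequencies can be illustrated with a minimal sketch. This is not the published HexDiff implementation: the function names, the pseudocount, the ratio threshold, and the window score (a count of CRM-enriched hexamers) are all assumptions chosen for illustration.

```python
from collections import Counter


def hexamer_counts(seqs, k=6):
    """Count every overlapping k-mer (hexamer by default) in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def hexamer_ratios(crm_seqs, background_seqs, k=6, pseudocount=1.0):
    """Ratio of smoothed hexamer frequency in CRMs to that in background sequence.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for hexamers seen in only one of the two sets.
    """
    pos = hexamer_counts(crm_seqs, k)
    neg = hexamer_counts(background_seqs, k)
    words = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudocount * len(words)
    neg_total = sum(neg.values()) + pseudocount * len(words)
    return {
        w: ((pos[w] + pseudocount) / pos_total) / ((neg[w] + pseudocount) / neg_total)
        for w in words
    }


def score_window(window, ratios, threshold=1.0, k=6):
    """Number of hexamers in the window whose CRM/background ratio exceeds threshold."""
    window = window.upper()
    return sum(
        1
        for i in range(len(window) - k + 1)
        if ratios.get(window[i:i + k], 0.0) > threshold
    )


if __name__ == "__main__":
    ratios = hexamer_ratios(["ACGTACGTACGT"], ["TTTTTTTTTTTT"])
    print(score_window("ACGTACGT", ratios))
```

Sliding `score_window` along a locus and keeping high-scoring windows would give candidate regions; how the original method sets its thresholds is not stated in the abstract.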

  3. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  4. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    Science.gov (United States)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  5. Evolution of New cis-Regulatory Motifs Required for Cell-Specific Gene Expression in Caenorhabditis.

    Directory of Open Access Journals (Sweden)

    Michalis Barkoulas

    2016-09-01

    Patterning of C. elegans vulval cell fates relies on inductive signaling. In this induction event, a single cell, the gonadal anchor cell, secretes LIN-3/EGF and induces three out of six competent precursor cells to acquire a vulval fate. We previously showed that this developmental system is robust to a four-fold variation in lin-3/EGF genetic dose. Here using single-molecule FISH, we find that the mean level of expression of lin-3 in the anchor cell is remarkably conserved. No change in lin-3 expression level could be detected among C. elegans wild isolates and only a low level of change (less than 30%) in the Caenorhabditis genus and in Oscheius tipulae. In C. elegans, lin-3 expression in the anchor cell is known to require three transcription factor binding sites, specifically two E-boxes and a nuclear-hormone-receptor (NHR) binding site. Mutation of any of these three elements in C. elegans results in a dramatic decrease in lin-3 expression. Yet only a single E-box is found in the Drosophilae supergroup of Caenorhabditis species, including C. angaria, while the NHR-binding site likely only evolved at the base of the Elegans group. We find that a transgene from C. angaria bearing a single E-box is sufficient for normal expression in C. elegans. Even a short 58 bp cis-regulatory fragment from C. angaria with this single E-box is able to replace the three transcription factor binding sites at the endogenous C. elegans lin-3 locus, resulting in the wild-type expression level. Thus, regulatory evolution occurring in cis within a 58 bp lin-3 fragment results in a strict requirement for the NHR binding site and a second E-box in C. elegans. This single-cell, single-molecule, quantitative and functional evo-devo study demonstrates that conserved expression levels can hide extensive change in cis-regulatory site requirements and highlights the evolution of new cis-regulatory elements required for cell-specific gene expression.

  6. LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

    Science.gov (United States)

    Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2014-02-17

    As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of

  7. Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    Directory of Open Access Journals (Sweden)

    Girgis Hani Z

    2012-02-01

    Background: Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs) and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed. Results: We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio.
CrmMiner was

  8. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest.

    Science.gov (United States)

    Wang, Xin; Lin, Peijie; Ho, Joshua W K

    2018-01-19

    It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model. We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.

  9. Mouse transgenesis identifies conserved functional enhancers and cis-regulatory motif in the vertebrate LIM homeobox gene Lhx2 locus.

    Directory of Open Access Journals (Sweden)

    Alison P Lee

    The vertebrate Lhx2 is a member of the LIM homeobox family of transcription factors. It is essential for the normal development of the forebrain, eye, olfactory system and liver as well as for the differentiation of lymphoid cells. However, despite the highly restricted spatio-temporal expression pattern of Lhx2, nothing is known about its transcriptional regulation. In mammals and chicken, Crb2, Dennd1a and Lhx2 constitute a conserved linkage block, while the intervening Dennd1a is lost in the fugu Lhx2 locus. To identify functional enhancers of Lhx2, we predicted conserved noncoding elements (CNEs) in the human, mouse and fugu Crb2-Lhx2 loci and assayed their function in transgenic mouse at E11.5. Four of the eight CNE constructs tested functioned as tissue-specific enhancers in specific regions of the central nervous system and the dorsal root ganglia (DRG), recapitulating partial and overlapping expression patterns of Lhx2 and Crb2 genes. There was considerable overlap in the expression domains of the CNEs, which suggests that the CNEs are either redundant enhancers or regulating different genes in the locus. Using a large set of CNEs (810 CNEs) associated with transcription factor-encoding genes that express predominantly in the central nervous system, we predicted four over-represented 8-mer motifs that are likely to be associated with expression in the central nervous system. Mutation of one of them in a CNE that drove reporter expression in the neural tube and DRG abolished expression in both domains, indicating that this motif is essential for expression in these domains. The failure of the four functional enhancers to recapitulate the complete expression pattern of Lhx2 at E11.5 indicates that there must be other Lhx2 enhancers that are either located outside the region investigated or divergent in mammals and fishes. Other approaches such as sequence comparison between multiple mammals are required to identify and characterize such enhancers.

  10. A minimal murine Msx-1 gene promoter. Organization of its cis-regulatory motifs and their role in transcriptional activation in cells in culture and in transgenic mice.

    Science.gov (United States)

    Takahashi, T; Guron, C; Shetty, S; Matsui, H; Raghow, R

    1997-09-05

    To dissect the cis-regulatory elements of the murine Msx-1 promoter, which lacks a conventional TATA element, a putative Msx-1 promoter DNA fragment (from -1282 to +106 base pairs (bp)) or its congeners containing site-specific alterations were fused to luciferase reporter and introduced into NIH3T3 and C2C12 cells, and the expression of luciferase was assessed in transient expression assays. The functional consequences of the sequential 5' deletions of the promoter revealed that multiple positive and negative regulatory elements participate in regulating transcription of the Msx-1 gene. Surprisingly, however, the optimal expression of Msx-1 promoter in either NIH3T3 or C2C12 cells required only 165 bp of the upstream sequence to warrant detailed examination of its structure. Therefore, the functional consequences of site-specific deletions and point mutations of the cis-acting elements of the minimal Msx-1 promoter were systematically examined. Concomitantly, potential transcriptional factor(s) interacting with the cis-acting elements of the minimal promoter were also studied by gel electrophoretic mobility shift assays and DNase I footprinting. Combined analyses of the minimal promoter by DNase I footprinting, electrophoretic mobility shift assays, and super shift assays with specific antibodies revealed that 5'-flanking regions from -161 to -154 and from -26 to -13 of the Msx-1 promoter contain an authentic E box (proximal E box), capable of binding a protein immunologically related to the upstream stimulating factor 1 (USF-1) and a GC-rich sequence motif which can bind to Sp1 (proximal Sp1), respectively. Additionally, we observed that the promoter activation was seriously hampered if the proximal E box was removed or mutated, and the promoter activity was eliminated completely if the proximal Sp1 site was similarly altered. Absolute dependence of the Msx-1 minimal promoter on Sp1 could be demonstrated by transient expression assays in the Sp1-deficient

  11. Statistical significance of cis-regulatory modules

    Directory of Open Access Journals (Sweden)

    Smith Andrew D

    2007-01-01

    Background: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome scale scanning. Results: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. Conclusion: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
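
A max-gap model treats a run of binding sites as one candidate module when no two neighbouring sites are separated by more than a fixed distance. The sketch below shows only that grouping step, under assumed names (`max_gap_clusters`, `min_sites`); it does not reproduce the p-value calculation described in the abstract and is not taken from STORM or MODSTORM.

```python
def max_gap_clusters(positions, max_gap, min_sites=2):
    """Group site positions into clusters whose neighbouring sites lie within max_gap.

    positions: iterable of integer site coordinates (any order).
    max_gap:   largest allowed distance between consecutive sites in a cluster.
    min_sites: clusters with fewer sites are discarded (an assumption of this sketch).
    """
    clusters = []
    current = []
    for p in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and p - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    for cluster in max_gap_clusters([10, 30, 45, 400, 900, 920], max_gap=50):
        print(cluster[0], cluster[-1], len(cluster))
```

Each returned cluster is a sorted list of coordinates, so its first and last elements give the extent of the candidate module.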

  12. Identification, occurrence, and validation of DRE and ABRE Cis-regulatory motifs in the promoter regions of genes of Arabidopsis thaliana.

    Science.gov (United States)

    Mishra, Sonal; Shukla, Aparna; Upadhyay, Swati; Sanchita; Sharma, Pooja; Singh, Seema; Phukan, Ujjal J; Meena, Abha; Khan, Feroz; Tripathi, Vineeta; Shukla, Rakesh Kumar; Shrama, Ashok

    2014-04-01

    Plants possess a complex co-regulatory network which helps them to elicit a response under diverse adverse conditions. We used an in silico approach to identify the genes with both DRE and ABRE motifs in their promoter regions in Arabidopsis thaliana. Our results showed that Arabidopsis contains a set of 2,052 genes with ABRE and DRE motifs in their promoter regions. Approximately 72% or more of the total predicted 2,052 genes had a gap distance of less than 400 bp between DRE and ABRE motifs. For positional orientation of the DRE and ABRE motifs, we found that the DR form (one in direct and the other one in reverse orientation) was more prevalent than other forms. These predicted 2,052 genes include 155 transcription factors. Using microarray data from The Arabidopsis Information Resource (TAIR) database, we present 44 transcription factors out of 155 which are upregulated by more than twofold in response to osmotic stress and ABA treatment. Fifty-one transcripts from those predicted above were validated using semiquantitative expression analysis to support the microarray data in TAIR. Taken together, we report a set of genes containing both DRE and ABRE motifs in their promoter regions in A. thaliana, which can be useful to understand the role of ABA under osmotic stress condition. © 2013 Institute of Botany, Chinese Academy of Sciences.
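
The kind of promoter scan described here (find both motifs, then measure the distance between them) can be sketched as follows. The motif patterns are assumptions: a commonly cited DRE core (A/G)CCGAC and an ABRE form ACGTG(G/T)C, not necessarily the definitions the authors used. The sketch scans the forward strand only and measures the gap from motif start to motif start, both simplifications.

```python
import re

# Assumed motif definitions for illustration; lookaheads allow overlapping hits.
DRE = re.compile(r"(?=([AG]CCGAC))")
ABRE = re.compile(r"(?=(ACGTG[GT]C))")


def motif_starts(pattern, seq):
    """Start positions (0-based) of every match of pattern on the forward strand."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def min_gap(promoter):
    """Smallest start-to-start distance between any DRE and any ABRE, or None."""
    dre = motif_starts(DRE, promoter)
    abre = motif_starts(ABRE, promoter)
    if not dre or not abre:
        return None
    return min(abs(d - a) for d in dre for a in abre)


def has_close_pair(promoter, max_gap=400):
    """True if the promoter holds both motifs less than max_gap bp apart."""
    gap = min_gap(promoter)
    return gap is not None and gap < max_gap


if __name__ == "__main__":
    promoter = "TT" + "ACCGAC" + "T" * 20 + "ACGTGGC" + "AA"
    print(min_gap(promoter), has_close_pair(promoter))
```

Handling the reverse strand and the orientation classes mentioned in the abstract would require scanning the reverse complement as well, which this sketch omits.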

  13. Bounded search for de novo identification of degenerate cis-regulatory elements

    Directory of Open Access Journals (Sweden)

    Khetani Radhika S

    2006-05-01

    Background: The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach. Results: In this paper, we report PRISM, a degenerate motif finder that leverages the relationship between the statistical significance of a set of binding sites and that of the individual binding sites. PRISM first identifies overrepresented, non-degenerate consensus motifs, then iteratively relaxes each one into a high-scoring degenerate motif. This approach requires no tunable parameters, thereby lending itself to unbiased performance comparisons. We therefore compare PRISM's performance against nine popular motif finders on 28 well-characterized S. cerevisiae regulons. PRISM consistently outperforms all other programs. Finally, we use PRISM to predict the binding sites of uncharacterized regulons. Our results support a proposed mechanism of action for the yeast cell-cycle transcription factor Stb1, whose binding site has not been determined experimentally. Conclusion: The relationship between statistical measures of the binding sites and the set as a whole leads to a simple means of identifying the diverse range of cis-regulatory elements to which a protein binds. This approach leverages the advantages of word-counting, in that position dependencies are implicitly accounted for and local optima are more easily avoided.
While we sacrifice guaranteed optimality to prevent the exponential blowup of exhaustive search, we prove that the error

  14. Computational and molecular dissection of an X-box cis-Regulatory module

    OpenAIRE

    Warrington, Timothy Burton

    2015-01-01

    Ciliopathies are a class of human diseases marked by dysfunction of the cellular organelle, cilia. While many of the molecular components that make up cilia have been identified and studied, comparatively little is understood about the transcriptional regulation of genes encoding these components. The conserved transcription factor Regulatory Factor X (RFX)/DAF-19, which acts through binding to the cis-regulatory motif known as X-box, has been shown to regulate ciliary genes in many animals f...

  15. Identification of a cis-regulatory element by transient analysis of co-ordinately regulated genes

    Directory of Open Access Journals (Sweden)

    Allan Andrew C

    2008-07-01

    Background: Transcription factors (TFs) co-ordinately regulate target genes that are dispersed throughout the genome. This co-ordinate regulation is achieved, in part, through the interaction of transcription factors with conserved cis-regulatory motifs that are in close proximity to the target genes. While much is known about the families of transcription factors that regulate gene expression in plants, there are few well characterised cis-regulatory motifs. In Arabidopsis, over-expression of the MYB transcription factor PAP1 (PRODUCTION OF ANTHOCYANIN PIGMENT 1) leads to transgenic plants with elevated anthocyanin levels due to the co-ordinated up-regulation of genes in the anthocyanin biosynthetic pathway. In addition to the anthocyanin biosynthetic genes, there are a number of un-associated genes that also change in expression level. This may be a direct or indirect consequence of the over-expression of PAP1. Results: Oligo array analysis of PAP1 over-expression Arabidopsis plants identified genes co-ordinately up-regulated in response to the elevated expression of this transcription factor. Transient assays on the promoter regions of 33 of these up-regulated genes identified eight promoter fragments that were transactivated by PAP1. Bioinformatic analysis on these promoters revealed a common cis-regulatory motif that we showed is required for PAP1 dependent transactivation. Conclusion: Co-ordinated gene regulation by individual transcription factors is a complex collection of both direct and indirect effects. Transient transactivation assays provide a rapid method to identify direct target genes from indirect target genes. Bioinformatic analysis of the promoters of these direct target genes is able to locate motifs that are common to this sub-set of promoters, which is impossible to identify with the larger set of direct and indirect target genes. While this type of analysis does not prove a direct interaction between protein and DNA

  16. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

    Science.gov (United States)

    O'Connor, Timothy R.; Bailey, Timothy L.

    2014-01-01

    Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
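
The linking step described above (correlate a CRM's histone signal with the expression of nearby genes across tissues) can be sketched as below. The abstract does not say which correlation measure is used, so Pearson correlation is an assumption here, as are the function names and the input layout; only the 1 Mbp distance limit comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def link_crm(crm_pos, crm_signal, genes, max_dist=1_000_000):
    """Rank genes within max_dist of a CRM by cross-tissue correlation.

    crm_signal: histone-modification signal at the CRM, one value per tissue.
    genes:      dict mapping gene name -> (TSS position, expression per tissue),
                with tissues in the same order as crm_signal.
    Returns (gene, correlation) pairs sorted from highest to lowest correlation.
    """
    links = [
        (name, pearson(crm_signal, expr))
        for name, (tss, expr) in genes.items()
        if abs(tss - crm_pos) <= max_dist
    ]
    return sorted(links, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    genes = {
        "geneA": (500_000, [2, 4, 6, 8]),
        "geneB": (600_000, [8, 6, 4, 2]),
        "geneC": (5_000_000, [1, 2, 3, 4]),
    }
    for name, r in link_crm(100_000, [1, 2, 3, 4], genes):
        print(name, round(r, 3))
```

In this toy input, the hypothetical `geneC` is never considered because it lies more than 1 Mbp from the CRM, regardless of how well its expression tracks the signal.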

  17. Identification of putative cis-regulatory elements in Cryptosporidium parvum by de novo pattern finding

    Directory of Open Access Journals (Sweden)

    Kissinger Jessica C

    2007-01-01

    Background: Cryptosporidium parvum is a unicellular eukaryote in the phylum Apicomplexa. It is an obligate intracellular parasite that causes diarrhea and is a significant AIDS-related pathogen. Cryptosporidium parvum is not amenable to long-term laboratory cultivation or classical molecular genetic analysis. The parasite exhibits a complex life cycle and a broad host range, and fundamental mechanisms of gene regulation remain unknown. We have used data from the recently sequenced genome of this organism to uncover clues about gene regulation in C. parvum. We have applied two pattern finding algorithms, MEME and AlignACE, to identify conserved, over-represented motifs in the 5' upstream regions of genes in C. parvum. To support our findings, we have established comparative real-time PCR expression profiles for the groups of genes examined computationally. Results: We find that groups of genes that share a function or belong to a common pathway share upstream motifs. Different motifs are conserved upstream of different groups of genes. Comparative real-time PCR studies show co-expression of genes within each group (in sub-sets) during the life cycle of the parasite, suggesting co-regulation of these genes may be driven by the use of conserved upstream motifs. Conclusion: This is one of the first attempts to characterize cis-regulatory elements in the absence of any previously characterized elements and with very limited expression data (seven genes only). Using de novo pattern finding algorithms, we have identified specific DNA motifs that are conserved upstream of genes belonging to the same metabolic pathway or gene family. We have demonstrated the co-expression of these genes (often in subsets) using comparative real-time PCR experiments, thus establishing evidence for these conserved motifs as putative cis-regulatory elements. Given the lack of prior information concerning expression patterns and organization of promoters in C. parvum we

  18. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

    Science.gov (United States)

    De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

    2015-12-01

    The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller. Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
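
The idea of screening IUPAC words for conservation across orthologous promoters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not BLSSpeller's Java/MapReduce code, the function names and toy sequences are hypothetical, and a plain list of species containing the word stands in for the branch length score, which would additionally need a phylogenetic tree.

```python
# Hypothetical illustration; not BLSSpeller's implementation.
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(word, kmer):
    """True if the concrete k-mer is one instance of the IUPAC word."""
    return len(word) == len(kmer) and all(
        base in IUPAC[code] for code, base in zip(word, kmer)
    )

def occurs_in(word, promoter):
    """True if the IUPAC word matches anywhere in the promoter (alignment-free)."""
    k = len(word)
    return any(
        matches(word, promoter[i:i + k]) for i in range(len(promoter) - k + 1)
    )

def conserved_species(word, orthologs):
    """Species whose orthologous promoter contains the word.

    Assumption: a real conservation score would weight these species by
    the branch lengths of the tree connecting them.
    """
    return sorted(sp for sp, seq in orthologs.items() if occurs_in(word, seq))

# Toy orthologous promoters (made-up sequences and species labels).
orthologs = {
    "osa": "TTGACGTGTCA",
    "zma": "AATGACATGG",
    "sbi": "CCCCCCCC",
}
hits = conserved_species("TGACRTG", orthologs)
```

Here the degenerate position `R` lets one word cover both `TGACGTG` and `TGACATG`, which is the point of enumerating motifs over the IUPAC alphabet rather than over exact k-mers.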

  19. A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model

    Directory of Open Access Journals (Sweden)

    Haitao Guo

    2017-01-01

    The discovery of cis-regulatory modules (CRMs) is the key to understanding mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how to model the regulatory structure of CRMs has a considerable impact on the performance of CRM identification. The paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on HMM by exploring the rules of CRM transcriptional grammar that governs the internal motif site arrangement of CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS outperforms them.

  20. A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model

    Science.gov (United States)

    2017-01-01

    The discovery of cis-regulatory modules (CRMs) is the key to understanding mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how to model the regulatory structure of CRMs has a considerable impact on the performance of CRM identification. The paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on HMM by exploring the rules of CRM transcriptional grammar that governs the internal motif site arrangement of CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS performs better than them. PMID:28497059
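
To make the HMM idea concrete, the sketch below decodes a toy two-state model (background versus CRM) with the Viterbi algorithm. It is not ComSPS: the states, the probabilities, and the encoding of each window as `m` (motif site present) or `-` (absent) are all assumptions chosen for illustration.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most probable state path for an observation string (log space)."""
    score = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for symbol in obs[1:]:
        prev = score[-1]
        column, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this step.
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            column[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][symbol])
            pointers[s] = best
        score.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical parameters: CRM windows mostly contain a motif site.
STATES = ("bg", "crm")
START = {"bg": 0.9, "crm": 0.1}
TRANS = {"bg": {"bg": 0.9, "crm": 0.1}, "crm": {"bg": 0.2, "crm": 0.8}}
EMIT = {"bg": {"m": 0.1, "-": 0.9}, "crm": {"m": 0.7, "-": 0.3}}

def decode(obs):
    return viterbi(obs, STATES, START, TRANS, EMIT)

labels = decode("---mmmm---")
```

A model of CRM "grammar" would replace the single `crm` state with several states that encode the order and spacing of motif sites; the decoding step stays the same.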

  1. Characterization of Putative cis-Regulatory Elements in Genes Preferentially Expressed in Arabidopsis Male Meiocytes

    Directory of Open Access Journals (Sweden)

    Junhua Li

    2014-01-01

    Meiosis is essential for plant reproduction because it is the process during which homologous chromosome pairing, synapsis, and meiotic recombination occur. The meiotic transcriptome is difficult to investigate because of the size of meiocytes and the confines of anther lobes. The recent development of isolation techniques has enabled the characterization of transcriptional profiles in male meiocytes of Arabidopsis. Gene expression in male meiocytes shows unique features. The direct interaction of transcription factors (TFs) with DNA regulatory sequences forms the basis for the specificity of transcriptional regulation. Here, we identified putative cis-regulatory elements (CREs) associated with male meiocyte-expressed genes using in silico tools. The upstream regions (1 kb) of the top 50 genes preferentially expressed in Arabidopsis meiocytes possessed conserved motifs. These motifs are putative binding sites of TFs, some of which share common functions, such as roles in cell division. In combination with cell-type-specific analysis, our findings could be a substantial aid for the identification and experimental verification of the protein-DNA interactions for the specific TFs that drive gene expression in meiocytes.

  2. Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA.

    Science.gov (United States)

    Pierstorff, Nora; Bergman, Casey M; Wiehe, Thomas

    2006-12-01

    Predicting cis-regulatory modules (CRMs) in higher eukaryotes is a challenging computational task. Commonly used methods to predict CRMs based on the signal of transcription factor binding sites (TFBS) are limited by prior information about transcription factor specificity. More general methods that bypass the reliance on TFBS models are needed for comprehensive CRM prediction. We have developed a method to predict CRMs called CisPlusFinder that identifies high density regions of perfect local ungapped sequences (PLUSs) based on multiple species conservation. By assuming that PLUSs contain core TFBS motifs that are locally overrepresented, the method attempts to capture the expected features of CRM structure and evolution. Applied to a benchmark dataset of CRMs involved in early Drosophila development, CisPlusFinder predicts more annotated CRMs than all other methods tested. Using the REDfly database, we find that some 'false positive' predictions in the benchmark dataset correspond to recently annotated CRMs. Our work demonstrates that CRM prediction methods that combine comparative genomic data with statistical properties of DNA may achieve reasonable performance when applied genome-wide in the absence of an a priori set of known TFBS motifs. The program CisPlusFinder can be downloaded at http://jakob.genetik.uni-koeln.de/bioinformatik/people/nora/nora.html. All software is licensed under the Lesser GNU Public License (LGPL).

  3. Identifying Cis-Regulatory Changes Involved in the Evolution of Aerobic Fermentation in Yeasts

    Science.gov (United States)

    Lin, Zhenguo; Wang, Tzi-Yuan; Tsai, Bing-Shi; Wu, Fang-Ting; Yu, Fu-Jung; Tseng, Yu-Jung; Sung, Huang-Mo; Li, Wen-Hsiung

    2013-01-01

    Gene regulation change has long been recognized as an important mechanism for phenotypic evolution. We used the evolution of yeast aerobic fermentation as a model to explore how gene regulation has evolved and how this process has contributed to phenotypic evolution and adaptation. Most eukaryotes fully oxidize glucose to CO2 and H2O in mitochondria to maximize energy yield, whereas some yeasts, such as Saccharomyces cerevisiae and its relatives, predominantly ferment glucose into ethanol even in the presence of oxygen, a phenomenon known as aerobic fermentation. We examined the genome-wide gene expression levels among 12 different yeasts and found that a group of genes involved in the mitochondrial respiration process showed the largest reduction in gene expression level during the evolution of aerobic fermentation. Our analysis revealed that the downregulation of these genes was significantly associated with massive loss of binding motifs of Cbf1p in the fermentative yeasts. Our experimental assays confirmed the binding of Cbf1p to the predicted motif and the activator role of Cbf1p. In summary, our study laid a foundation to unravel the long-time mystery about the genetic basis of evolution of aerobic fermentation, providing new insights into understanding the role of cis-regulatory changes in phenotypic evolution. PMID:23650209

  4. Genetic mapping uncovers cis-regulatory landscape of RNA editing.

    Science.gov (United States)

    Ramaswami, Gokul; Deng, Patricia; Zhang, Rui; Carbone, Mary Anna; Mackay, Trudy F C; Li, Jin Billy

    2015-09-16

    Adenosine-to-inosine (A-to-I) RNA editing, catalysed by ADAR enzymes conserved in metazoans, plays an important role in neurological functions. Although the fine-tuning mechanism provided by A-to-I RNA editing is important, the underlying rules governing ADAR substrate recognition are not well understood. We apply a quantitative trait loci (QTL) mapping approach to identify genetic variants associated with variability in RNA editing. With very accurate measurement of RNA editing levels at 789 sites in 131 Drosophila melanogaster strains, here we identify 545 editing QTLs (edQTLs) associated with differences in RNA editing. We demonstrate that many edQTLs can act through changes in the local secondary structure for edited dsRNAs. Furthermore, we find that edQTLs located outside of the edited dsRNA duplex are enriched in secondary structure, suggesting that distal dsRNA structure beyond the editing site duplex affects RNA editing efficiency. Our work will facilitate the understanding of the cis-regulatory code of RNA editing.

  5. Changes in Cis-regulatory Elements during Morphological Evolution

    Directory of Open Access Journals (Sweden)

    Yu-Lee Paul

    2012-10-01

    How have animals evolved new body designs (morphological evolution)? This requires explanations both for simple morphological changes, such as differences in pigmentation and hair patterns between different Drosophila populations and species, and also for more complex changes, such as differences in the forelimbs of mice and bats, and the necks of amphibians and reptiles. The genetic changes and pathways involved in these evolutionary steps require identification. Many, though not all, of these events occur by changes in cis-regulatory (enhancer) elements within developmental genes. Enhancers are modular, each affecting expression in only one or a few tissues. Therefore it is possible to add, remove or alter an enhancer without producing changes in multiple tissues, and thereby avoid widespread (pleiotropic) deleterious effects. Ideally, for a given step in morphological evolution it is necessary to identify (i) the change in phenotype, (ii) the changes in gene expression, (iii) the DNA region, enhancer or otherwise, affected, (iv) the mutation involved, and (v) the nature of the transcription or other factors that bind to this site. In practice these data are incomplete for most of the published studies upon morphological evolution. Here, the investigations are categorized according to how far these analyses have proceeded.

  6. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    Science.gov (United States)

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs were validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  7. Validation of skeletal muscle cis-regulatory module predictions reveals nucleotide composition bias in functional enhancers.

    Directory of Open Access Journals (Sweden)

    Andrew T Kwon

    2011-12-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs were validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.

  8. Bioinformatic analysis of cis-regulatory interactions between progesterone and estrogen receptors in breast cancer

    Directory of Open Access Journals (Sweden)

    Matloob Khushi

    2014-11-01

    Chromatin factors interact with each other in a cell- and sequence-specific manner in order to regulate transcription, and a wealth of publicly available datasets exists describing the genomic locations of these interactions. Our recently published BiSA (Binding Sites Analyser) database contains transcription factor binding locations and epigenetic modifications collected from published studies and provides tools to analyse stored and imported data. Using BiSA we investigated the overlapping cis-regulatory role of estrogen receptor alpha (ERα) and progesterone receptor (PR) in the T-47D breast cancer cell line. We found that ERα binding sites overlap with a subset of PR binding sites. To investigate further, we re-analysed raw data to remove any biases introduced by the use of distinct tools in the original publications. We identified 22,152 PR and 18,560 ERα binding sites (<5% false discovery rate) with 4,358 overlapping regions among the two datasets. BiSA statistical analysis revealed a non-significant overall overlap correlation between the two factors, suggesting that ERα and PR are not partner factors and do not require each other for binding to occur. However, Monte Carlo simulation by Binary Interval Search (BITS), Relevant Distance, Absolute Distance, Jaccard and Projection tests by Genometricorr revealed a statistically significant spatial correlation of binding regions on chromosome between the two factors. Motif analysis revealed that the shared binding regions were enriched with binding motifs for ERα, PR and a number of other transcription and pioneer factors. Some of these factors are known to co-locate with ERα and PR binding. Therefore, the close spatial proximity of ERα binding sites to PR binding sites suggests that ERα and PR in general function independently at the molecular level, but that their activities converge on a specific subset of transcriptional targets.
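
Of the overlap measures named in this abstract, the Jaccard statistic is the simplest to write down: base pairs covered by both site sets divided by base pairs covered by either. The sketch below is a minimal single-chromosome version with half-open intervals; it is not the Genometricorr or BiSA implementation, and the coordinates are invented.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))

def jaccard(a, b):
    """Intersection over union, measured in base pairs."""
    union = covered(list(a) + list(b))
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.
    intersection = covered(a) + covered(b) - union
    return intersection / union if union else 0.0

# Made-up binding sites on one chromosome.
er_sites = [(100, 200), (500, 600)]
pr_sites = [(150, 250), (900, 1000)]
score = jaccard(er_sites, pr_sites)
```

Judging whether such a score is significant requires a null distribution, for example by recomputing it after repeatedly shuffling one site set along the chromosome, which is the role the Monte Carlo simulation plays in the study.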

  9. Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

    Science.gov (United States)

    Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

    2005-09-01

    We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
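
The core of an oligo over-representation analysis like the one described here can be sketched as follows. This is a hypothetical illustration, not the MotifFinder program: it scores each k-mer by the ratio of the fraction of coregulated promoters containing it to the fraction of control promoters containing it, with a pseudocount, where a real analysis would use a proper significance test. Sequences and function names are made up.

```python
from collections import Counter

def kmer_presence(promoters, k):
    """For each k-mer, count the promoters that contain it at least once."""
    counts = Counter()
    for seq in promoters:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def enrichment(foreground, background, k, pseudo=1.0):
    """Rank k-mers by foreground/background presence ratio (with pseudocounts)."""
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_rate = (n + pseudo) / (len(foreground) + pseudo)
        bg_rate = (bg[kmer] + pseudo) / (len(background) + pseudo)
        scores[kmer] = fg_rate / bg_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy promoters: the coregulated set shares an ACGT-core oligo.
coregulated = ["TTACGTGGC", "GGACGTGTA", "CACGTGAAA"]
control = ["TTTTTTTTT", "GGGAGGGAG", "CATCATCAT"]
top = enrichment(coregulated, control, k=5)
```

Comparing the nucleotides that flank an enriched core across the ranked oligos is one way to look for the motif variants that the abstract associates with different responses.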

  10. Barcoded DNA-tag reporters for multiplex cis-regulatory analysis.

    Directory of Open Access Journals (Sweden)

    Jongmin Nam

    Cis-regulatory DNA sequences causally mediate patterns of gene expression, but efficient experimental analysis of these control systems has remained challenging. Here we develop a new version of "barcoded" DNA-tag reporters, "Nanotags", that permit simultaneous quantitative analysis of up to 130 distinct cis-regulatory modules (CRMs). The activities of these reporters are measured in single experiments by the NanoString RNA counting method and other quantitative procedures. We demonstrate the efficiency of the Nanotag method by simultaneously measuring hourly temporal activities of 126 CRMs from 46 genes in the developing sea urchin embryo, otherwise a virtually impossible task. Nanotags are also used in gene perturbation experiments to reveal cis-regulatory responses of many CRMs at once. Nanotag methodology can be applied to many research areas, ranging from gene regulatory networks to functional and evolutionary genomics.

  11. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.

    Science.gov (United States)

    Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P

    2015-04-23

    With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.

  12. Nutritional control of gene expression in Drosophila larvae via TOR, Myc and a novel cis-regulatory element

    Directory of Open Access Journals (Sweden)

    Grewal Savraj S

    2010-01-01

    Background: Nutrient availability is a key determinant of eukaryotic cell growth. In unicellular organisms many signaling and transcriptional networks link nutrient availability to the expression of metabolic genes required for growth. However, less is known about the corresponding mechanisms that operate in metazoans. We used gene expression profiling to explore this issue in developing Drosophila larvae. Results: We found that starvation for dietary amino acids (AAs) leads to dynamic changes in transcript levels of many metabolic genes. The conserved insulin/PI3K and TOR signaling pathways mediate nutrition-dependent growth in Drosophila and other animals. We found that many AA starvation-responsive transcripts were also altered in TOR mutants. In contrast, although PI3K overexpression induced robust changes in the expression of many metabolic genes, these changes showed limited overlap with the AA starvation expression profile. We did, however, identify a strong overlap between genes regulated by the transcription factor Myc and AA starvation-responsive genes, particularly those involved in ribosome biogenesis, protein synthesis and mitochondrial function. The consensus Myc DNA binding site is enriched in promoters of these AA starvation genes, and we found that Myc overexpression could bypass dietary AA to induce expression of these genes. We also identified another sequence motif (Motif 1) enriched in the promoters of AA starvation-responsive genes. We showed that Motif 1 was both necessary and sufficient to mediate transcriptional responses to dietary AA in larvae. Conclusions: Our data suggest that many of the transcriptional effects of amino acids are mediated via signaling through the TOR pathway in Drosophila larvae. We also find that these transcriptional effects are mediated through at least two mechanisms: via the transcription factor Myc, and via the Motif 1 cis-regulatory element. These studies begin to elucidate a nutrient

  13. CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors

    NARCIS (Netherlands)

    J.C. Corbo (Joseph); K.A. Lawrence (Karen); M. Karlstetter (Marcus); C.A. Myers (Connie); M. Abdelaziz (Musa); W. Dirkes (William); K. Weigelt (Karin); M. Seifert (Martin); V. Benes (Vladimir); L.G. Fritsche (Lars); B.H.F. Weber (Bernhard); T. Langmann (Thomas)

    2010-01-01

    Approximately 98% of mammalian DNA is noncoding, yet we understand relatively little about the function of this enigmatic portion of the genome. The cis-regulatory elements that control gene expression reside in noncoding regions and can be identified by mapping the binding sites of

  14. Cis-regulatory elements in the primate brain: from functional specialization to neurodegeneration

    NARCIS (Netherlands)

    Vermunt, Marit W.

    2017-01-01

    Over the last decade, the noncoding part of the genome has been shown to harbour thousands of cis-regulatory elements, such as enhancers, that activate well-defined gene expression programs. Here, we charted active enhancers in a multiplicity of human brain regions to understand the role of

  15. Evolution of Cis-Regulatory Elements and Regulatory Networks in Duplicated Genes of Arabidopsis.

    Science.gov (United States)

    Arsovski, Andrej A; Pradinuk, Julian; Guo, Xu Qiu; Wang, Sishuo; Adams, Keith L

    2015-12-01

    Plant genomes contain large numbers of duplicated genes that contribute to the evolution of new functions. Following duplication, genes can exhibit divergence in their coding sequence and their expression patterns. Changes in the cis-regulatory element landscape can result in changes in gene expression patterns. High-throughput methods developed recently can identify potential cis-regulatory elements on a genome-wide scale. Here, we use a recent comprehensive data set of DNase I sequencing-identified cis-regulatory binding sites (footprints) at single-base-pair resolution to compare binding sites and network connectivity in duplicated gene pairs in Arabidopsis (Arabidopsis thaliana). We found that duplicated gene pairs vary greatly in their cis-regulatory element architecture, resulting in changes in regulatory network connectivity. Whole-genome duplicates (WGDs) have approximately twice as many footprints in their promoters left by potential regulatory proteins than do tandem duplicates (TDs). The WGDs have a greater average number of footprint differences between paralogs than TDs. The footprints, in turn, result in more regulatory network connections between WGDs and other genes, forming denser, more complex regulatory networks than shown by TDs. When comparing regulatory connections between duplicates, WGDs had more pairs in which the two genes are either partially or fully diverged in their network connections, but fewer genes with no network connections than the TDs. There is evidence of younger TDs and WGDs having fewer unique connections compared with older duplicates. This study provides insights into cis-regulatory element evolution and network divergence in duplicated genes. © 2015 American Society of Plant Biologists. All Rights Reserved.

  16. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    Science.gov (United States)

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  17. Assessment of algorithms for inferring positional weight matrix motifs of transcription factor binding sites using protein binding microarray data.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature and the ability to predict in vivo and in vitro binding. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays compared to in vitro assays.
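
For readers unfamiliar with the positional weight matrix representation that these methods output, the sketch below builds a log-odds matrix from a handful of aligned sites and scans a sequence with it. It does not reproduce any of the four PBM methods compared in the study: the aligned sites, the uniform background and the pseudocount are assumptions for illustration only.

```python
import math

def build_pwm(sites, pseudo=1.0, background=0.25):
    """Log2-odds weight matrix from equal-length aligned binding sites."""
    total = len(sites) + 4 * pseudo
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudo) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds for a sequence as long as the matrix."""
    return sum(col[base] for col, base in zip(pwm, seq))

def best_hit(pwm, seq):
    """Start index of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        range(len(seq) - width + 1),
        key=lambda i: score(pwm, seq[i:i + width]),
    )

# Made-up aligned sites around a CACGTG core.
sites = ["CACGTG", "CACGTG", "CACGTT", "TACGTG"]
pwm = build_pwm(sites)
pos = best_hit(pwm, "AAACACGTGAAA")
```

The information content discussed in the abstract is a property of such a matrix: columns dominated by one base are highly informative, while columns close to the background distribution contribute little.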

  18. Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees

    Directory of Open Access Journals (Sweden)

    Chen Xiaoyu

    2007-12-01

    Background: In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion: Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.

  19. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Background: Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results: We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion: GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes ... The generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  20. Pathogenic adaptation of intracellular bacteria by rewiring a cis-regulatory input function.

    Science.gov (United States)

    Osborne, Suzanne E; Walthers, Don; Tomljenovic, Ana M; Mulder, David T; Silphaduang, Uma; Duong, Nancy; Lowden, Michael J; Wickham, Mark E; Waller, Ross F; Kenney, Linda J; Coombes, Brian K

    2009-03-10

    The acquisition of DNA by horizontal gene transfer enables bacteria to adapt to previously unexploited ecological niches. Although horizontal gene transfer and mutation of protein-coding sequences are well-recognized forms of pathogen evolution, the evolutionary significance of cis-regulatory mutations in creating phenotypic diversity through altered transcriptional outputs is not known. We show the significance of regulatory mutation for pathogen evolution by mapping and then rewiring a cis-regulatory module controlling a gene required for murine typhoid. Acquisition of a binding site for the Salmonella pathogenicity island-2 regulator, SsrB, enabled the srfN gene, ancestral to the Salmonella genus, to play a role in pathoadaptation of S. typhimurium to a host animal. We identified the evolved cis-regulatory module and quantified the fitness gain that this regulatory output accrues for the bacterium using competitive infections of host animals. Our findings highlight a mechanism of pathogen evolution involving regulatory mutation that is selected because of the fitness advantage the new regulatory output provides the incipient clones.

  1. Dynamic SPR monitoring of yeast nuclear protein binding to a cis-regulatory element

    International Nuclear Information System (INIS)

    Mao, Grace; Brody, James P.

    2007-01-01

    Gene expression is controlled by protein complexes binding to short specific sequences of DNA, called cis-regulatory elements. Expression of most eukaryotic genes is controlled by dozens of these elements. Comprehensive identification and monitoring of these elements is a major goal of genomics. In pursuit of this goal, we are developing a surface plasmon resonance (SPR) based assay to identify and monitor cis-regulatory elements. To test whether we could reliably monitor protein binding to a regulatory element, we immobilized a 16 bp region of Saccharomyces cerevisiae chromosome 5 onto a gold surface. This 16 bp region of DNA is known to bind several proteins and thought to control expression of the gene RNR1, which varies through the cell cycle. We synchronized yeast cell cultures, and then sampled these cultures at a regular interval. These samples were processed to purify nuclear lysate, which was then exposed to the sensor. We found that nuclear protein binds this particular element of DNA at a significantly higher rate (as compared to unsynchronized cells) during G1 phase. Other time points show levels of DNA-nuclear protein binding similar to the unsynchronized control. We also measured the apparent association complex of the binding to be 0.014 s⁻¹. We conclude that (1) SPR-based assays can monitor DNA-nuclear protein binding and that (2) for this particular cis-regulatory element, maximum DNA-nuclear protein binding occurs during G1 phase.
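
To give a feel for the time scale implied by the reported value of 0.014 per second, the sketch below reads it as an apparent first-order rate constant and assumes simple pseudo-first-order binding, where the signal approaches its plateau as 1 - exp(-k t). Both that reading and the names `K_OBS`, `fraction_bound`, and `half_time` are assumptions made for illustration; the abstract does not state the kinetic model used.

```python
import math

K_OBS = 0.014  # value quoted in the abstract, read here as a rate in s^-1 (assumption)

def fraction_bound(t_seconds, k_obs=K_OBS):
    """Fraction of the plateau signal reached after t_seconds,
    assuming pseudo-first-order association (an illustrative assumption)."""
    return 1.0 - math.exp(-k_obs * t_seconds)

def half_time(k_obs=K_OBS):
    """Time needed to reach half of the plateau signal: ln(2) / k_obs."""
    return math.log(2) / k_obs

# Print the approach to the plateau at a few time points.
for t in (30, 60, 120, 300):
    print(f"t = {t:>3} s: {fraction_bound(t):.2f} of plateau")
print(f"half-time: {half_time():.1f} s")
```

Under these assumptions the half-time is ln(2)/0.014, a little under 50 seconds.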

  2. A novel method for predicting activity of cis-regulatory modules, based on a diverse training set.

    Science.gov (United States)

    Yang, Wei; Sinha, Saurabh

    2017-01-01

    With the rapid emergence of technologies for locating cis-regulatory modules (CRMs) genome-wide, the next pressing challenge is to assign precise functions to each CRM, i.e. to determine the spatiotemporal domains or cell-types where it drives expression. A popular approach to this task is to model the typical k-mer composition of a set of CRMs known to drive a common expression pattern, and assign that pattern to other CRMs exhibiting a similar k-mer composition. This approach does not rely on prior knowledge of transcription factors relevant to the CRM or their binding motifs, and is thus more widely applicable than motif-based methods for predicting CRM activity, but is also prone to false positive predictions. We present a novel strategy to improve the above-mentioned approach: to predict if a CRM drives a specific gene expression pattern, assess not only how similar the CRM is to other CRMs with similar activity but also to CRMs with distinct activities. We use a state-of-the-art statistical method to quantify a CRM's sequence similarity to many different training sets of CRMs, and employ a classification algorithm to integrate these similarity scores into a single prediction of the CRM's activity. This strategy is shown to significantly improve CRM activity prediction over current approaches. Our implementation of the new method, called IMMBoost, is freely available as source code at https://github.com/weiyangedward/IMMBoost. Contact: sinhas@illinois.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
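
The general idea described above, scoring a query CRM against several training sets rather than only its own class, can be sketched as follows. This is not IMMBoost: plain k-mer counts with cosine similarity stand in for the statistical similarity method, and a simple "beats every other set" rule stands in for the classification algorithm. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=3):
    """Count overlapping k-mers in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def pooled_profile(seqs, k=3):
    """Sum the k-mer counts of all sequences in one training set."""
    total = Counter()
    for s in seqs:
        total.update(kmer_profile(s, k))
    return total

def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_features(query, training_sets, k=3):
    """One similarity score per training set (the target class and all others)."""
    q = kmer_profile(query, k)
    return {name: cosine(q, pooled_profile(seqs, k))
            for name, seqs in training_sets.items()}

def predict(query, training_sets, target, k=3):
    """Toy decision rule (a stand-in for a trained classifier): call the
    target pattern only if the query resembles it more than any other set."""
    feats = similarity_features(query, training_sets, k)
    others = [v for name, v in feats.items() if name != target]
    return feats[target] > max(others, default=0.0)

# Hypothetical toy training sets, one per expression pattern.
toy_sets = {
    "pattern_A": ["ATATATATAT", "TATATATATA"],
    "pattern_B": ["GCGCGCGCGC", "CGCGCGCGCG"],
}
print(similarity_features("ATATATAT", toy_sets))
```

In a real setting the per-set scores would be fed to a trained classifier as a feature vector; the comparison rule here only illustrates why similarity to the other sets carries information.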

  3. A cis-regulatory sequence driving metabolic insecticide resistance in mosquitoes: functional characterisation and signatures of selection.

    Science.gov (United States)

    Wilding, Craig S; Smith, Ian; Lynd, Amy; Yawson, Alexander Egyir; Weetman, David; Paine, Mark J I; Donnelly, Martin J

    2012-09-01

    Although cytochrome P450 (CYP450) enzymes are frequently up-regulated in mosquitoes resistant to insecticides, no regulatory motifs driving these expression differences with relevance to wild populations have been identified. Transposable elements (TEs) are often enriched upstream of those CYP450s involved in insecticide resistance, leading to the assumption that they contribute regulatory motifs that directly underlie the resistance phenotype. A partial CuRE1 (Culex Repetitive Element 1) transposable element is found directly upstream of CYP9M10, a cytochrome P450 implicated previously in larval resistance to permethrin in the ISOP450 strain of Culex quinquefasciatus, but is absent from the equivalent genomic region of a susceptible strain. Via expression of CYP9M10 in Escherichia coli we have now demonstrated time- and NADPH-dependent permethrin metabolism, prerequisites for confirmation of a role in metabolic resistance, and through qPCR shown that CYP9M10 is >20-fold over-expressed in ISOP450 compared to a susceptible strain. In a fluorescent reporter assay the region upstream of CYP9M10 from ISOP450 drove 10× expression compared to the equivalent region (lacking CuRE1) from the susceptible strain. Close correspondence with the gene expression fold-change implicates the upstream region including CuRE1 as a cis-regulatory element involved in resistance. Only a single CuRE1-bearing allele, identical to the CuRE1-bearing allele in the resistant strain, is found throughout Sub-Saharan Africa, in contrast to the diversity encountered in non-CuRE1 alleles. This suggests a single origin and subsequent spread due to selective advantage. CuRE1 is detectable using a simple diagnostic. When applied to C. quinquefasciatus larvae from Ghana we have demonstrated a significant association with permethrin resistance in multiple field sites (mean Odds Ratio = 3.86) suggesting this marker has relevance to natural populations of vector mosquitoes. 
However, when CuRE1 was excised

  4. Plasticity of the cis-regulatory input function of a gene.

    Directory of Open Access Journals (Sweden)

    Avraham E Mayo

    2006-04-01

    Full Text Available The transcription rate of a gene is often controlled by several regulators that bind specific sites in the gene's cis-regulatory region. The combined effect of these regulators is described by a cis-regulatory input function. What determines the form of an input function, and how variable is it with respect to mutations? To address this, we employ the well-characterized lac operon of Escherichia coli, which has an elaborate input function, intermediate between Boolean AND-gate and OR-gate logic. We mapped in detail the input function of 12 variants of the lac promoter, each with different point mutations in the regulator binding sites, by means of accurate expression measurements from living cells. We find that even a few mutations can significantly change the input function, resulting in functions that resemble pure AND gates, OR gates, or single-input switches. Other types of gates were not found. The variant input functions can be described in a unified manner by a mathematical model. The model also lets us predict which functions cannot be reached by point mutations. The input function that we studied thus appears to be plastic, in the sense that many of the mutations do not ruin the regulation completely but rather result in new ways to integrate the inputs.
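
A two-input cis-regulatory input function that sits between AND-gate and OR-gate logic can be illustrated with a small sketch. The Hill-function form, the blending weight `w`, and all parameter values below are assumptions chosen for illustration; they are not the mathematical model used in the study.

```python
def hill(x, k=1.0, n=2.0):
    """Fractional activation contributed by one input at level x."""
    return x ** n / (k ** n + x ** n)

def and_gate(x, y, kx=1.0, ky=1.0):
    """High output only when both inputs are high."""
    return hill(x, kx) * hill(y, ky)

def or_gate(x, y, kx=1.0, ky=1.0):
    """High output when either input is high."""
    return 1.0 - (1.0 - hill(x, kx)) * (1.0 - hill(y, ky))

def blended(x, y, w, kx=1.0, ky=1.0):
    """Intermediate logic: w = 1 gives a pure AND gate, w = 0 a pure OR gate."""
    return w * and_gate(x, y, kx, ky) + (1.0 - w) * or_gate(x, y, kx, ky)

# Tabulate the three functions over low/high combinations of the two inputs.
for x in (0.0, 100.0):
    for y in (0.0, 100.0):
        print(x, y, round(and_gate(x, y), 3), round(or_gate(x, y), 3),
              round(blended(x, y, w=0.5), 3))
```

In this toy picture, a mutation that shifts `w`, `kx`, or `ky` moves the function toward AND-like, OR-like, or single-input behaviour without abolishing regulation.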

  5. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Science.gov (United States)

    Grimm, Guido W.; Renner, Susanne S.; Stamatakis, Alexandros; Hemleben, Vera

    2007-01-01

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly. PMID:19455198

  6. A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    Directory of Open Access Journals (Sweden)

    Guido W. Grimm

    2006-01-01

    Full Text Available The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~ 600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

  7. Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution.

    Directory of Open Access Journals (Sweden)

    Xin He

    2009-03-01

    Full Text Available Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

  8. In silico modeling of epigenetic-induced changes in photoreceptor cis-regulatory elements.

    Science.gov (United States)

    Hossain, Reafa A; Dunham, Nicholas R; Enke, Raymond A; Berndsen, Christopher E

    2018-01-01

    DNA methylation is a well-characterized epigenetic repressor of mRNA transcription in many plant and vertebrate systems. However, the mechanism of this repression is not fully understood. The process of transcription is controlled by proteins that regulate recruitment and activity of RNA polymerase by binding to specific cis-regulatory sequences. Cone-rod homeobox (CRX) is a well-characterized mammalian transcription factor that controls photoreceptor cell-specific gene expression. Although much is known about the functions and DNA binding specificity of CRX, little is known about how DNA methylation modulates CRX binding affinity to genomic cis-regulatory elements. We used bisulfite pyrosequencing of human ocular tissues to measure DNA methylation levels of the regulatory regions of RHO, PDE6B, PAX6, and LINE1 retrotransposon repeats. To describe the molecular mechanism of repression, we used molecular modeling to illustrate the effect of DNA methylation on human RHO regulatory sequences. In this study, we demonstrate an inverse correlation between DNA methylation in regulatory regions adjacent to the human RHO and PDE6B genes and their subsequent transcription in human ocular tissues. Docking of CRX to the DNA models shows that CRX interacts with the grooves of these sequences, suggesting changes in groove structure could regulate binding. Molecular dynamics simulations of the RHO promoter and enhancer regions show changes in the flexibility and groove width upon epigenetic modification. Models also demonstrate changes in the local dynamics of CRX binding sites within RHO regulatory sequences which may account for the repression of CRX-dependent transcription. Collectively, these data demonstrate epigenetic regulation of CRX binding sites in human retinal tissue and provide insight into the mechanism of this mode of epigenetic regulation to be tested in future experiments.

  9. Brachyury, Foxa2 and the cis-Regulatory Origins of the Notochord.

    Directory of Open Access Journals (Sweden)

    Diana S José-Edwards

    2015-12-01

    Full Text Available A main challenge of modern biology is to understand how specific constellations of genes are activated to differentiate cells and give rise to distinct tissues. This study focuses on elucidating how gene expression is initiated in the notochord, an axial structure that provides support and patterning signals to embryos of humans and all other chordates. Although numerous notochord genes have been identified, the regulatory DNAs that orchestrate development and propel evolution of this structure by eliciting notochord gene expression remain mostly uncharted, and the information on their configuration and recurrence is still quite fragmentary. Here we used the simple chordate Ciona for a systematic analysis of notochord cis-regulatory modules (CRMs), and investigated their composition, architectural constraints, predictive ability and evolutionary conservation. We found that most Ciona notochord CRMs relied upon variable combinations of binding sites for the transcription factors Brachyury and/or Foxa2, which can act either synergistically or independently from one another. Notably, one of these CRMs contains a Brachyury binding site juxtaposed to an (AC) microsatellite, an unusual arrangement also found in Brachyury-bound regulatory regions in mouse. In contrast, different subsets of CRMs relied upon binding sites for transcription factors of widely diverse families. Surprisingly, we found that neither intra-genomic nor interspecific conservation of binding sites were reliably predictive hallmarks of notochord CRMs. We propose that rather than obeying a rigid sequence-based cis-regulatory code, most notochord CRMs are rather unique. Yet, this study uncovered essential elements recurrently used by divergent chordates as basic building blocks for notochord CRMs.

  10. Brachyury, Foxa2 and the cis-Regulatory Origins of the Notochord.

    Science.gov (United States)

    José-Edwards, Diana S; Oda-Ishii, Izumi; Kugler, Jamie E; Passamaneck, Yale J; Katikala, Lavanya; Nibu, Yutaka; Di Gregorio, Anna

    2015-12-01

    A main challenge of modern biology is to understand how specific constellations of genes are activated to differentiate cells and give rise to distinct tissues. This study focuses on elucidating how gene expression is initiated in the notochord, an axial structure that provides support and patterning signals to embryos of humans and all other chordates. Although numerous notochord genes have been identified, the regulatory DNAs that orchestrate development and propel evolution of this structure by eliciting notochord gene expression remain mostly uncharted, and the information on their configuration and recurrence is still quite fragmentary. Here we used the simple chordate Ciona for a systematic analysis of notochord cis-regulatory modules (CRMs), and investigated their composition, architectural constraints, predictive ability and evolutionary conservation. We found that most Ciona notochord CRMs relied upon variable combinations of binding sites for the transcription factors Brachyury and/or Foxa2, which can act either synergistically or independently from one another. Notably, one of these CRMs contains a Brachyury binding site juxtaposed to an (AC) microsatellite, an unusual arrangement also found in Brachyury-bound regulatory regions in mouse. In contrast, different subsets of CRMs relied upon binding sites for transcription factors of widely diverse families. Surprisingly, we found that neither intra-genomic nor interspecific conservation of binding sites were reliably predictive hallmarks of notochord CRMs. We propose that rather than obeying a rigid sequence-based cis-regulatory code, most notochord CRMs are rather unique. Yet, this study uncovered essential elements recurrently used by divergent chordates as basic building blocks for notochord CRMs.

  11. Nomadic enhancers: tissue-specific cis-regulatory elements of yellow have divergent genomic positions among Drosophila species.

    Directory of Open Access Journals (Sweden)

    Gizem Kalay

    2010-11-01

    Full Text Available cis-regulatory DNA sequences known as enhancers control gene expression in space and time. They are central to metazoan development and are often responsible for changes in gene regulation that contribute to phenotypic evolution. Here, we examine the sequence, function, and genomic location of enhancers controlling tissue- and cell-type specific expression of the yellow gene in six Drosophila species. yellow is required for the production of dark pigment, and its expression has evolved largely in concert with divergent pigment patterns. Using Drosophila melanogaster as a transgenic host, we examined the expression of reporter genes in which either 5' intergenic or intronic sequences of yellow from each species controlled the expression of Green Fluorescent Protein. Surprisingly, we found that sequences controlling expression in the wing veins, as well as sequences controlling expression in epidermal cells of the abdomen, thorax, and wing, were located in different genomic regions in different species. By contrast, sequences controlling expression in bristle-associated cells were located in the intron of all species. Differences in the precise pattern of spatial expression within the developing epidermis of D. melanogaster transformants usually correlated with adult pigmentation in the species from which the cis-regulatory sequences were derived, which is consistent with cis-regulatory evolution affecting yellow expression playing a central role in Drosophila pigmentation divergence. Sequence comparisons among species favored a model in which sequential nucleotide substitutions were responsible for the observed changes in cis-regulatory architecture. Taken together, these data demonstrate frequent changes in yellow cis-regulatory architecture among Drosophila species. Similar analyses of other genes, combining in vivo functional tests of enhancer activity with in silico comparative genomics, are needed to determine whether the pattern of

  12. [Analysis of cis-regulatory element distribution in gene promoters of Gossypium raimondii and Arabidopsis thaliana].

    Science.gov (United States)

    Sun, Gao-Fei; He, Shou-Pu; Du, Xiong-Ming

    2013-10-01

    Cotton genomic studies have boomed since the release of the Gossypium raimondii draft genome. In this study, cis-regulatory elements (CREs) from the PLACE database (Plant cis-acting Regulatory DNA Elements) were scanned in the 1 kb sequences upstream of the 5' UTRs of annotated genes in the Arabidopsis thaliana (At) and Gossypium raimondii (Gr) genomes. According to the definition used in this study, 44 (12.3%) and 57 (15.5%) CREs showed a "peak-like" distribution in the selected 1 kb sequences of the two genomes, respectively. Thirty-four of them were peak-like in both genomes and could be further categorized into 4 types based on their core sequences. The coincidence of the TATABOX peak position with its actual position (around -30 bp) indicated that the position of a common CRE is conserved across different genes, which suggests that the peak positions of these CREs may be their actual transcription factor binding positions. The position of a common CRE also differed between the two genomes, owing to stronger length variation of the 5' UTR in Gr than in At. Furthermore, most of the peak-like CREs were located in the region from -110 bp to 0 bp, which suggests that a concentrated distribution might be conducive to interactions among transcription factors and thereby regulate the expression of downstream genes.
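
The kind of positional analysis described above can be sketched as follows: locate each occurrence of a core motif in a set of upstream sequences, express its position relative to the 3' end of the upstream region, and bin the positions. The peak criterion used here (`min_fold` times the mean bin count) is a hypothetical stand-in, since the abstract does not give the study's definition of "peak-like"; the function names and the toy sequences are likewise assumptions.

```python
from collections import Counter

def motif_positions(upstream, motif):
    """Start positions of motif occurrences, counted back from the 3' end of
    the upstream sequence (-30 means the motif starts 30 bp before the end)."""
    upstream, motif = upstream.upper(), motif.upper()
    positions = []
    start = upstream.find(motif)
    while start != -1:
        positions.append(start - len(upstream))
        start = upstream.find(motif, start + 1)  # allow overlapping matches
    return positions

def position_histogram(sequences, motif, bin_size=10):
    """Count motif occurrences per position bin across all sequences."""
    hist = Counter()
    for seq in sequences:
        for pos in motif_positions(seq, motif):
            hist[(pos // bin_size) * bin_size] += 1
    return hist

def is_peak_like(hist, n_bins, min_fold=3.0):
    """Hypothetical criterion: the fullest bin holds at least min_fold times
    the mean count per bin over the whole scanned region."""
    if not hist:
        return False
    mean_count = sum(hist.values()) / n_bins
    return max(hist.values()) >= min_fold * mean_count

# Toy 100 bp upstream sequences with a TATA-like core 30 bp before the end.
toy = ["A" * 70 + "TATAAA" + "C" * 24] * 5
hist = position_histogram(toy, "TATAAA")
print(dict(hist), is_peak_like(hist, n_bins=10))
```

For real promoters the sequences would be the 1 kb upstream regions and the motifs would come from a motif database; a background model would also be needed before calling any peak significant.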

  13. Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees.

    Science.gov (United States)

    Chabot, Adrien; Shrit, Ralla A; Blekhman, Ran; Gilad, Yoav

    2007-08-01

    Most phenotypic differences between human and chimpanzee are likely to result from differences in gene regulation, rather than changes to protein-coding regions. To date, however, only a handful of human-chimpanzee nucleotide differences leading to changes in gene regulation have been identified. To home in on differences in regulatory elements between human and chimpanzee, we focused on 10 genes that were previously found to be differentially expressed between the two species. We then designed reporter gene assays for the putative human and chimpanzee promoters of the 10 genes. Of seven promoters that we found to be active in human liver cell lines, human and chimpanzee promoters had significantly different activity in four cases, three of which recapitulated the gene expression difference seen in the microarray experiment. For these three genes, we were therefore able to demonstrate that a change in cis influences expression differences between humans and chimpanzees. Moreover, using site-directed mutagenesis on one construct, the promoter for the DDA3 gene, we were able to identify three nucleotides that together lead to a cis regulatory difference between the species. High-throughput application of this approach can provide a map of regulatory element differences between humans and our close evolutionary relatives.

  14. Characterization of Cer-1 cis-regulatory region during early Xenopus development.

    Science.gov (United States)

    Silva, Ana Cristina; Filipe, Mário; Steinbeisser, Herbert; Belo, José António

    2011-05-01

    Cerberus-related molecules are well-known Wnt, Nodal, and BMP inhibitors that have been implicated in different processes including anterior–posterior patterning and left–right asymmetry. In both mouse and frog, two Cerberus-related genes have been isolated: mCer-1 and mCer-2, and Xcer and Xcoco, respectively. Until now, little has been known about the mechanisms involved in their transcriptional regulation. Here, we report a heterologous analysis of the mouse Cerberus-1 gene upstream regulatory regions, responsible for its expression in the visceral endodermal cells. Our analysis showed that consensus sequences for TATA, CAAT, or GC boxes were absent but a TGTGG sequence was present at position -172 to -168 bp, relative to the ATG. Using a series of deletion constructs and transient expression in Xenopus embryos, we found that a fragment of 1.4 kb of Cer-1 promoter sequence could reproduce the endogenous expression pattern of Xenopus cerberus. A 0.7-kb mCer-1 upstream region was able to drive reporter expression to the involuting mesendodermal cells, while further deletions abolished reporter gene expression. Our results suggest that although no sequence similarity was found between mouse and Xenopus cerberus cis-regulatory regions, the signaling cascades regulating cerberus expression during gastrulation are conserved.

  15. PReMod: a database of genome-wide mammalian cis-regulatory module predictions.

    Science.gov (United States)

    Ferretti, Vincent; Poitras, Christian; Bergeron, Dominique; Coulombe, Benoit; Robert, François; Blanchette, Mathieu

    2007-01-01

    We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656-668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites. Contrary to other existing databases, PReMod is not restricted to modules located proximal to genes, but in fact mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs); or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod: http://genomequebec.mcgill.ca/PReMod.
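
The observation that CRMs often appear as clusters of binding sites can be illustrated with a very simple grouping routine. This sketch only groups predicted site coordinates that fall within a fixed window; it ignores phylogenetic conservation and scoring and is not the PReMod prediction algorithm. The function name, the window size, and the minimum site count are hypothetical choices.

```python
def site_clusters(positions, window=200, min_sites=3):
    """Greedily group site coordinates into clusters spanning at most `window` bp,
    keeping only clusters with at least `min_sites` sites.
    Returns a list of (start, end, n_sites) tuples."""
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[0] > window:
            # The new site falls outside the window opened by the first site.
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Hypothetical predicted site coordinates on one chromosome.
sites = [100, 150, 180, 1000, 5000, 5050, 5100, 5120]
print(site_clusters(sites))
```

With the toy coordinates above, the isolated site at 1000 is dropped and two candidate clusters remain.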

  16. BET Bromodomain Inhibition Releases the Mediator Complex from Select cis-Regulatory Elements.

    Science.gov (United States)

    Bhagwat, Anand S; Roe, Jae-Seok; Mok, Beverly Y L; Hohmann, Anja F; Shi, Junwei; Vakoc, Christopher R

    2016-04-19

    The bromodomain and extraterminal (BET) protein BRD4 can physically interact with the Mediator complex, but the relevance of this association to the therapeutic effects of BET inhibitors in cancer is unclear. Here, we show that BET inhibition causes a rapid release of Mediator from a subset of cis-regulatory elements in the genome of acute myeloid leukemia (AML) cells. These sites of Mediator eviction were highly correlated with transcriptional suppression of neighboring genes, which are enriched for targets of the transcription factor MYB and for functions related to leukemogenesis. A shRNA screen of Mediator in AML cells identified the MED12, MED13, MED23, and MED24 subunits as performing a similar regulatory function to BRD4 in this context, including a shared role in sustaining a block in myeloid maturation. These findings suggest that the interaction between BRD4 and Mediator has functional importance for gene-specific transcriptional activation and for AML maintenance. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Lmx1b-targeted cis-regulatory modules involved in limb dorsalization.

    Science.gov (United States)

    Haro, Endika; Watson, Billy A; Feenstra, Jennifer M; Tegeler, Luke; Pira, Charmaine U; Mohan, Subburaman; Oberg, Kerby C

    2017-06-01

    Lmx1b is a homeodomain transcription factor responsible for limb dorsalization. Despite striking double-ventral (loss-of-function) and double-dorsal (gain-of-function) limb phenotypes, no direct gene targets in the limb have been confirmed. To determine direct targets, we performed a chromatin immunoprecipitation against Lmx1b in mouse limbs at embryonic day 12.5 followed by next-generation sequencing (ChIP-seq). Nearly 84% (n=617) of the Lmx1b-bound genomic intervals (LBIs) identified overlap with chromatin regulatory marks indicative of potential cis-regulatory modules (PCRMs). In addition, 73 LBIs mapped to CRMs that are known to be active during limb development. We compared Lmx1b-bound PCRMs with genes regulated by Lmx1b and found 292 PCRMs within 1 Mb of 254 Lmx1b-regulated genes. Gene ontological analysis suggests that Lmx1b targets extracellular matrix production, bone/joint formation, axonal guidance, vascular development, cell proliferation and cell movement. We validated the functional activity of a PCRM associated with joint-related Gdf5 that provides a mechanism for Lmx1b-mediated joint modification and a PCRM associated with Lmx1b that suggests a role in autoregulation. This is the first report to describe genome-wide Lmx1b binding during limb development, directly linking Lmx1b to targets that accomplish limb dorsalization. © 2017. Published by The Company of Biologists Ltd.

  18. Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila.

    Directory of Open Access Journals (Sweden)

    Margaret C W Ho

    2009-11-01

    It is a long-held belief in evolutionary biology that the rate of molecular evolution for a given DNA sequence is inversely related to the level of functional constraint. This belief holds true for the protein-coding homeotic (Hox) genes originally discovered in Drosophila melanogaster. Expression of the Hox genes in Drosophila embryos is essential for body patterning and is controlled by an extensive array of cis-regulatory modules (CRMs). How the regulatory modules functionally evolve in different species is not clear. A comparison of the CRMs for the Abdominal-B gene from different Drosophila species reveals relatively low levels of overall sequence conservation. However, embryonic enhancer CRMs from other Drosophila species direct transgenic reporter gene expression in the same spatial and temporal patterns during development as their D. melanogaster orthologs. Bioinformatic analysis reveals the presence of short conserved sequences within defined CRMs, representing gap and pair-rule transcription factor binding sites. One predicted binding site for the gap transcription factor KRUPPEL in the IAB5 CRM was found to be altered in Superabdominal (Sab) mutations. In Sab mutant flies, the third abdominal segment is transformed into a copy of the fifth abdominal segment. A model for KRUPPEL-mediated repression at this binding site is presented. These findings challenge our current understanding of the relationship between sequence evolution at the molecular level and functional activity of a CRM. While the overall sequence conservation at Drosophila CRMs is not distinctive from neighboring genomic regions, functionally critical transcription factor binding sites within embryonic enhancer CRMs are highly conserved. These results have implications for understanding mechanisms of gene expression during embryonic development, enhancer function, and the molecular evolution of eukaryotic regulatory modules.

  19. A HLA class I cis-regulatory element whose activity can be modulated by hormones.

    Science.gov (United States)

    Sim, B C; Hui, K M

    1994-12-01

    To elucidate the basis of the down-regulation in major histocompatibility complex (MHC) class I gene expression and to identify possible DNA-binding regulatory elements that have the potential to interact with class I MHC genes, we have studied the transcriptional regulation of class I HLA genes in human breast carcinoma cells. A 9 base pair (bp) negative cis-regulatory element (NRE) has been identified using band-shift assays employing DNA sequences derived from the 5'-flanking region of HLA class I genes. This 9-bp element, GTCATGGCG, located within exon I of the HLA class I gene, can potently inhibit the expression of a heterologous thymidine kinase (TK) gene promoter and the HLA enhancer element. Furthermore, this regulatory element can exert its suppressive function in either the sense or anti-sense orientation. More interestingly, NRE can suppress dexamethasone-mediated gene activation in the context of the reported glucocorticoid-responsive element (GRE) in MCF-7 cells but has no influence on the estrogen-mediated transcriptional activation of MCF-7 cells in the context of the reported estrogen-responsive element (ERE). Furthermore, the presence of such a regulatory element within the HLA class I gene whose activity can be modulated by hormones correlates well with our observation that the level of HLA class I gene expression can be down-regulated by hormones in human breast carcinoma cells. Such interactions between negative regulatory elements and specific hormone trans-activators are novel and suggest a versatile form of transcriptional control.

  20. Deciphering Cis-Regulatory Element Mediated Combinatorial Regulation in Rice under Blast Infected Condition.

    Directory of Open Access Journals (Sweden)

    Arindam Deb

    Combinations of cis-regulatory elements (CREs) present at the promoters facilitate the binding of several transcription factors (TFs), thereby altering the consequent gene expressions. Due to the eminent complexity of the regulatory mechanism, the combinatorics of CRE-mediated transcriptional regulation has been elusive. In this work, we have developed a new methodology that quantifies the co-occurrence tendencies of CREs present in a set of promoter sequences; these co-occurrence scores are filtered in three consecutive steps to test their statistical significance; and the significantly co-occurring CRE pairs are presented as networks. These networks of co-occurring CREs are further transformed to derive higher order of regulatory combinatorics. We have further applied this methodology on the differentially up-regulated gene-sets of rice tissues under fungal (Magnaporthe) infected conditions to demonstrate how it helps to understand the CRE-mediated combinatorial gene regulation. Our analysis includes a wide spectrum of biologically important results. The CRE pairs having a strong tendency to co-occur often exhibit very similar joint distribution patterns at the promoters of rice. We couple the network approach with experimental results of plant gene regulation and defense mechanisms and find evidence of auto and cross regulation among TF families, cross-talk among multiple hormone signaling pathways, similarities and dissimilarities in regulatory combinatorics between different tissues, etc. Our analyses have pointed to a highly distributed nature of the combinatorial gene regulation facilitating an efficient alteration in response to fungal attack. All together, our proposed methodology could be an important approach in understanding the combinatorial gene regulation. It can be further applied to unravel the tissue and/or condition specific combinatorial gene regulation in other eukaryotic systems with the availability of annotated genomic
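
    The idea of scoring CRE co-occurrence across a promoter set can be illustrated with a minimal sketch. It does not reproduce the three-step filtering described in the abstract; it substitutes a plain one-sided hypergeometric test, and the promoter names, CRE labels and function names are all hypothetical.

```python
from itertools import combinations
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_both):
    """Upper-tail hypergeometric probability P(X >= n_both).

    n_total promoters in all, n_a contain element A, n_b contain element B,
    n_both contain both. This is a stand-in test, not the published method.
    """
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


def pair_table(promoters):
    """Map each CRE pair to (number of shared promoters, p-value).

    `promoters` maps a promoter name to the set of CRE labels found in it.
    """
    n_total = len(promoters)
    elements = sorted(set().union(*promoters.values()))
    table = {}
    for a, b in combinations(elements, 2):
        n_a = sum(a in cres for cres in promoters.values())
        n_b = sum(b in cres for cres in promoters.values())
        n_both = sum(a in cres and b in cres for cres in promoters.values())
        table[(a, b)] = (n_both, cooccurrence_pvalue(n_total, n_a, n_b, n_both))
    return table


# Toy input (invented for illustration only).
toy_promoters = {
    "p1": {"A", "B"},
    "p2": {"A", "B"},
    "p3": {"A", "B"},
    "p4": {"C"},
    "p5": {"C"},
    "p6": {"A", "C"},
}
toy_table = pair_table(toy_promoters)
```

    Pairs whose p-value falls below a chosen cutoff could then be drawn as edges of a co-occurrence network; the cutoff and any multiple-testing correction are left open here.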

  1. Changes in cis-regulatory elements of a key floral regulator are associated with divergence of inflorescence architectures.

    Science.gov (United States)

    Kusters, Elske; Della Pina, Serena; Castel, Rob; Souer, Erik; Koes, Ronald

    2015-08-15

    Higher plant species diverged extensively with regard to the moment (flowering time) and position (inflorescence architecture) at which flowers are formed. This seems largely caused by variation in the expression patterns of conserved genes that specify floral meristem identity (FMI), rather than changes in the encoded proteins. Here, we report a functional comparison of the promoters of homologous FMI genes from Arabidopsis, petunia, tomato and Antirrhinum. Analysis of promoter-reporter constructs in petunia and Arabidopsis, as well as complementation experiments, showed that the divergent expression of leafy (LFY) and the petunia homolog aberrant leaf and flower (ALF) results from alterations in the upstream regulatory network rather than cis-regulatory changes. The divergent expression of unusual floral organs (UFO) from Arabidopsis, and the petunia homolog double top (DOT), however, is caused by the loss or gain of cis-regulatory promoter elements, which respond to trans-acting factors that are expressed in similar patterns in both species. Introduction of pUFO:UFO causes no obvious defects in Arabidopsis, but in petunia it causes the precocious and ectopic formation of flowers. This provides an example of how a change in a cis-regulatory region can account for a change in the plant body plan. © 2015. Published by The Company of Biologists Ltd.

  2. Implications of duplicated cis-regulatory elements in the evolution of metazoans: the DDI model or how simplicity begets novelty.

    Science.gov (United States)

    Jiménez-Delgado, Senda; Pascual-Anaya, Juan; Garcia-Fernàndez, Jordi

    2009-07-01

    The discovery that most regulatory genes were conserved among animals from distant phyla challenged the ideas that gene duplication and divergence of homologous coding sequences were the basis for major morphological changes in metazoan evolution. In recent years, however, interest in the roles, conservation and changes of non-coding sequences grew in parallel with genome sequencing projects. Presently, many independent studies are highlighting the importance that subtle changes in cis-regulatory regions had in the evolution of morphology through the Animal Kingdom. Here we will show and discuss some of these studies, and underscore the future of cis-Evo-Devo research. Nevertheless, we would also explore how gene duplication, which includes duplication of regulatory regions, may have been critical for spatial or temporal co-option of new regulatory networks, causing the deployment of new transcriptome scenarios, and how these induced morphological changes were critical for the evolution of new forms. Forty years after Susumu Ohno's famous sentence 'natural selection merely modifies, while redundancy creates', we suggest the alternative: 'natural selection modifies, while redundancy of cis-regulatory elements innovates', and propose the Duplication-Degeneration-Innovation model to explain the increased evolvability of duplicated cis-regulatory regions. Paradoxically, making regulation simpler by subfunctionalization paved the path for future complexity or, in other words, 'to make it simple to make it complex'.

  3. Identification of cis-regulatory sequences that activate transcription in the suspensor of plant embryos.

    Science.gov (United States)

    Kawashima, Tomokazu; Wang, Xingjun; Henry, Kelli F; Bi, Yuping; Weterings, Koen; Goldberg, Robert B

    2009-03-03

    Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the scarlet runner bean (Phaseolus coccineus) G564 gene to understand how genes are activated specifically within the suspensor during early embryo development. Previously, we showed that the G564 upstream region has a block of tandem repeats, which contain a conserved 10-bp motif (GAAAAG(C)/(T)GAA), and that deletion of these repeats results in a loss of suspensor transcription. Here, we use gain-of-function (GOF) experiments with transgenic globular-stage tobacco embryos to show that only 1 of the 5 tandem repeats is required to drive suspensor-specific transcription. Fine-scale deletion and scanning mutagenesis experiments with 1 tandem repeat uncovered a 54-bp region that contains all of the sequences required to activate transcription in the suspensor, including the 10-bp motif (GAAAAGCGAA) and a similar 10-bp-like motif (GAAAAACGAA). Site-directed mutagenesis and GOF experiments indicated that both the 10-bp and 10-bp-like motifs are necessary, but not sufficient to activate transcription in the suspensor, and that a sequence (TTGGT) between the 10-bp and the 10-bp-like motifs is also necessary for suspensor transcription. Together, these data identify sequences that are required to activate transcription in the suspensor of a plant embryo after fertilization.
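
    As an illustration of how the three sequence elements named in this abstract could be located in a candidate sequence, here is a minimal sketch using regular expressions on both strands. The function names and the toy sequence are hypothetical and are not taken from the study; only the motif strings come from the abstract.

```python
import re

# The degenerate position written (C)/(T) in the abstract becomes [CT].
MOTIF_10BP = re.compile(r"GAAAAG[CT]GAA")
MOTIF_10BP_LIKE = re.compile(r"GAAAAACGAA")
SPACER = re.compile(r"TTGGT")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(pattern, seq):
    """Return sorted (start, strand, match) tuples for both strands.

    Starts are 0-based and always given in forward-strand coordinates.
    Matches are non-overlapping per strand; a palindromic motif would be
    reported once on each strand (none of the three motifs is palindromic).
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        hits.append((len(seq) - m.end(), "-", m.group()))
    return sorted(hits)


# Toy sequence invented for illustration; it is not the G564 upstream region.
toy = "CCGAAAAGCGAAACTTGGTCAGAAAAACGAAGG"
sites_10bp = find_sites(MOTIF_10BP, toy)
sites_spacer = find_sites(SPACER, toy)
sites_like = find_sites(MOTIF_10BP_LIKE, toy)
```

    Checking that a spacer hit lies between a 10-bp hit and a 10-bp-like hit is then a comparison of the reported start coordinates.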

  4. Systematic identification of cis-regulatory sequences active in mouse and human embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Marica Grskovic

    2007-08-01

    Understanding the transcriptional regulation of pluripotent cells is of fundamental interest and will greatly inform efforts aimed at directing differentiation of embryonic stem (ES) cells or reprogramming somatic cells. We first analyzed the transcriptional profiles of mouse ES cells and primordial germ cells and identified genes upregulated in pluripotent cells both in vitro and in vivo. These genes are enriched for roles in transcription, chromatin remodeling, cell cycle, and DNA repair. We developed a novel computational algorithm, CompMoby, which combines analyses of sequences both aligned and non-aligned between different genomes with a probabilistic segmentation model to systematically predict short DNA motifs that regulate gene expression. CompMoby was used to identify conserved overrepresented motifs in genes upregulated in pluripotent cells. We show that the motifs are preferentially active in undifferentiated mouse ES and embryonic germ cells in a sequence-specific manner, and that they can act as enhancers in the context of an endogenous promoter. Importantly, the activity of the motifs is conserved in human ES cells. We further show that the transcription factor NF-Y specifically binds to one of the motifs, is differentially expressed during ES cell differentiation, and is required for ES cell proliferation. This study provides novel insights into the transcriptional regulatory networks of pluripotent cells. Our results suggest that this systematic approach can be broadly applied to understanding transcriptional networks in mammalian species.

  5. Direct activation of a notochord cis-regulatory module by Brachyury and FoxA in the ascidian Ciona intestinalis.

    Science.gov (United States)

    Passamaneck, Yale J; Katikala, Lavanya; Perrone, Lorena; Dunn, Matthew P; Oda-Ishii, Izumi; Di Gregorio, Anna

    2009-11-01

    The notochord is a defining feature of the chordate body plan. Experiments in ascidian, frog and mouse embryos have shown that co-expression of Brachyury and FoxA class transcription factors is required for notochord development. However, studies on the cis-regulatory sequences mediating the synergistic effects of these transcription factors are complicated by the limited knowledge of notochord genes and cis-regulatory modules (CRMs) that are directly targeted by both. We have identified an easily testable model for such investigations in a 155-bp notochord-specific CRM from the ascidian Ciona intestinalis. This CRM contains functional binding sites for both Ciona Brachyury (Ci-Bra) and FoxA (Ci-FoxA-a). By combining point mutation analysis and misexpression experiments, we demonstrate that binding of both transcription factors to this CRM is necessary and sufficient to activate transcription. To gain insights into the cis-regulatory criteria controlling its activity, we investigated the organization of the transcription factor binding sites within the 155-bp CRM. The 155-bp sequence contains two Ci-Bra binding sites with identical core sequences but opposite orientations, only one of which is required for enhancer activity. Changes in both orientation and spacing of these sites substantially affect the activity of the CRM, as clusters of identical sites found in the Ciona genome with different arrangements are unable to activate transcription in notochord cells. This work presents the first evidence of a synergistic interaction between Brachyury and FoxA in the activation of an individual notochord CRM, and highlights the importance of transcription factor binding site arrangement for its function.

  6. Characterization of a putative cis-regulatory element that controls transcriptional activity of the pig uroplakin II gene promoter

    International Nuclear Information System (INIS)

    Kwon, Deug-Nam; Park, Mi-Ryung; Park, Jong-Yi; Cho, Ssang-Goo; Park, Chankyu; Oh, Jae-Wook; Song, Hyuk; Kim, Jae-Hwan; Kim, Jin-Hoi

    2011-01-01

    Highlights: → The sequences of -604 to -84 bp of the pUPII promoter contained the region of a putative negative cis-regulatory element. → The core promoter was located in the 5F-1. → Transcription factor HNF4 can directly bind in the pUPII core promoter region, which plays a critical role in controlling promoter activity. → These features of the pUPII promoter are fundamental to development of a target-specific vector. -- Abstract: Uroplakin II (UPII) is one of the integral membrane proteins synthesized as a major differentiation product of mammalian urothelium. UPII gene expression is bladder specific and differentiation dependent, but little is known about its transcription response elements and molecular mechanism. To identify the cis-regulatory elements in the pig UPII (pUPII) gene promoter region, we constructed pUPII 5' upstream region deletion mutants and demonstrated that each of the deletion mutants participates in controlling the expression of the pUPII gene in human bladder carcinoma RT4 cells. We also identified a new core promoter region and putative negative cis-regulatory element within a minimal promoter region. In addition, we showed that hepatocyte nuclear factor 4 (HNF4) can directly bind in the pUPII core promoter (5F-1) region, which plays a critical role in controlling promoter activity. Transient cotransfection experiments showed that HNF4 positively regulates pUPII gene promoter activity. Thus, the binding element and its binding protein, HNF4 transcription factor, may be involved in the mechanism that specifically regulates pUPII gene transcription.

  7. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera

    Directory of Open Access Journals (Sweden)

    James J. Lewis

    2016-09-01

    Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.

  8. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila.

    Science.gov (United States)

    Yang, Tzu-Hsien; Wang, Chung-Ching; Hung, Po-Cheng; Wu, Wei-Sheng

    2014-01-01

    Cis-regulatory modules (CRMs), or the DNA sequences required for regulating gene expression, play the central role in biological research on transcriptional regulation in metazoan species. Nowadays, the systematic understanding of CRMs still mainly resorts to computational methods due to the time-consuming and small-scale nature of experimental methods. But the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combinatorial consideration with extra experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs. It is known that transcription factor binding and epigenetic profiles tend to determine functions of CRMs in gene transcriptional regulation. Thus integration of the genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate and decipher the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmentary in the literature. Here we performed the computational genome-wide screening for potential CRMs using different prediction tools and constructed the pioneer database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects the literature-curated TFBS location data and nine genres of epigenetic data for assessing the confidence of these potential CRMs and deciphering the possible CRM functionality. cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally-identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms

  9. DMINDA: an integrated web server for DNA motif identification and analyses.

    Science.gov (United States)

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-07-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

    Directory of Open Access Journals (Sweden)

    Rahul Karnik

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
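
    To make the contrast between a full PWM and a k-mer or regular expression concrete, the following minimal sketch builds a log-odds position weight matrix from aligned sites and scores every window of a sequence. It is a generic textbook construction, not the MotifSpec algorithm: there is no dynamic search space or learned threshold, and the aligned sites, pseudocount and uniform background are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PWM (one dict per column) from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds; expects upper-case A/C/G/T only."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (score_window(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )


# Toy aligned sites and sequence, invented for illustration.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
toy_pwm = build_pwm(toy_sites)
toy_score, toy_start = best_hit(toy_pwm, "CCCTGACTCAGGG")
```

    Unlike a regular expression, the matrix assigns a graded score to every window, so a classifier can call a site wherever the score exceeds a threshold; how that threshold is chosen is exactly what a discriminative method would have to learn.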

  11. Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

    Science.gov (United States)

    Sun, Hong; Guns, Tias; Fierro, Ana Carolina; Thorrez, Lieven; Nijssen, Siegfried; Marchal, Kathleen

    2012-01-01

    Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP-information allows not only to reduce the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also allows restricting the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP-information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method ‘CPModule’. By applying it on a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC. PMID:22422841

  12. Does positive selection drive transcription factor binding site turnover? A test with Drosophila cis-regulatory modules.

    Directory of Open Access Journals (Sweden)

    Bin Z He

    2011-04-01

    Transcription factor binding site(s) (TFBS) gain and loss (i.e., turnover) is a well-documented feature of cis-regulatory module (CRM) evolution, yet little attention has been paid to the evolutionary force(s) driving this turnover process. The predominant view, motivated by its widespread occurrence, emphasizes the importance of compensatory mutation and genetic drift. Positive selection, in contrast, although it has been invoked in specific instances of adaptive gene expression evolution, has not been considered as a general alternative to neutral compensatory evolution. In this study we evaluate the two hypotheses by analyzing patterns of single nucleotide polymorphism in the TFBS of well-characterized CRM in two closely related Drosophila species, Drosophila melanogaster and Drosophila simulans. An important feature of the analysis is classification of TFBS mutations according to the direction of their predicted effect on binding affinity, which allows gains and losses to be evaluated independently along the two phylogenetic lineages. The observed patterns of polymorphism and divergence are not compatible with neutral evolution for either class of mutations. Instead, multiple lines of evidence are consistent with contributions of positive selection to TFBS gain and loss as well as purifying selection in its maintenance. In discussion, we propose a model to reconcile the finding of selection driving TFBS turnover with constrained CRM function over long evolutionary time.

  13. Mapping of cis-regulatory sites in the promoter of testis-specific stellate genes of Drosophila melanogaster.

    Science.gov (United States)

    Olenkina, O M; Egorova, K S; Aravin, A A; Naumova, N M; Gvozdev, V A; Olenina, L V

    2012-11-01

    Tandem Stellate genes organized into two clusters in heterochromatin and euchromatin of the X-chromosome are part of the Ste-Su(Ste) genetic system required for maintenance of male fertility and reproduction of Drosophila melanogaster. Stellate genes encode a regulatory subunit of protein kinase CK2 and are the main targets of germline-specific piRNA-silencing; their derepression leads to appearance of protein crystals in spermatocytes, meiotic disturbances, and male sterility. A short promoter region of 134 bp appears to be sufficient for testis-specific transcription of Stellate, and it contains three closely located cis-regulatory elements called E-boxes. By using reporter analysis, we confirmed a strong functionality of the E-boxes in the Stellate promoter for in vivo transcription. Using selective mutagenesis, we have shown that the presence of the central E-box 2 is preferable to maintain a high-level testis-specific transcription of the reporter gene under the Stellate promoter. The Stellate promoter provides transcription even in heterochromatin, and corresponding mRNAs are translated with the generation of full-size protein products in case of disturbances in the piRNA-silencing process. We have also shown for the first time that the activity of the Stellate promoter is determined by chromatin context of the X-chromosome in male germinal cells, and it increases about twofold when relocated to autosomes.

  14. Retinal Expression of the Drosophila eyes absent Gene Is Controlled by Several Cooperatively Acting Cis-regulatory Elements

    Science.gov (United States)

    Neuman, Sarah D.; Bashirullah, Arash; Kumar, Justin P.

    2016-01-01

    The eyes absent (eya) gene of the fruit fly, Drosophila melanogaster, is a member of an evolutionarily conserved gene regulatory network that controls eye formation in all seeing animals. The loss of eya leads to the complete elimination of the compound eye while forced expression of eya in non-retinal tissues is sufficient to induce ectopic eye formation. Within the developing retina eya is expressed in a dynamic pattern and is involved in tissue specification/determination, cell proliferation, apoptosis, and cell fate choice. In this report we explore the mechanisms by which eya expression is spatially and temporally governed in the developing eye. We demonstrate that multiple cis-regulatory elements function cooperatively to control eya transcription and that spacing between a pair of enhancer elements is important for maintaining correct gene expression. Lastly, we show that the loss of eya expression in sine oculis (so) mutants is the result of massive cell death and a progressive homeotic transformation of retinal progenitor cells into head epidermis. PMID:27930646

  15. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Science.gov (United States)

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs
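
    The promoter-scoring step described above can be illustrated with a generic log-odds position weight matrix (PWM). This is only a sketch under stated assumptions: it is not the discriminative seeding algorithm the authors used, and the function names (`build_pwm`, `best_hit`, `rank_promoters`), the uniform background and the pseudocount are all hypothetical choices made for illustration.

```python
import math

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM from aligned, equal-length sites (A/C/G/T only)."""
    background = background or {b: 0.25 for b in "ACGT"}  # assumed uniform background
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in "ACGT"
        })
    return pwm

def best_hit(pwm, promoter):
    """Return (score, offset) of the highest-scoring window on the given strand."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(promoter) - width + 1):
        window = promoter[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][b] for i, b in enumerate(window))
        best = max(best, (score, start))
    return best

def rank_promoters(pwm, promoters):
    """Sort promoter names by their best motif score, highest first."""
    scored = {name: best_hit(pwm, seq)[0] for name, seq in promoters.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

    A real analysis would also scan the reverse strand and use a background estimated from the complete promoter set of the species, as the abstract describes.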

  16. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    Directory of Open Access Journals (Sweden)

    Fauteux François

    2009-10-01

    Full Text Available Abstract Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination

  17. Identification of choriogenin cis-regulatory elements and production of estrogen-inducible, liver-specific transgenic Medaka.

    Science.gov (United States)

    Ueno, Tetsuro; Yasumasu, Shigeki; Hayashi, Shinji; Iuchi, Ichiro

    2004-07-01

    Choriogenins (chg-H, chg-L) are precursor proteins of egg envelope of medaka and synthesized in the spawning female liver in response to estrogen. We linked a gene construct chg-L1.5 kb/GFP (a 1.5 kb 5'-upstream region of the chg-L gene fused with a green fluorescence protein (GFP) gene) to another construct emgb/RFP (a cis-regulatory region of embryonic globin gene fused with an RFP gene), injected the double fusion gene construct into 1- or 2-cell-stage embryos, and selected embryos expressing the RFP in erythroid cells. From the embryos, we established two lines of chg-L1.5 kb/GFP-emgb/RFP-transgenic medaka. The 3-month-old spawning females and estradiol-17beta (E2)-exposed males displayed the liver-specific GFP expression. The E2-dependent GFP expression was detected in the differentiating liver of the stage 37-38 embryos. In addition, RT-PCR and whole-mount in situ hybridization showed that the E2-dependent chg expression was found in the liver of the stage 34 embryos of wild medaka, suggesting that such E2-dependency is achieved shortly after differentiation of the liver. Analysis using serial deletion mutants fused with GFP showed that the region -426 to -284 of the chg-L gene or the region -364 to -265 of the chg-H gene had the ability to promote the E2-dependent liver-specific GFP expression of its downstream gene. Further analyses suggested that an estrogen response element (ERE) at -309, an ERE half-site at -330 and a binding site for C/EBP at -363 of the chg-L gene played important roles in its downstream chg-L gene expression. In addition, this transgenic medaka may be useful as one of the test animals for detecting environmental estrogenic steroids.

  18. Multiple cis-regulatory elements are involved in the complex regulation of the sieve element-specific MtSEO-F1 promoter from Medicago truncatula.

    Science.gov (United States)

    Bucsenez, M; Rüping, B; Behrens, S; Twyman, R M; Noll, G A; Prüfer, D

    2012-09-01

    The sieve element occlusion (SEO) gene family includes several members that are expressed specifically in immature sieve elements (SEs) in the developing phloem of dicotyledonous plants. To determine how this restricted expression profile is achieved, we analysed the SE-specific Medicago truncatula SEO-F1 promoter (PMtSEO-F1) by constructing deletion, substitution and hybrid constructs and testing them in transgenic tobacco plants using green fluorescent protein as a reporter. This revealed four promoter regions, each containing cis-regulatory elements that activate transcription in SEs. One of these segments also contained sufficient information to suppress PMtSEO-F1 transcription in the phloem companion cells (CCs). Subsequent in silico analysis revealed several candidate cis-regulatory elements that PMtSEO-F1 shares with other SEO promoters. These putative sieve element boxes (PSE boxes) are promising candidates for cis-regulatory elements controlling the SE-specific expression of PMtSEO-F1. © 2012 German Botanical Society and The Royal Botanical Society of the Netherlands.

  19. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    2010-01-01

    Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with Markov chain Monte Carlo (MCMC) techniques. Due to space limitations the focus is on spatial point processes....

  20. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    (This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives’, edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by Clarendon Press, Oxford, and planned to appear as Section 4.1 with the title ‘Inference’.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...

  1. Conserved cis-regulatory regions in a large genomic landscape control SHH and BMP-regulated Gremlin1 expression in mouse limb buds

    Directory of Open Access Journals (Sweden)

    Zuniga Aimée

    2012-08-01

    Full Text Available Abstract Background Mouse limb bud is a prime model to study the regulatory interactions that control vertebrate organogenesis. Major aspects of limb bud development are controlled by feedback loops that define a self-regulatory signalling system. The SHH/GREM1/AER-FGF feedback loop forms the core of this signalling system that operates between the posterior mesenchymal organiser and the ectodermal signalling centre. The BMP antagonist Gremlin1 (GREM1) is a critical node in this system, whose dynamic expression is controlled by BMP, SHH, and FGF signalling and key to normal progression of limb bud development. Previous analysis identified a distant cis-regulatory landscape within the neighbouring Formin1 (Fmn1) locus that is required for Grem1 expression, reminiscent of the genomic landscapes controlling HoxD and Shh expression in limb buds. Results Three highly conserved regions (HMCO1-3) were identified within the previously defined critical genomic region and tested for their ability to regulate Grem1 expression in mouse limb buds. Using a combination of BAC and conventional transgenic approaches, a 9 kb region located ~70 kb downstream of the Grem1 transcription unit was identified. This region, termed Grem1 Regulatory Sequence 1 (GRS1), is able to recapitulate major aspects of Grem1 expression, as it drives expression of a LacZ reporter into the posterior and, to a lesser extent, in the distal-anterior mesenchyme. Crossing the GRS1 transgene into embryos with alterations in the SHH and BMP pathways established that GRS1 depends on SHH and is modulated by BMP signalling, i.e. integrates inputs from these pathways. Chromatin immunoprecipitation revealed interaction of endogenous GLI3 proteins with the core cis-regulatory elements in the GRS1 region. As GLI3 is a mediator of SHH signal transduction, these results indicated that SHH directly controls Grem1 expression through the GRS1 region. Finally, all cis-regulatory regions within the Grem1

  2. Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva

    Directory of Open Access Journals (Sweden)

    Guo Xiang

    2008-12-01

    Full Text Available Abstract Background Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species. Results Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to know

  3. Computational exploration of cis-regulatory modules in rhythmic expression data using the "Exploration of Distinctive CREs and CRMs" (EDCC) and "CRM Network Generator" (CNG) programs.

    Science.gov (United States)

    Bekiaris, Pavlos Stephanos; Tekath, Tobias; Staiger, Dorothee; Danisman, Selahattin

    2018-01-01

    Understanding the effect of cis-regulatory elements (CRE) and clusters of CREs, which are called cis-regulatory modules (CRM), in eukaryotic gene expression is a challenge of computational biology. We developed two programs that allow simple, fast and reliable analysis of candidate CREs and CRMs that may affect specific gene expression and that determine positional features between individual CREs within a CRM. The first program, "Exploration of Distinctive CREs and CRMs" (EDCC), correlates candidate CREs and CRMs with specific gene expression patterns. For pairs of CREs, EDCC also determines positional preferences of the single CREs in relation to each other and to the transcriptional start site. The second program, "CRM Network Generator" (CNG), prioritizes these positional preferences using a neural network and thus allows unbiased rating of the positional preferences that were determined by EDCC. We tested these programs with data from a microarray study of circadian gene expression in Arabidopsis thaliana. Analyzing more than 1.5 million pairwise CRE combinations, we found 22 candidate combinations, of which several contained known clock promoter elements together with elements that had not been identified as relevant to circadian gene expression before. CNG analysis further identified positional preferences of these CRE pairs, hinting at positional information that may be relevant for circadian gene expression. Future wet lab experiments will have to determine which of these combinations confer daytime specific circadian gene expression.

  4. RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

    Science.gov (United States)

    Dean, Kimberly M; Grayhack, Elizabeth J

    2012-12-01

    We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.

  5. An in vivo cis-regulatory screen at the type 2 diabetes associated TCF7L2 locus identifies multiple tissue-specific enhancers.

    Directory of Open Access Journals (Sweden)

    Daniel Savic

    Full Text Available Genome-wide association studies (GWAS) have repeatedly shown an association between non-coding variants in the TCF7L2 locus and risk for type 2 diabetes (T2D), implicating a role for cis-regulatory variation within this locus in disease etiology. Supporting this hypothesis, we previously localized complex regulatory activity to the TCF7L2 T2D-associated interval using an in vivo bacterial artificial chromosome (BAC) enhancer-trapping reporter strategy. To follow-up on this broad initial survey of the TCF7L2 regulatory landscape, we performed a fine-mapping enhancer scan using in vivo mouse transgenic reporter assays. We functionally interrogated approximately 50% of the sequences within the T2D-associated interval, utilizing sequence conservation within this 92-kb interval to determine the regulatory potential of all evolutionary conserved sequences that exhibited conservation to the non-eutherian mammal opossum. Included in this study was a detailed functional interrogation of sequences spanning both protective and risk alleles of single nucleotide polymorphism (SNP) rs7903146, which has exhibited allele-specific enhancer function in pancreatic beta cells. Using these assays, we identified nine segments regulating various aspects of the TCF7L2 expression profile and that constitute nearly 70% of the sequences tested. These results highlight the regulatory complexity of this interval and support the notion that a TCF7L2 cis-regulatory disruption leads to T2D predisposition.

  6. Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana.

    Science.gov (United States)

    Zhang, Weixiong; Ruan, Jianhua; Ho, Tuan-Hua David; You, Youngsook; Yu, Taotao; Quatrano, Ralph S

    2005-07-15

    A fundamental problem of computational genomics is identifying the genes that respond to certain endogenous cues and environmental stimuli. This problem can be referred to as targeted gene finding. Since gene regulation is mainly determined by the binding of transcription factors and cis-regulatory DNA sequences, most existing gene annotation methods, which exploit the conservation of open reading frames, are not effective in finding target genes. A viable approach to targeted gene finding is to exploit the cis-regulatory elements that are known to be responsible for the transcription of target genes. Given such cis-elements, putative target genes whose promoters contain the elements can be identified. As a case study, we apply the above approach to predict the genes in the model plant Arabidopsis thaliana which are inducible by a phytohormone, abscisic acid (ABA), and abiotic stress, such as drought, cold and salinity. We first construct and analyze two ABA-specific cis-elements, ABA-responsive element (ABRE) and its coupling element (CE), in A. thaliana, based on their conservation in rice and other cereal plants. We then use the ABRE-CE module to identify putative ABA-responsive genes in A. thaliana. Based on RT-PCR verification and the results from literature, this method has an accuracy rate of 67.5% for the top 40 predictions. The cis-element based targeted gene finding approach is expected to be widely applicable since a large number of cis-elements in many species are available.
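
    The idea of selecting genes whose promoters carry both elements of a module can be sketched as a simple co-occurrence scan. Everything specific here is an assumption made for illustration: the two patterns are placeholder cores rather than the ABRE and CE models constructed in the paper, and `module_hits`, `candidate_genes` and the `max_gap` spacing limit are hypothetical names and parameters.

```python
import re

# Placeholder patterns, assumed for illustration only; the paper builds its own
# ABRE and CE models from conservation in rice and other cereals.
ABRE = re.compile(r"ACGTG")
CE = re.compile(r"CCACC")

def module_hits(promoter, max_gap=50):
    """Return (abre_start, ce_start) pairs whose start positions lie within max_gap bp."""
    abre_pos = [m.start() for m in ABRE.finditer(promoter)]
    ce_pos = [m.start() for m in CE.finditer(promoter)]
    return [(a, c) for a in abre_pos for c in ce_pos if abs(a - c) <= max_gap]

def candidate_genes(promoters, max_gap=50):
    """Return the sorted names of genes whose promoter contains at least one module hit."""
    # Only the forward strand is scanned in this sketch.
    return sorted(
        gene for gene, seq in promoters.items()
        if module_hits(seq.upper(), max_gap)
    )
```

    A fuller version would score degenerate matches and both strands instead of requiring exact forward-strand cores.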

  7. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    Full Text Available The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy to adopt false discovery rate (FDR) and estimate motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. SAS code for the simulation and analysis of gene expression data is available from the first author upon request.
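
    The abstract does not say which FDR procedure was used. As a hedged illustration, the sketch below applies the standard Benjamini-Hochberg step-up rule to a list of per-motif p-values; the function name `benjamini_hochberg` and the default `alpha` are assumptions, and this is not the authors' SAS code.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True where it is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

    Motif candidates flagged `True` would be the ones carried forward for confirmation by genetic analysis.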

  8. Two negative cis-regulatory regions involved in fruit-specific promoter activity from watermelon (Citrullus vulgaris S.).

    Science.gov (United States)

    Yin, Tao; Wu, Hanying; Zhang, Shanglong; Lu, Hongyu; Zhang, Lingxiao; Xu, Yong; Chen, Daming; Liu, Jingmei

    2009-01-01

    A 1.8 kb 5'-flanking region of the large subunit of ADP-glucose pyrophosphorylase, isolated from watermelon (Citrullus vulgaris S.), has fruit-specific promoter activity in transgenic tomato plants. Two negative regulatory regions, from -986 to -959 and from -472 to -424, were identified in this promoter region by fine deletion analyses. Removal of both regions led to constitutive expression in epidermal cells. Gain-of-function experiments showed that these two regions were sufficient to inhibit RFP (red fluorescent protein) expression in transformed epidermal cells when fused to the cauliflower mosaic virus (CaMV) 35S minimal promoter. Gel mobility shift experiments demonstrated the presence of leaf nuclear factors that interact with these two elements. A TCCAAAA motif was identified in these two regions, as well as one in the reverse orientation, which was confirmed to be a novel specific cis-element. A quantitative beta-glucuronidase (GUS) activity assay of stable transgenic tomato plants showed that the activities of chimeric promoters harbouring only one of the two cis-elements, or both, were approximately 10-fold higher in fruits than in leaves. These data confirm that the TCCAAAA motif functions as a fruit-specific element by inhibiting gene expression in leaves.
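
    Locating a short element such as TCCAAAA in either orientation amounts to an exact-match scan for the motif and its reverse complement. The sketch below assumes that "reverse orientation" means the reverse complement on the same strand; the function names are hypothetical and positions are 0-based.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_both_strands(seq, motif="TCCAAAA"):
    """Return (position, orientation) for each exact match of motif or its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits
```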

  9. Cis-regulatory signatures of orthologous stress-associated bZIP transcription factors from rice, sorghum and Arabidopsis based on phylogenetic footprints

    Directory of Open Access Journals (Sweden)

    Xu Fuyu

    2012-09-01

    Full Text Available Abstract Background The potential contribution of upstream sequence variation to the unique features of orthologous genes is just beginning to be unraveled. A core subset of stress-associated bZIP transcription factors from rice (Oryza sativa) formed ten clusters of orthologous groups (COG) with genes from the monocot sorghum (Sorghum bicolor) and dicot Arabidopsis (Arabidopsis thaliana). The total cis-regulatory information content of each stress-associated COG was examined by phylogenetic footprinting to reveal ortholog-specific, lineage-specific and species-specific conservation patterns. Results The most apparent pattern observed was the occurrence of spatially conserved ‘core modules’ among the COGs but not among paralogs. These core modules are comprised of various combinations of two to four putative transcription factor binding site (TFBS) classes associated with either developmental or stress-related functions. Outside the core modules are specific stress (ABA, oxidative, abiotic, biotic) or organ-associated signals, which may be functioning as ‘regulatory fine-tuners’ and further define lineage-specific and species-specific cis-regulatory signatures. Orthologous monocot and dicot promoters have distinct TFBS classes involved in disease and oxidative-regulated expression, while the orthologous rice and sorghum promoters have distinct combinations of root-specific signals, a pattern that is not particularly conserved in Arabidopsis. Conclusions Patterns of cis-regulatory conservation imply that each ortholog has distinct signatures, further suggesting that they are potentially unique in a regulatory context despite the presumed conservation of broad biological function during speciation. Based on the observed patterns of conservation, we postulate that core modules are likely primary determinants of basal developmental programming, which may be integrated with and further elaborated by additional intrinsic or extrinsic signals in

  10. Microevolution of cis-regulatory elements: an example from the pair-rule segmentation gene fushi tarazu in the Drosophila melanogaster subgroup.

    Directory of Open Access Journals (Sweden)

    Mohammed Bakkali

    Full Text Available The importance of non-coding DNAs that control transcription is ever noticeable, but the characterization and analysis of the evolution of such DNAs presents challenges not found in the analysis of coding sequences. In this study of the cis-regulatory elements of the pair rule segmentation gene fushi tarazu (ftz) I report the DNA sequences of ftz's zebra element (promoter) and a region containing the proximal enhancer from a total of 45 fly lines belonging to several populations of the species Drosophila melanogaster, D. simulans, D. sechellia, D. mauritiana, D. yakuba, D. teissieri, D. orena and D. erecta. Both elements evolve at slower rate than ftz synonymous sites, thus reflecting their functional importance. The promoter evolves more slowly than the average for ftz's coding sequence while, on average, the enhancer evolves more rapidly, suggesting more functional constraint and effective purifying selection on the former. Comparative analysis of the number and nature of base substitutions failed to detect significant evidence for positive/adaptive selection in transcription-factor-binding sites. These seem to evolve at similar rates to regions not known to bind transcription factors. Although this result reflects the evolutionary flexibility of the transcription factor binding sites, it also suggests a complex and still not completely understood nature of even the characterized cis-regulatory sequences. The latter seem to contain more functional parts than those currently identified, some of which probably transcription factor binding. This study illustrates ways in which functional assignments of sequences within cis-acting sequences can be used in the search for adaptive evolution, but also highlights difficulties in how such functional assignment and analysis can be carried out.

  11. A Survey of 6,300 Genomic Fragments for cis-Regulatory Activity in the Imaginal Discs of Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Aurélie Jory

    2012-10-01

    Full Text Available Over 6,000 fragments from the genome of Drosophila melanogaster were analyzed for their ability to drive expression of GAL4 reporter genes in the third-instar larval imaginal discs. About 1,200 reporter genes drove expression in the eye, antenna, leg, wing, haltere, or genital imaginal discs. The patterns ranged from large regions to individual cells. About 75% of the active fragments drove expression in multiple discs; 20% were expressed in ventral, but not dorsal, discs (legs, genital, and antenna), whereas ∼23% were expressed in dorsal but not ventral discs (wing, haltere, and eye). Several patterns, for example, within the leg chordotonal organ, appeared a surprisingly large number of times. Unbiased searches for DNA sequence motifs suggest candidate transcription factors that may regulate enhancers with shared activities. Together, these expression patterns provide a valuable resource to the community and offer a broad overview of how transcriptional regulatory information is distributed in the Drosophila genome.

  12. Annotating RNA motifs in sequences and alignments.

    Science.gov (United States)

    Gardner, Paul P; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure-function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs--RMfam--and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    Full Text Available A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery: searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. 
An implementation of

  14. DNA motif alignment by evolving a population of Markov chains.

    Science.gov (United States)

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
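
    For orientation, a single Metropolis-Hastings chain over motif start positions can be sketched as below. This is a deliberately minimal baseline of the kind the abstract calls an independent chain; it is not the PMC algorithm, it has no information exchange between chains, and the scoring function, function names and parameters are all assumptions made for illustration.

```python
import math
import random

def alignment_score(seqs, starts, width, pseudo=0.5):
    """Log-likelihood ratio of the aligned sites against a uniform background."""
    n = len(seqs)
    total = n + 4 * pseudo
    score = 0.0
    for col in range(width):
        counts = {b: pseudo for b in "ACGT"}
        for seq, s in zip(seqs, starts):
            counts[seq[s + col]] += 1
        for seq, s in zip(seqs, starts):
            score += math.log((counts[seq[s + col]] / total) / 0.25)
    return score

def mh_motif_search(seqs, width, steps=2000, seed=0):
    """Run one Metropolis-Hastings chain; return (best_starts, best_score)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    current = alignment_score(seqs, starts, width)
    best_starts, best = list(starts), current
    for _ in range(steps):
        # Propose a new start position in one randomly chosen sequence.
        i = rng.randrange(len(seqs))
        proposal = list(starts)
        proposal[i] = rng.randrange(len(seqs[i]) - width + 1)
        new = alignment_score(seqs, proposal, width)
        # The proposal is symmetric, so the acceptance ratio is exp(new - current).
        if new >= current or rng.random() < math.exp(new - current):
            starts, current = proposal, new
            if current > best:
                best_starts, best = list(starts), current
    return best_starts, best
```

    Because a single chain can stall in a local maximum, it would normally be restarted from several seeds; the population-based scheme described in the abstract instead shares site information across chains.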

  15. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs.

    Directory of Open Access Journals (Sweden)

    Christopher D Brown

    Full Text Available Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human

  16. The lncRNA Malat1 Is Dispensable for Mouse Development but Its Transcription Plays a cis-Regulatory Role in the Adult

    Directory of Open Access Journals (Sweden)

    Bin Zhang

    2012-07-01

    Full Text Available Genome-wide studies have identified thousands of long noncoding RNAs (lncRNAs) lacking protein-coding capacity. However, most lncRNAs are expressed at a very low level, and in most cases there is no genetic evidence to support their in vivo function. Malat1 (metastasis associated lung adenocarcinoma transcript 1) is among the most abundant and highly conserved lncRNAs, and it exhibits an uncommon 3′-end processing mechanism. In addition, its specific nuclear localization, developmental regulation, and dysregulation in cancer are suggestive of it having a critical biological function. We have characterized a Malat1 loss-of-function genetic model that indicates that Malat1 is not essential for mouse pre- and postnatal development. Furthermore, depletion of Malat1 does not affect global gene expression, splicing factor level and phosphorylation status, or alternative pre-mRNA splicing. However, among a small number of genes that were dysregulated in adult Malat1 knockout mice, many were Malat1 neighboring genes, thus indicating a potential cis-regulatory role of Malat1 gene transcription.

  17. Functional dissection of the promoter of the pollen-specific gene NTP303 reveals a novel pollen-specific, and conserved cis-regulatory element.

    Science.gov (United States)

    Weterings, K; Schrauwen, J; Wullems, G; Twell, D

    1995-07-01

    Regulatory elements within the promoter of the pollen-specific NTP303 gene from tobacco were analysed by transient and stable expression analyses. Analysis of precisely targeted mutations showed that the NTP303 promoter is not regulated by any of the previously described pollen-specific cis-regulatory elements. However, two adjacent regions from -103 to -86 bp and from -86 to -59 bp were shown to contain sequences which positively regulated the NTP303 promoter. Both of these regions were capable of driving pollen-specific expression from a heterologous promoter, independent of orientation and in an additive manner. The boundaries of the minimal, functional NTP303 promoter were determined to lie within the region -86 to -51 bp. The sequence AAATGA localized from -94 to -89 bp was identified as a novel cis-acting element, of which the TGA triplet was shown to comprise an active part. This element was shown to be completely conserved in the similarly regulated promoter of the Bp 10 gene from Brassica napus encoding a homologue of the NTP303 gene.

  18. Identification of a cis-regulatory region of a gene in Arabidopsis thaliana whose induction by dehydration is mediated by abscisic acid and requires protein synthesis.

    Science.gov (United States)

    Iwasaki, T; Yamaguchi-Shinozaki, K; Shinozaki, K

    1995-05-20

    In Arabidopsis thaliana, the induction of a dehydration-responsive gene, rd22, is mediated by abscisic acid (ABA) but the gene does not include any sequence corresponding to the consensus ABA-responsive element (ABRE), RYACGTGGYR, in its promoter region. The cis-regulatory region of the rd22 promoter was identified by monitoring the expression of beta-glucuronidase (GUS) activity in leaves of transgenic tobacco plants transformed with chimeric gene fusions constructed between 5'-deleted promoters of rd22 and the coding region of the GUS reporter gene. A 67-bp nucleotide fragment corresponding to positions -207 to -141 of the rd22 promoter conferred responsiveness to dehydration and ABA on a non-responsive promoter. The 67-bp fragment contains the sequences of the recognition sites for some transcription factors, such as MYC, MYB, and GT-1. The fact that accumulation of rd22 mRNA requires protein synthesis raises the possibility that the expression of rd22 might be regulated by one of these trans-acting protein factors whose de novo synthesis is induced by dehydration or ABA. Although the structure of the RD22 protein is very similar to that of a non-storage seed protein, USP, of Vicia faba, the expression of the GUS gene driven by the rd22 promoter in non-stressed transgenic Arabidopsis plants was found mainly in flowers and bolted stems rather than in seeds.

  19. Dynamic in vivo binding of transcription factors to cis-regulatory modules of cer and gsc in the stepwise formation of the Spemann–Mangold organizer

    Science.gov (United States)

    Sudou, Norihiro; Yamamoto, Shinji; Ogino, Hajime; Taira, Masanori

    2012-01-01

    How multiple developmental cues are integrated on cis-regulatory modules (CRMs) for cell fate decisions remains uncertain. The Spemann–Mangold organizer in Xenopus embryos expresses the transcription factors Lim1/Lhx1, Otx2, Mix1, Siamois (Sia) and VegT. Reporter analyses using sperm nuclear transplantation and DNA injection showed that cerberus (cer) and goosecoid (gsc) are activated by the aforementioned transcription factors through CRMs conserved between X. laevis and X. tropicalis. ChIP-qPCR analysis for the five transcription factors revealed that cer and gsc CRMs are initially bound by both Sia and VegT at the late blastula stage, and subsequently bound by all five factors at the gastrula stage. At the neurula stage, only binding of Lim1 and Otx2 to the gsc CRM, among others, persists, which corresponds to their co-expression in the prechordal plate. Based on these data, together with detailed expression pattern analysis, we propose a new model of stepwise formation of the organizer, in which (1) maternal VegT and Wnt-induced Sia first bind to CRMs at the blastula stage; then (2) Nodal-inducible Lim1, Otx2, Mix1 and zygotic VegT are bound to CRMs in the dorsal endodermal and mesodermal regions where all these genes are co-expressed; and (3) these two regions are combined at the gastrula stage to form the organizer. Thus, the in vivo dynamics of multiple transcription factors highlight their roles in the initiation and maintenance of gene expression, and also reveal the stepwise integration of maternal, Nodal and Wnt signaling on CRMs of organizer genes to generate the organizer. PMID:22492356

  20. Dynamic in vivo binding of transcription factors to cis-regulatory modules of cer and gsc in the stepwise formation of the Spemann-Mangold organizer.

    Science.gov (United States)

    Sudou, Norihiro; Yamamoto, Shinji; Ogino, Hajime; Taira, Masanori

    2012-05-01

    How multiple developmental cues are integrated on cis-regulatory modules (CRMs) for cell fate decisions remains uncertain. The Spemann-Mangold organizer in Xenopus embryos expresses the transcription factors Lim1/Lhx1, Otx2, Mix1, Siamois (Sia) and VegT. Reporter analyses using sperm nuclear transplantation and DNA injection showed that cerberus (cer) and goosecoid (gsc) are activated by the aforementioned transcription factors through CRMs conserved between X. laevis and X. tropicalis. ChIP-qPCR analysis for the five transcription factors revealed that cer and gsc CRMs are initially bound by both Sia and VegT at the late blastula stage, and subsequently bound by all five factors at the gastrula stage. At the neurula stage, only binding of Lim1 and Otx2 to the gsc CRM, among others, persists, which corresponds to their co-expression in the prechordal plate. Based on these data, together with detailed expression pattern analysis, we propose a new model of stepwise formation of the organizer, in which (1) maternal VegT and Wnt-induced Sia first bind to CRMs at the blastula stage; then (2) Nodal-inducible Lim1, Otx2, Mix1 and zygotic VegT are bound to CRMs in the dorsal endodermal and mesodermal regions where all these genes are co-expressed; and (3) these two regions are combined at the gastrula stage to form the organizer. Thus, the in vivo dynamics of multiple transcription factors highlight their roles in the initiation and maintenance of gene expression, and also reveal the stepwise integration of maternal, Nodal and Wnt signaling on CRMs of organizer genes to generate the organizer.

  1. Comprehensive meta-analysis of Signal Transducers and Activators of Transcription (STAT) genomic binding patterns discerns cell-specific cis-regulatory modules

    Directory of Open Access Journals (Sweden)

    Kang Keunsoo

    2013-01-01

    Full Text Available Background: Cytokine-activated transcription factors from the STAT (Signal Transducers and Activators of Transcription) family control common and context-specific genetic programs. It is not clear to what extent cell-specific features determine the binding capacity of seven STAT members and to what degree they share genetic targets. Molecular insight into the biology of STATs was gained from a meta-analysis of 29 available ChIP-seq data sets covering genome-wide occupancy of STATs 1, 3, 4, 5A, 5B and 6 in several cell types. Results: We determined that the genomic binding capacity of STATs is primarily defined by the cell type and to a lesser extent by individual family members. For example, the overlap of shared binding sites between STATs 3 and 5 in T cells is greater than that between STAT5 in T cells and non-T cells. Even for the top 1,000 highly enriched STAT binding sites, ~15% of STAT5 binding sites in mouse female liver are shared by other STATs in different cell types while in T cells ~90% of STAT5 binding sites are co-occupied by STAT3, STAT4 and STAT6. In addition, we identified 116 cis-regulatory modules (CRM), which are recognized by all STAT members across cell types defining a common JAK-STAT signature. Lastly, in liver STAT5 binding significantly coincides with binding of the cell-specific transcription factors HNF4A, FOXA1 and FOXA2 and is associated with cell-type specific gene transcription. Conclusions: Our results suggest that genomic binding of STATs is primarily determined by the cell type and further specificity is achieved in part by juxtaposed binding of cell-specific transcription factors.

  2. PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups

    Directory of Open Access Journals (Sweden)

    Huang Hsien-Da

    2008-11-01

    Full Text Available Background: The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required. Results: This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented. Conclusion: In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at http://PlantPAN.mbc.nctu.edu.tw.

  3. Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS

    DEFF Research Database (Denmark)

    Farooq, Muhammad; Troelsen, Jesper T; Boyd, Mette

    2010-01-01

    A cis-regulatory sequence also known as zone of polarizing activity (ZPA) regulatory sequence (ZRS) located in intron 5 of LMBR1 is essential for expression of sonic hedgehog (SHH) in the developing posterior limb bud mesenchyme. Even though many point mutations causing preaxial duplication defects...... demonstrated a marked difference between wild-type and the mutant probe, which uniquely bound one or several transcription factors extracted from Caco-2 cells. This finding supports a model in which ectopic anterior SHH expression in the developing limb results from abnormal binding of one or more...

  4. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes.

    Science.gov (United States)

    Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C

    2009-02-01

    Abscisic acid (ABA), the popular plant stress hormone, plays a key role in the regulation of a subset of stress-responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the CGMCACGTGB motif, which contains the ABA Responsive Element (ABRE) core (ACGT), as an over-represented motif among the promoters of ABA-responsive co-expressed genes in rice. A targeted gene prediction strategy using this motif led to the identification of 402 protein-coding genes potentially regulated by the ABA-dependent molecular genetic network. RT-PCR analysis of 45 arbitrarily chosen genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA-responsive genes showed enrichment of signal transduction and stress-related genes among diverse functional categories.
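
    The CGMCACGTGB motif above is written in IUPAC degenerate nucleotide codes (M = A or C; B = C, G or T). A minimal sketch of how such a motif could be scanned against a promoter sequence follows. The helper names and the example sequence are made up for illustration, only the forward strand is searched, and this is not the pipeline used in the study.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    promoter = "TTCGCCACGTGTAA"  # hypothetical sequence, for illustration only
    print(find_motif(promoter, "CGMCACGTGB"))  # → [2]
```

    To cover both strands, the same scan can be repeated on the reverse complement of the sequence.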

  6. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
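
    To illustrate the last point, the toy sketch below contrasts a maximum a posteriori (MAP) estimate with a centroid estimate on a small discrete posterior. It assumes unconstrained binary site indicators and a plain Hamming loss, which is simpler than the refined loss function the record describes; the posterior values and function names are invented for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


def map_estimate(posterior):
    """Configuration with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def centroid_estimate(posterior, n):
    """Configuration minimizing the posterior expected Hamming loss."""
    def expected_loss(candidate):
        return sum(p * hamming(candidate, z) for z, p in posterior.items())
    # Brute force over all 2**n binary configurations (fine for a toy example).
    return min(product((0, 1), repeat=n), key=expected_loss)


if __name__ == "__main__":
    # Hypothetical posterior over three binary site indicators.
    posterior = {(0, 0, 0): 0.4, (1, 1, 0): 0.3, (0, 1, 1): 0.3}
    print(map_estimate(posterior))          # → (0, 0, 0)
    print(centroid_estimate(posterior, 3))  # → (0, 1, 0)
```

    Here the MAP estimate is the single most probable configuration, while the centroid switches on the second indicator because its marginal posterior probability (0.6) exceeds one half, even though the resulting configuration has no posterior mass of its own.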

  7. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  8. Two potential hookworm DAF-16 target genes, SNR-3 and LPP-1: gene structure, expression profile, and implications of a cis-regulatory element in the regulation of gene expression.

    Science.gov (United States)

    Gao, Xin; Goggin, Kevin; Dowling, Camille; Qian, Jason; Hawdon, John M

    2015-01-08

    Hookworms infect nearly 700 million people, causing anemia and developmental stunting in heavy infections. Little is known about the genomic structure or gene regulation in hookworms, although recent publication of draft genome assemblies has allowed the first investigations of these topics to be undertaken. The transcription factor DAF-16 mediates multiple developmental pathways in the free living nematode Caenorhabditis elegans, and is involved in the recovery from the developmentally arrested L3 in hookworms. Identification of downstream targets of DAF-16 will provide a better understanding of the molecular mechanism of hookworm infection. Genomic Fragment 2.23 containing a DAF-16 binding element (DBE) was used to identify overlapping complementary expressed sequence tags (ESTs). These sequences were used to search a draft assembly of the Ancylostoma caninum genome, and identified two neighboring genes, snr-3 and lpp-1, in a tail-to-tail orientation. Expression patterns of both genes during parasitic development were determined by qRT-PCR. DAF-16 dependent cis-regulatory activity of fragment 2.23 was investigated using an in vitro reporter system. The snr-3 gene spans approximately 5.6 kb in the genome and contains 3 exons and 2 introns, and contains the DBE in its 3' untranslated region. Downstream from snr-3 in a tail-to-tail arrangement is the gene lpp-1. The lpp-1 gene spans more than 6 kb and contains 10 exons and 9 introns. The A. caninum genome contains 2 apparent splice variants, but there are 7 splice variants in the A. ceylanicum genome. While the gene order is similar, the gene structures of the hookworm genes differ from their C. elegans orthologs. Both genes show peak expression in the late L4 stage. Using a cell culture based expression system, fragment 2.23 was found to have both DAF-16-dependent promoter and enhancer activity that required an intact DBE. Two putative DAF-16 targets were identified by genome wide screening for DAF-16 binding

  9. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  10. CompariMotif: quick and easy comparisons of sequence motifs.

    Science.gov (United States)

    Edwards, Richard J; Davey, Norman E; Shields, Denis C

    2008-05-15

    CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/

  11. MHC motif viewer

    DEFF Research Database (Denmark)

    Rapin, Nicolas Philippe Jean-Pierre; Hoof, Ilka; Lund, Ole

    2008-01-01

    . Algorithms that predict which peptides MHC molecules bind have recently been developed and cover many different alleles, but the utility of these algorithms is hampered by the lack of tools for browsing and comparing the specificity of these molecules. We have, therefore, developed a web server, MHC motif....... A special viewing feature, MHC fight, allows for display of the specificity of two different MHC molecules side by side. We show how the web server can be used to discover and display surprising similarities as well as differences between MHC molecules within and between different species. The MHC motif...

  12. [Personal motif in art].

    Science.gov (United States)

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily and at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication with the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  13. Prenatal exposure of mice to diethylstilbestrol disrupts T-cell differentiation by regulating Fas/Fas ligand expression through estrogen receptor element and nuclear factor-κB motifs.

    Science.gov (United States)

    Singh, Narendra P; Singh, Udai P; Nagarkatti, Prakash S; Nagarkatti, Mitzi

    2012-11-01

    Prenatal exposure to diethylstilbestrol (DES) is known to cause altered immune functions and increased susceptibility to autoimmune disease in humans. In the current study, we investigated the effect of prenatal exposure to DES on thymocyte differentiation involving apoptotic pathways. Prenatal DES exposure caused thymic atrophy, apoptosis, and up-regulation of Fas and Fas ligand (FasL) expression in thymocytes. To examine the mechanism underlying DES-mediated regulation of Fas and FasL, we performed luciferase assays using T cells transfected with luciferase reporter constructs containing full-length Fas or FasL promoters. There was significant luciferase induction in the presence of Fas or FasL promoters after DES exposure. Further analysis demonstrated the presence of several cis-regulatory motifs on both Fas and FasL promoters. When DES-induced transcription factors were analyzed, estrogen receptor element (ERE), nuclear factor κB (NF-κB), nuclear factor of activated T cells (NF-AT), and activator protein-1 motifs on the Fas promoter, as well as ERE, NF-κB, and NF-AT motifs on the FasL promoter, showed binding affinity with the transcription factors. Electrophoretic mobility-shift assays were performed to verify the binding affinity of cis-regulatory motifs of Fas or FasL promoters with transcription factors. There was a shift in mobility of probes (ERE or NF-κB2) of both Fas and FasL in the presence of nuclear proteins from DES-treated cells, and the shift was specific to DES because these probes failed to shift their mobility in the presence of nuclear proteins from vehicle-treated cells. Together, the current study demonstrates that prenatal exposure to DES triggers significant alterations in apoptotic molecules expressed on thymocytes, which may affect T-cell differentiation and cause long-term effects on the immune functions.

  14. ARG-walker: inference of individual specific strengths of meiotic recombination hotspots by population genomics analysis.

    Science.gov (United States)

    Chen, Hao; Yang, Peng; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

    2015-01-01

    Meiotic recombination hotspots play important roles in various aspects of genomics, but the underlying mechanisms for regulating the locations and strengths of recombination hotspots are not yet fully revealed. Most existing algorithms for estimating recombination rates from sequence polymorphism data can only output average recombination rates of a population, although there is evidence for heterogeneity in recombination rates among individuals. For genome-wide association studies (GWAS) of recombination hotspots, an efficient algorithm that estimates the individualized strengths of recombination hotspots is highly desirable. In this work, we propose a novel graph mining algorithm named ARG-walker, based on random walks on ancestral recombination graphs (ARG), to estimate individual-specific recombination hotspot strengths. Extensive simulations demonstrate that ARG-walker is able to distinguish the hot allele of a recombination hotspot from the cold allele. Integrated with the output of ARG-walker, we performed GWAS on the phased haplotype data of the 22 autosomal chromosomes of the HapMap Asian population samples of Chinese and Japanese (JPT+CHB). Significant cis-regulatory signals have been detected, a finding corroborated by the enrichment of the well-known 13-mer motif CCNCCNTNNCCNC of the PRDM9 protein. Moreover, two new DNA motifs have been identified in the flanking regions of the significantly associated SNPs (single nucleotide polymorphisms), which are likely to be new cis-regulatory elements of meiotic recombination hotspots of the human genome. Our results on both simulated and real data suggest that ARG-walker is a promising new method for estimating individual recombination variations. In the future, it could be used to uncover the mechanisms of recombination regulation and human diseases related to recombination hotspots.

  15. Motif enrichment tool.

    Science.gov (United States)

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
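
The abstract does not say which statistic MET uses to call a motif "statistically overrepresented". One common choice for this kind of question is a one-sided hypergeometric test, sketched below under that assumption; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, with_motif, drawn, drawn_with_motif):
    """Upper-tail hypergeometric p-value P(X >= drawn_with_motif).

    population       : total number of genes considered
    with_motif       : genes in the population whose regulatory region has the motif
    drawn            : size of the user's gene set
    drawn_with_motif : genes in the gene set that have the motif
    """
    upper = min(with_motif, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(with_motif, i) * comb(population - with_motif, drawn - i)
        for i in range(drawn_with_motif, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes, 5 carry the motif, and a 5-gene set
# in which all 5 genes carry it.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)
```

A small p-value suggests the motif occurs in the gene set more often than random sampling from the background would explain; when many motifs are tested, a multiple-testing correction would also be needed.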

  16. Entropic Inference

    Science.gov (United States)

    Caticha, Ariel

    2011-03-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops, the Maximum Entropy and the Bayesian methods, into a single general inference scheme.
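
For orientation, the logarithmic relative entropy referred to in this abstract is usually written as below. This is the standard textbook form, given here as a sketch; the symbols $p$, $q$, $f$, $F$ and $\lambda$ are notation assumed for illustration and do not appear in the abstract.

```latex
% Relative entropy of a candidate posterior p with respect to the prior q
\[
  S[p, q] = -\int dx\, p(x)\, \log \frac{p(x)}{q(x)}
\]
% Maximizing S[p, q] subject to normalization and an expectation
% constraint <f> = F gives the updated distribution
\[
  p(x) = \frac{q(x)\, e^{-\lambda f(x)}}{Z(\lambda)},
  \qquad
  Z(\lambda) = \int dx\, q(x)\, e^{-\lambda f(x)},
  \qquad
  -\frac{\partial \log Z}{\partial \lambda} = F .
\]
```

With a uniform prior $q$ this reduces to the familiar MaxEnt form, which is the sense in which MaxEnt appears as a special case.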

  17. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  18. Distributional Inference

    NARCIS (Netherlands)

    Kroese, A.H.; van der Meulen, E.A.; Poortema, Klaas; Schaafsma, W.

    1995-01-01

    The making of statistical inferences in distributional form is conceptually complicated because the epistemic 'probabilities' assigned are mixtures of fact and fiction. In this respect they are essentially different from 'physical' or 'frequency-theoretic' probabilities. The distributional form is

  19. Entropic Inference

    OpenAIRE

    Caticha, Ariel

    2010-01-01

    In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEn...

  20. Perceptual inference.

    Science.gov (United States)

    Aggelopoulos, Nikolaos C

    2015-08-01

    Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

    Science.gov (United States)

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478

  2. Motif signatures of transcribed enhancers

    KAUST Repository

    Kleftogiannis, Dimitrios

    2017-09-14

    In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures allow discriminating successfully a) TrEn from random controls, proxy of non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

  3. An Analysis of Multi-type Relational Interactions in FMA Using Graph Motifs with Disjointness Constraints

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, whereby bringing in rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amendable for programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation. PMID:23304382

  4. An analysis of multi-type relational interactions in FMA using graph motifs with disjointness constraints.

    Science.gov (United States)

    Zhang, Guo-Qiang; Luo, Lingyun; Ogbuji, Chime; Joslyn, Cliff; Mejino, Jose; Sahoo, Satya S

    2012-01-01

    The interaction of multiple types of relationships among anatomical classes in the Foundational Model of Anatomy (FMA) can provide inferred information valuable for quality assurance. This paper introduces a method called Motif Checking (MOCH) to study the effects of such multi-relation type interactions for detecting logical inconsistencies as well as other anomalies represented by the motifs. MOCH represents patterns of multi-type interaction as small labeled (with multiple types of edges) sub-graph motifs, whose nodes represent class variables, and labeled edges represent relational types. By representing FMA as an RDF graph and motifs as SPARQL queries, fragments of FMA are automatically obtained as auditing candidates. Leveraging the scalability and reconfigurability of Semantic Web Technology, we performed exhaustive analyses of a variety of labeled sub-graph motifs. The quality assurance feature of MOCH comes from the distinct use of a subset of the edges of the graph motifs as constraints for disjointness, whereby bringing in rule-based flavor to the approach as well. With possible disjointness implied by antonyms, we performed manual inspection of the resulting FMA fragments and tracked down sources of abnormal inferred conclusions (logical inconsistencies), which are amendable for programmatic revision of the FMA. Our results demonstrate that MOCH provides a unique source of valuable information for quality assurance. Since our approach is general, it is applicable to any ontological system with an OWL representation.

  5. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  6. On the origin of distribution patterns of motifs in biological networks

    Directory of Open Access Journals (Sweden)

    Lesk Arthur M

    2008-08-01

    Background: Inventories of small subgraphs in biological networks have identified commonly recurring patterns, called motifs. The inference that these motifs have been selected for function rests on the idea that their occurrences are significantly more frequent than random. Results: Our analysis of several large biological networks suggests, in contrast, that the frequencies of appearance of common subgraphs are similar in natural and corresponding random networks. Conclusion: Indeed, certain topological features of biological networks give rise naturally to the common appearance of the motifs. We therefore question whether frequencies of occurrences are reasonable evidence that the structures of motifs have been selected for their functional contribution to the operation of networks.

  7. Computational methods to dissect cis-regulatory transcriptional ...

    Indian Academy of Sciences (India)

    The formation of diverse cell types from an invariant set of genes is governed by biochemical and molecular processes that regulate gene activity. A complete understanding of the regulatory mechanisms of gene expression is the major function of genomics. Computational genomics is a rapidly emerging area for ...

  8. Patterns of cis regulatory variation in diverse human populations.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    The genetic basis of gene expression variation has long been studied with the aim of understanding the landscape of regulatory variants, and more recently also to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site (TSS) as sharing of eQTLs among populations increases, highlighting that variants close to the TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.

  9. Cis-regulatory timers for developmental gene expression.

    Directory of Open Access Journals (Sweden)

    Lionel Christiaen

    2013-10-01

    How does a fertilized egg decode its own genome to eventually develop into a mature animal? Each developing cell must activate a battery of genes in a timely manner and according to the function it will ultimately perform, but how? During development of the notochord--a structure akin to the vertebrate spine--in a simple marine invertebrate, an essential protein called Brachyury binds to specific sites in its target genes. A study just published in PLOS Biology reports that if the target gene contains multiple Brachyury-binding sites it will be activated early in development but if it contains only one site it will be activated later. Genes that contain no binding site can still be activated by Brachyury, but only indirectly by an earlier Brachyury-dependent gene product, so later than the directly activated genes. Thus, this study shows how several genes can interpret the presence of a single factor differently to become active at distinct times in development.

  10. Cis-regulatory RNA elements that regulate specialized ribosome activity.

    Science.gov (United States)

    Xue, Shifeng; Barna, Maria

    2015-01-01

    Recent evidence has shown that the ribosome itself can play a highly regulatory role in the specialized translation of specific subpools of mRNAs, in particular at the level of ribosomal proteins (RP). However, the mechanism(s) by which this selection takes place has remained poorly understood. In our recent study, we discovered a combination of unique RNA elements in the 5'UTRs of mRNAs that allows for such control by the ribosome. These mRNAs contain a Translation Inhibitory Element (TIE) that inhibits general cap-dependent translation, and an Internal Ribosome Entry Site (IRES) that relies on a specific RP for activation. The unique combination of an inhibitor of general translation and an activator of specialized translation is key to ribosome-mediated control of gene expression. Here we discuss how these RNA regulatory elements provide a new level of control to protein expression and their implications for gene expression, organismal development and evolution.

  11. Coffee and Cocoa in the Creation of Distinctive Jember Batik Motifs (Kopi dan Kakao dalam Kreasi Motif Batik Khas Jember)

    Directory of Open Access Journals (Sweden)

    Irfa'ina Rohana Salma

    2015-06-01

    ABSTRACT Jember batik has so far been synonymous with the tobacco leaf motif. The visual rendering of the tobacco leaf in Jember batik motifs is rather weak: it lacks character, because the resulting motif looks like a picture of leaves in general. Distinctive Jember batik motif designs therefore need to be created, drawing inspiration from other natural resources of Jember that have specific shapes and characteristics, so that a stronger motif identity can be obtained. These typical natural products of Jember are coffee and cocoa. The aim of this art creation is to produce new batik motifs that carry the distinctive character of Jember. The methods used were data collection, in-depth observation of the objects of creation, review of inspiration sources, creation of motif designs, and realization as batik. Six batik motifs were created: (1) Motif Uwoh Kopi; (2) Motif Godong Kopi; (3) Motif Ceplok Kakao; (4) Motif Kakao Raja; (5) Motif Kakao Biru; and (6) Motif Wiji Mukti. Based on the results of the "Selera Estetika" (aesthetic taste) assessment, the most widely liked motifs were Motif Uwoh Kopi and Motif Kakao Raja. Keywords: Motif Woh Kopi, Motif Godong Kopi, Motif Ceplok Kakao, Motif Kakao Raja, Motif Kakao Biru, Motif Wiji Mukti

  12. Statistical tests to compare motif count exceptionalities

    Directory of Open Access Journals (Sweden)

    Vandewalle Vincent

    2007-03-01

    Background: Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results: We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion: The exact binomial test is particularly adapted for small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use.
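
The exact binomial idea can be illustrated with the generic conditional test for two Poisson counts: given the total number of occurrences, the count in the first sequence is binomial with a success probability set by the two expected counts. The sketch below assumes plain Poisson counts and does not reproduce the paper's treatment of overlapping motifs; the function names and the example numbers are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def compare_exceptionality(count1, expected1, count2, expected2):
    """One-sided p-value that the motif is more exceptional in sequence 1.

    count1, count2       : observed motif counts in the two sequences
    expected1, expected2 : expected counts under each sequence's background model
    Under the null hypothesis (same observed/expected ratio in both sequences),
    count1 given count1 + count2 is Binomial(n, expected1 / (expected1 + expected2)).
    """
    n = count1 + count2
    p = expected1 / (expected1 + expected2)
    return binom_upper_tail(count1, n, p)


# Hypothetical example: equal expectations, but all 3 occurrences fall in sequence 1.
p_value = compare_exceptionality(3, 1.0, 0, 1.0)
print(p_value)
```

Because the tail is summed exactly, this form remains usable for the small counts where asymptotic approximations are least reliable.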

  13. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  14. Fitness for synchronization of network motifs

    DEFF Research Database (Denmark)

    Vega, Y.M.; Vázquez-Prada, M.; Pacheco, A.F.

    2004-01-01

    We study the synchronization of Kuramoto's oscillators in small parts of networks known as motifs. We first report on the system dynamics for the case of a scale-free network and show the existence of a non-trivial critical point. We compute the probability that network motifs synchronize, and fi...... that the fitness for synchronization correlates well with motifs interconnectedness and structural complexity. Possible implications for present debates about network evolution in biological and other systems are discussed....
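
To make the setting concrete, the sketch below integrates Kuramoto's phase equations on a single three-node motif and reports the usual order parameter r (r close to 1 means synchronized). It is an illustration under assumed parameters (identical natural frequencies, forward Euler steps, a fully connected triangle), not the authors' procedure for computing synchronization probabilities; all names are hypothetical.

```python
import cmath
import math


def simulate_kuramoto(adj, omega, theta0, coupling, dt=0.01, steps=5000):
    """Forward-Euler integration of
    d(theta_i)/dt = omega_i + coupling * sum_j adj[i][j] * sin(theta_j - theta_i)."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        rates = [
            omega[i]
            + coupling * sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    return theta


def order_parameter(theta):
    """Kuramoto order parameter r = |mean(exp(i * theta))|, between 0 and 1."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))


# Assumed example motif: an undirected triangle with identical oscillators.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
start = [0.0, 0.5, 1.0]
final = simulate_kuramoto(triangle, [0.0, 0.0, 0.0], start, coupling=1.0)
print(order_parameter(start), order_parameter(final))
```

Repeating such runs over random initial phases and frequencies for each motif topology is one way a synchronization probability could be estimated.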

  15. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users run multiple motif finding Web tools that implement different algorithms in order to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  16. MSDmotif: exploring protein sites and motifs

    Directory of Open Access Journals (Sweden)

    Henrick Kim

    2008-07-01

    Background: Protein structures have conserved features (motifs) that have a sufficient influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs, and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and small 3D structural motif association searches are incorporated. The interface provides functionality for visualization, search criteria creation, and sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino-acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

  17. Temporal motifs in time-dependent networks

    International Nuclear Information System (INIS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-01-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological–temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network

  18. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    Motif analysis has long been an important method to characterize biological functionality and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs... advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports high parallelization on multi-core machinery.

  19. MotifNet: a web-server for network motif analysis.

    Science.gov (United States)

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
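
The phrase "more often than expected by chance" is typically made operational by counting a pattern in the real network and in randomized networks that keep every node's in- and out-degree. The sketch below does this for one three-node pattern, the feed-forward loop, using brute-force counting and edge swaps. It is an illustration only and is unrelated to MotifNet's implementation; the function names and the toy network are hypothetical.

```python
import random
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a directed graph by swapping edge targets.

    Each accepted swap turns (a->b, c->d) into (a->d, c->b), which keeps
    every node's in-degree and out-degree unchanged.
    """
    edge_list = list(set(edges))
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


# Hypothetical toy network containing one feed-forward loop (x -> y -> z, x -> z).
toy_edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w"), ("w", "v")]
observed = count_feed_forward_loops(toy_edges)
rng = random.Random(0)  # fixed seed so the null sample is reproducible
null_counts = [
    count_feed_forward_loops(degree_preserving_shuffle(toy_edges, 50, rng))
    for _ in range(100)
]
print(observed, sum(null_counts) / len(null_counts))
```

Comparing the observed count with the distribution of `null_counts` (for example through a z-score or an empirical p-value) gives the significance estimate; the cubic brute-force count is only practical for small networks.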

  20. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  1. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    Science.gov (United States)

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  2. Hunting Motifs in Situla Art

    Directory of Open Access Journals (Sweden)

    Andrej Preložnik

    2013-07-01

    Situla art developed as an echo of the toreutic style which had spread from the Near East through the Phoenicians, Greeks and Etruscans as far as the Veneti, Raeti, Histri, and their eastern neighbours in the region of Dolenjska (Lower Carniola). An Early Iron Age phenomenon (c. 600–300 BC), it represents the major and most arresting form of the contemporary visual arts in an area stretching from the foot of the Apennines in the south to the Drava and Sava rivers in the east. Indeed, individual pieces have found their way across the Alpine passes and all the way north to the Danube. In the world and art of the situlae, a prominent role is accorded to animals. They are displayed in numerous representations of human activities on artefacts crafted in the classic situla style – that is, between the late 6th and early 5th centuries BC – as passive participants (e.g. in pageants or in harness) or as an active element of the situla narrative. The most typical example of the latter is the hunting scene. Today we know at least four objects decorated exclusively with hunting themes, and a number of situlae and other larger vessels where hunting scenes are embedded in composite narratives. All this suggests a popularity unparalleled by any other genre. Clearly recognisable are various hunting techniques and weapons, each associated with a particular type of game (Fig. 1). The chase of a stag with javelin, horse and hound is depicted on the long-familiar and repeatedly published fibula of Zagorje (Fig. 2). It displays a hound mauling the stag’s back and a hunter on horseback pursuing a hind, her neck already pierced by the javelin. To judge by the (so far unnoticed) shaft end under the stag’s muzzle, the hunter would have been brandishing a second javelin as well, like the warrior of the Vače fibula or the rider of the Nesactium situla, presumably himself a hunter. Many parallels to his motif are known from Greece, Etruria, and

  3. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  4. Deciphering functional glycosaminoglycan motifs in development.

    Science.gov (United States)

    Townley, Robert A; Bülow, Hannes E

    2018-03-23

    Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

    Science.gov (United States)

    Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

    2012-01-01

    Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.

  6. The MHC motif viewer: a visualization tool for MHC binding motifs

    DEFF Research Database (Denmark)

    Rapin, Nicolas; Hoof, Ilka; Lund, Ole

    2010-01-01

    is hampered by the lack of tools for browsing and comparing specificity of these molecules. We have developed a Web server, MHC Motif Viewer, which allows the display of the binding motif for MHC class I proteins for human, chimpanzee, rhesus monkey, mouse, and swine, as well as HLA-DR protein sequences...

  7. Analisis Unsur Matematika pada Motif Sulam Usus

    Directory of Open Access Journals (Sweden)

    Fredi Ganda Putra

    2017-12-01

    Interviews with informants indicate that sulam usus (intestine embroidery) began as a genuine handicraft art. It is called intestine embroidery because the technique joins strands of cloth resembling intestines, shaped to follow a pattern and stitched together with thread. The technique was originally used to make a covering for the traditional women's garment of Lampung, often referred to as bebe. Many people, including residents of Lampung, still do not know or recognize intestine embroidery, because most know only tapis as characteristic of Lampung, although intestine embroidery is another product of the same culture. Many are also unaware that the intestine embroidery motif embodies mathematical knowledge. The research question is whether the intestine embroidery motif contains mathematical elements based on the concept of geometry, and the purpose of the study is to determine whether it does. The subjects were 4 people selected by purposive sampling. Descriptive analysis of the data and discussion gave the following results: (1) The intestine embroidery motif carries both mathematical and cultural meaning, often called ethnomathematics. On the cultural side, the embroidery is linked to earlier cultures, such as Hindu and Buddhist beliefs, and its motifs and decorative patterns resemble ornamental varieties found elsewhere in Indonesia. (2) On the mathematical side, the motifs contain elements of mathematics such as geometric elements in the form of one-dimensional and two-dimensional geometry, and the

  8. Armadillo motifs involved in vesicular transport.

    Directory of Open Access Journals (Sweden)

    Harald Striegl

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  9. Direct AUC optimization of regulatory motifs.

    Science.gov (United States)

    Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

    2017-07-15

    The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results show that CDAUC may also be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . Contact: dshuang@tongji.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
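
    The AUC criterion named in this abstract can be sketched in a few lines of Python. The function below computes AUC as the fraction of (bound, unbound) sequence pairs in which the bound sequence receives the higher motif score, with ties counted as one half. This is the generic rank-based definition of AUC, not the CDAUC algorithm; the function name `auc` and the toy scores are assumptions made for the example.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical motif scores for bound (positive) and unbound (negative) sequences.
print(auc([0.9, 0.4], [0.5, 0.1]))   # 0.75
# Nudging one score without changing the ranking leaves the AUC unchanged.
print(auc([0.9, 0.45], [0.5, 0.1]))  # 0.75
```

    Because the value depends only on how scores are ordered, it changes in jumps when a parameter moves far enough to swap two scores, which is consistent with the abstract's remark that each coordinate-wise sub-problem has a piece-wise constant cost.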

  10. SEMANTIC PATCH INFERENCE

    DEFF Research Database (Denmark)

    Andersen, Jesper

    2009-01-01

    Collateral evolution is the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....

  11. Impact of noise on molecular network inference.

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Nagarajan

    Molecular entities work in concert as a system and mediate phenotypic outcomes and disease states. There has been recent interest in modelling the associations between molecular entities from their observed expression profiles as networks using a battery of algorithms. These networks have proven to be useful abstractions of the underlying pathways and signalling mechanisms. Noise is ubiquitous in molecular data and can have a pronounced effect on the inferred network. Noise can be an outcome of several factors including: inherent stochastic mechanisms at the molecular level, variation in the abundance of molecules, heterogeneity, sensitivity of the biological assay or measurement artefacts prevalent especially in high-throughput settings. The present study investigates the impact of discrepancies in noise variance on pair-wise dependencies, conditional dependencies and constraint-based Bayesian network structure learning algorithms that incorporate conditional independence tests as a part of the learning process. Popular network motifs and fundamental connections, namely: (a) common-effect, (b) three-chain, and (c) coherent type-I feed-forward loop (FFL), are investigated. The choice of these elementary networks can be attributed to their prevalence across more complex networks. Analytical expressions elucidating the impact of discrepancies in noise variance on pairwise dependencies and conditional dependencies for special cases of these motifs are presented. Subsequently, the impact of noise on two popular constraint-based Bayesian network structure learning algorithms such as Grow-Shrink (GS) and Incremental Association Markov Blanket (IAMB) that implicitly incorporate tests for conditional independence is investigated. Finally, the impact of noise on networks inferred from publicly available single cell molecular expression profiles is investigated. While discrepancies in noise variance are overlooked in routine molecular network inference, the

  12. Highly scalable Ab initio genomic motif identification

    KAUST Repository

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
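
    The scaling figure quoted above can be unpacked with a short calculation. The abstract does not define parallel efficiency, so the sketch below assumes the usual relative definition: measured speedup over the baseline run divided by the ideal speedup (the ratio of core counts). The function names and the timing arguments are assumptions made for the example.

```python
def relative_efficiency(t_base, base_cores, t_cores, cores):
    """Parallel efficiency of a run on `cores` relative to a run on `base_cores`.

    Assumed definition: (t_base / t_cores) / (cores / base_cores).
    """
    return (t_base * base_cores) / (t_cores * cores)


def relative_speedup(efficiency, base_cores, cores):
    """Speedup over the baseline run implied by a given relative efficiency."""
    return efficiency * cores / base_cores


# Ideal speedup from 256 to 65,536 cores is 65536 / 256 = 256x.
# At 94% relative efficiency the implied speedup is about 240.6x.
print(relative_speedup(0.94, 256, 65536))
```

    Under this reading, the 65,536-core run would be roughly 240 times faster than the 256-core run. The separate figure of more than 250,000 is measured against the original serial code and also includes the serial optimizations, so the two numbers are not directly comparable.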

  13. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer. Copyright is held by the owner/author(s).

  14. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-01-01

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8–10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. 2013 The Author(s).
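
    The preprocessing step described in this abstract, reducing probe-level data to a median intensity per k-mer and then ranking the k-mers, can be sketched as follows. This is a simplified illustration and not the kmerHMM code: the function name `kmer_median_intensity`, the `(sequence, intensity)` input format, and the toy probes are assumptions, and details a real pipeline would likely need (such as merging reverse complements) are omitted.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensity(probes, k):
    """Map each k-mer to the median intensity of the probes that contain it.

    `probes` is an iterable of (sequence, intensity) pairs.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        # A set ensures a k-mer repeated inside one probe is counted once.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            by_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


# Hypothetical toy probes; real PBM probes are longer and k is typically 8 to 10.
probes = [("ACGT", 10.0), ("CGTA", 30.0), ("ACGA", 20.0)]
medians = kmer_median_intensity(probes, k=3)
ranked = sorted(medians, key=medians.get, reverse=True)
print(ranked[0], medians[ranked[0]])  # GTA 30.0
```

    The ranked list is the kind of input the abstract says is then aligned and used to train the HMM; that training step is not shown here.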

  15. DNA motif elucidation using belief propagation.

    Science.gov (United States)

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-09-01

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8–10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.

  16. DNA motif elucidation using belief propagation

    KAUST Repository

    Wong, Ka-Chun

    2013-06-29

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8–10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. 2013 The Author(s).

  17. CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

    Science.gov (United States)

    Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

    2014-12-01

    Discovering motifs in protein-protein interaction networks is becoming a major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because it involves graph isomorphism detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
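
    To illustrate what counting non-induced tree occurrences combinatorially can look like, the sketch below handles the simplest tree shape, a star. A node with d neighbours is the centre of C(d, m) stars with m leaves, so the total is a sum of binomial coefficients over nodes and no isomorphism test is needed. This is a textbook identity offered as an illustration, not the CombiMotif algorithm; the function name `count_stars` and the toy graph are assumptions.

```python
from math import comb


def count_stars(edges, leaves):
    """Count non-induced stars with `leaves` leaves in an undirected simple graph."""
    if leaves < 2:
        raise ValueError("leaves must be at least 2 so that the centre is unique")
    neighbours = {}
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # Each choice of `leaves` neighbours around a centre is one occurrence.
    return sum(comb(len(nb), leaves) for nb in neighbours.values())


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to a.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(count_stars(toy, 2))  # 5 three-node paths, counted as non-induced
print(count_stars(toy, 3))  # 1
```

    "Non-induced" means extra edges among the chosen nodes are allowed, which is why the triangle contributes three two-leaf stars even though it contains no induced three-node path.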

  18. Inference in 'poor' languages

    Energy Technology Data Exchange (ETDEWEB)

    Petrov, S.

    1996-10-01

    Languages with a solvable implication problem but without complete and consistent systems of inference rules ('poor' languages) are considered. The problem of existence of a finite complete and consistent inference rule system for a 'poor' language is stated independently of the language or rule syntax. Several properties of the problem are proved. An application of the results to the language of join dependencies is given.

  19. Bayesian statistical inference

    Directory of Open Access Journals (Sweden)

    Bruno De Finetti

    2017-04-01

    This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find the essentials of De Finetti's approach to statistical inference.

  20. Geometric statistical inference

    International Nuclear Information System (INIS)

    Periwal, Vipul

    1999-01-01

    A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined.

  1. Practical Bayesian Inference

    Science.gov (United States)

    Bailer-Jones, Coryn A. L.

    2017-04-01

    Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.

  2. Knowledge and inference

    CERN Document Server

    Nagao, Makoto

    1990-01-01

    Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of "knowledge" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig

  3. Logical inference and evaluation

    International Nuclear Information System (INIS)

    Perey, F.G.

    1981-01-01

    Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given.

  4. Dynamic motifs in socio-economic networks

    Science.gov (United States)

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bipartite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently has a business relationship with only one of these two companies will become a customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  5. Probabilistic generation of random networks taking into account information on motifs occurrence.

    Science.gov (United States)

    Bois, Frederic Y; Gayraud, Ghislaine

    2015-01-01

    Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.

  6. The effects of incomplete protein interaction data on structural and evolutionary inferences

    DEFF Research Database (Denmark)

    de Silva, E; Thorne, T; Ingram, P

    2006-01-01

    of the inherent noise in protein interaction data. The effects of the incomplete nature of network data become very noticeable, especially for so-called network motifs. We also consider the effect of incomplete network data on functional and evolutionary inferences. Conclusion Crucially, when only small, partial...

  7. Probability and Statistical Inference

    OpenAIRE

    Prosper, Harrison B.

    2006-01-01

    These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.

  8. On quantum statistical inference

    NARCIS (Netherlands)

    Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.

    2003-01-01

    Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, developments in the theory of quantum measurements have

  9. INFERENCE BUILDING BLOCKS

    Science.gov (United States)

    2018-02-15

    expressed a variety of inference techniques on discrete and continuous distributions: exact inference, importance sampling, Metropolis-Hastings (MH...without redoing any math or rewriting any code. And although our main goal is composable reuse, our performance is also good because we can use...control paths. • The Hakaru language can express mixtures of discrete and continuous distributions, but the current disintegration transformation

  10. Introductory statistical inference

    CERN Document Server

    Mukhopadhyay, Nitis

    2014-01-01

    This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist

  11. CONTEMPORARY USAGE OF TRADITIONAL TURKISH MOTIFS IN PRODUCT DESIGNS

    Directory of Open Access Journals (Sweden)

    Tulay Gumuser

    2012-12-01

    The aim of this study is to identify traditional Turkish motifs and their relation to present-day industrial designs. Traditional Turkish motifs played a very important role from the 16th century onwards. The arts of the Ottoman Empire were used because of their symbolic meanings and unique styles. Examining these motifs, we encounter the Tiger Stripe, Three Spot (Çintemani), Rumi, Hatayi, Penç, Cloud, Crescent, Star, Crown, Hyacinth, Tulip and Carnation motifs. Turkish designers have now begun to use these traditional motifs in their designs to create distinction and awareness in world design. The examples of these industrial designs, using the Turkish motifs, have survived and have Ottoman heritage and historical value. This study examines the Turkish motifs with a focus on their use in contemporary Turkish industrial design.

  12. Aplikasi Ornamen Khas Maluku untuk Pengembangan Desain Motif Batik

    Directory of Open Access Journals (Sweden)

    Masiswo Masiswo

    2016-04-01

    Maluku has a rich ancestral heritage of decorative ethnic ornaments that embody its arts and craft skills. This heritage is still alive today and continues to be enjoyed as a source of spiritual satisfaction. To sustain the traditional ethnic values expressed in Maluku ornaments, the ornaments were developed into batik motifs on cloth. The development emphasizes representing the ornament forms as distinctively Maluku batik motifs. Three alternative motif designs derived from Maluku ornaments were created, made into product prototypes, and tested for color fastness. In the test of color fastness to wet rubbing, the “Motif Siwa” prototype was rated excellent, and the “Siwa Talang” and “Matahari Siwa Talang” motifs were rated good. Keywords: design, Maluku, batik motif, ornament.

  13. Identity and functions of CxxC-derived motifs.

    Science.gov (United States)

    Fomenko, Dmitri E; Gladyshev, Vadim N

    2003-09-30

    Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.
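    The sequence patterns named in this abstract can be illustrated with a short scan. The following is a minimal sketch, not the authors' search procedure: the helper name `find_cxxc_like` and the example sequence are made up for illustration, and the sketch ignores the downstream alpha-helix that the abstract reports as strongly favored.

```python
import re

# Motifs named in the abstract, written as regular expressions.
# "." stands for any residue (the "x" positions).
MOTIFS = {
    "CxxC": "C..C",
    "CxxS": "C..S",
    "CxxT": "C..T",
    "TxxC": "T..C",
    "SxxC": "S..C",
}


def find_cxxc_like(sequence):
    """Return sorted (start, motif_name, matched_text) tuples, 0-based start.

    Hypothetical helper: pattern matching only, no structural check.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead with a capture group reports overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


# Made-up sequence, not taken from any real protein.
example_hits = find_cxxc_like("MKWCGPCKAATPPCSL")
overlap_hits = find_cxxc_like("CAACAAC")
```

    A pattern hit alone is only a candidate; the abstract's search additionally required a conserved motif/helix pattern.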

  14. UKIRAN KERAWANG ACEH GAYO SEBAGAI INSPIRASI PENCIPTAAN MOTIF BATIK KHAS GAYO

    Directory of Open Access Journals (Sweden)

    Irfa ina Rohana Salma

    2016-12-01

    The batik industry has begun to develop in Gayo, but it does not yet have a batik motif distinctive to the region. Batik motifs distinctive to Gayo therefore need to be created, taking inspiration from the carvings found on traditional houses, commonly called kerawang Gayo. The aim of this art creation is to produce batik motifs with a Gayo character. The method comprises idea exploration, design, and realization as batik motifs. Six Gayo batik motifs were created: (1) Motif Ceplok Gayo; (2) Motif Gayo Tegak; (3) Motif Gayo Lurus; (4) Motif Parang Gayo; (5) Motif Gayo Lembut; and (6) Motif Geometris Gayo. A preference test with fifty respondents showed that Motif Ceplok Gayo was chosen most often (19%), followed by Motif Parang Gayo (18%), Motif Gayo Lembut (17%), Motif Geometris Gayo (17%), Motif Gayo Lurus (15%) and Motif Gayo Tegak (14%). On average the motifs were well received by respondents, so all of them are suitable for production as Gayo batik. Keywords: batik Gayo, Motif Ceplok Gayo, Motif Parang Gayo.

  15. iELM—a web server to explore short linear motif-mediated interactions

    Science.gov (United States)

    Weatheritt, Robert J.; Jehl, Peter; Dinkel, Holger; Gibson, Toby J.

    2012-01-01

    The recent expansion in our knowledge of protein–protein interactions (PPIs) has allowed the annotation and prediction of hundreds of thousands of interactions. However, the function of many of these interactions remains elusive. The interactions of Eukaryotic Linear Motif (iELM) web server provides a resource for predicting the function and positional interface for a subset of interactions mediated by short linear motifs (SLiMs). The iELM prediction algorithm is based on the annotated SLiM classes from the Eukaryotic Linear Motif (ELM) resource and allows users to explore both annotated and user-generated PPI networks for SLiM-mediated interactions. By incorporating the annotated information from the ELM resource, iELM provides functional details of PPIs. This can be used in proteomic analysis, for example, to infer whether an interaction promotes complex formation or degradation. Furthermore, details of the molecular interface of the SLiM-mediated interactions are also predicted. This information is displayed in a fully searchable table, as well as graphically with the modular architecture of the participating proteins extracted from the UniProt and Phospho.ELM resources. A network figure is also presented to aid the interpretation of results. The iELM server supports single protein queries as well as large-scale proteomic submissions and is freely available at http://i.elm.eu.org. PMID:22638578

  16. RMOD: a tool for regulatory motif detection in signaling network.

    Directory of Open Access Journals (Sweden)

    Jinki Kim

    Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To scale to large signaling networks, it adopts a new subgraph search algorithm using a novel data structure called path-tree, a tree structure composed of isomorphic graphs of query regulatory motifs. The algorithm was evaluated on signaling networks of various sizes generated by integrating various human signaling pathways, and its speed and scalability outperformed those of other algorithms. RMOD includes interactive analysis and auxiliary tools that support the whole process, from building the signaling network and querying regulatory motifs to analyzing regulatory motifs with graphical illustrations and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanisms underlying their regulatory activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod.

  17. Type Inference with Inequalities

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff

    1991-01-01

    Type inference can be phrased as constraint-solving over types. We consider an implicitly typed language equipped with recursive types, multiple inheritance, 1st order parametric polymorphism, and assignments. Type correctness is expressed as satisfiability of a possibly infinite collection of (monotonic) inequalities on the types of variables and expressions. A general result about systems of inequalities over semilattices yields a solvable form. We distinguish between deciding typability (the existence of solutions) and type inference (the computation of a minimal solution). In our case, both...
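    The idea of computing a minimal solution to monotonic inequalities over a semilattice can be sketched with a generic fixed-point iteration. This is an illustration under stated assumptions, not the paper's algorithm: types are modelled as finite sets of made-up type names ordered by inclusion, the system is finite (the abstract allows infinite collections), and `least_solution` is a hypothetical helper.

```python
def least_solution(variables, constraints):
    """Least solution of inequalities env[target] >= fn(env).

    Assumptions of this sketch: values are frozensets ordered by inclusion
    (join = union), every fn is monotone, and only finitely many type names
    occur, so the iteration terminates.
    """
    env = {v: frozenset() for v in variables}  # start at the bottom element
    changed = True
    while changed:
        changed = False
        for target, fn in constraints:
            joined = env[target] | fn(env)  # raise target just enough
            if joined != env[target]:
                env[target] = joined
                changed = True
    return env


# Made-up constraint system: x >= {Int}, y >= x, y >= {Bool}, z >= x join y.
constraints = [
    ("x", lambda env: frozenset({"Int"})),
    ("y", lambda env: env["x"]),
    ("y", lambda env: frozenset({"Bool"})),
    ("z", lambda env: env["x"] | env["y"]),
]
solution = least_solution(["x", "y", "z"], constraints)
```

    Starting from the bottom element and only ever joining in what a constraint demands is what makes the result the least solution; typability in this toy setting is trivial because union always exists.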

  18. Structural motifs of pre-nucleation clusters.

    Science.gov (United States)

    Zhang, Y; Türkmen, I R; Wassermann, B; Erko, A; Rühl, E

    2013-10-07

    Structural motifs of pre-nucleation clusters prepared in single, optically levitated supersaturated aqueous aerosol microparticles containing CaBr2 as a model system are reported. Cluster formation is identified by means of X-ray absorption in the Br K-edge regime. The salt concentration beyond the saturation point is varied by controlling the humidity in the ambient atmosphere surrounding the 15-30 μm microdroplets. This leads to the formation of metastable supersaturated liquid particles. Distinct spectral shifts in near-edge spectra as a function of salt concentration are observed, in which the energy position of the Br K-edge is red-shifted by up to 7.1 ± 0.4 eV if the dilute solution is compared to the solid. The K-edge positions of supersaturated solutions are found between these limits. The changes in electronic structure are rationalized in terms of the formation of pre-nucleation clusters. This assumption is verified by spectral simulations using first-principle density functional theory and molecular dynamics calculations, in which structural motifs are considered, explaining the experimental results. These consist of solvated CaBr2 moieties, rather than building blocks forming calcium bromide hexahydrates, the crystal system that is formed by drying aqueous CaBr2 solutions.

  19. POWRS: position-sensitive motif discovery.

    Directory of Open Access Journals (Sweden)

    Ian W Davis

    Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm "POWRS" (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.

  20. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Science.gov (United States)

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposal of the new viral order Megavirales, which comprises a monophyletic group named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestral promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations. PMID:28117683

  1. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    Directory of Open Access Journals (Sweden)

    Graziele Pereira Oliveira

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposal of the new viral order Megavirales, which comprises a monophyletic group named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible evolutionary scenarios for promoters and propose the term “MEGA-box” to designate an ancestral promoter motif (‘TATATAAAATTGA’) that could have evolved gradually through nucleotide gain and loss and point mutations.

  2. Parole, Sintagmatik, dan Paradigmatik Motif Batik Mega Mendung

    Directory of Open Access Journals (Sweden)

    Rudi - Nababan

    2012-04-01

    Discussing traditional batik involves the system that organizes its accompanying fine-arts elements, whether the pattern of the motif or the technique of making it. The Mega Mendung motif of Cirebon has patterns and rules that traditionally differ from those of motifs in other regions. Semiotic analysis, especially with the concepts of Saussure and Peirce, shows that Cirebon batik, here the Mega Mendung motif, has a parole and langue system, as a unique fine-arts language in batik, and a visual syntagmatic and paradigmatic structure. As a fine-arts language, the batik motif is tied to a sign system of symbol and icon. Keywords: visual semiotic, Cirebon’s batik.

  3. Inference as Prediction

    Science.gov (United States)

    Watson, Jane

    2007-01-01

    Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…

  4. Hybrid Optical Inference Machines

    Science.gov (United States)

    1991-09-27

    with labels. Now, events. a set of facts can be generated in the dyadic form "u, R 1,2" Eichmann and Caulfield [19] consider the same type of and can...these encoding schemes. These architectures are based primarily on optical inner... 19. G. Eichmann and H. J. Caulfield, "Optical Learning (Inference)...

  5. Motif-role-fingerprints: the building-blocks of motifs, clustering-coefficients and transitivities in directed networks.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

    Complex networks are frequently characterized by metrics for which particular subgraphs are counted. One statistic from this category, which we refer to as motif-role fingerprints, differs from global subgraph counts in that the number of subgraphs in which each node participates is counted. As with global subgraph counts, it can be important to distinguish between motif-role fingerprints that are 'structural' (induced subgraphs) and 'functional' (partial subgraphs). Here we show mathematically that a vector of all functional motif-role fingerprints can readily be obtained from an arbitrary directed adjacency matrix, and then converted to structural motif-role fingerprints by multiplying that vector by a specific invertible conversion matrix. This result demonstrates that a unique structural motif-role fingerprint exists for any given functional motif-role fingerprint. We demonstrate a similar result for the cases of functional and structural motif-fingerprints without node roles, and global subgraph counts that form the basis of standard motif analysis. We also explicitly highlight that motif-role fingerprints are elemental to several popular metrics for quantifying the subgraph structure of directed complex networks, including motif distributions, directed clustering coefficient, and transitivity. The relationships between each of these metrics and motif-role fingerprints also suggest new subtypes of directed clustering coefficients and transitivities. Our results have potential utility in analyzing directed synaptic networks constructed from neuronal connectome data, such as in terms of centrality. Other potential applications include anomaly detection in networks, identification of similar networks and identification of similar nodes within networks. Matlab code for calculating all stated metrics following calculation of functional motif-role fingerprints is provided as S1 Matlab File.
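    The functional-to-structural conversion described in this abstract can be illustrated on a deliberately simplified case. The sketch below is an assumption-laden toy, not the paper's method or its Matlab code: it uses an undirected graph and only two node roles (centre of a two-edge path, and member of a triangle), whereas the paper treats directed three-node motifs. The names `role_fingerprints` and `CONVERSION` are hypothetical.

```python
from math import comb

# Toy conversion matrix for the two roles used here:
#   induced (open) wedges centred on a node = all wedges - closed wedges,
#   triangles are the same in both counts.
# It is upper triangular with unit diagonal, hence invertible.
CONVERSION = [[1, -1],
              [0, 1]]


def role_fingerprints(adj):
    """Per-node [wedge-centre, triangle] counts for an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists), no self-loops.
    Returns (functional, structural): partial-subgraph counts and the
    induced-subgraph counts obtained by multiplying with CONVERSION.
    """
    n = len(adj)
    functional, structural = [], []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        wedges = comb(len(nbrs), 2)  # every neighbour pair forms a wedge
        triangles = sum(adj[a][b] for k, a in enumerate(nbrs) for b in nbrs[k + 1:])
        f = [wedges, triangles]
        s = [sum(row[c] * f[c] for c in range(2)) for row in CONVERSION]
        functional.append(f)
        structural.append(s)
    return functional, structural


# Made-up example: triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
functional, structural = role_fingerprints(adj)
```

    Because the conversion matrix is invertible, either fingerprint determines the other, which is the uniqueness property the abstract states for the directed case.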

  6. Rekayasa Pengembangan Desain Motif Batik Khas Melayu

    Directory of Open Access Journals (Sweden)

    Eustasia Sri Murwati

    2016-04-01

    Developing batik design through design engineering based on Malay ornamentation covers the development of motifs and processes, including the choice of color composition. The processes most often used are dyeing, wax removal and overdyeing; or hand-painting (colet), dyeing and wax removal; or dyeing followed by wax removal, which is called Batik Kelengan. Every island in Indonesia has its own characteristic culture and arts, known through regional patterns and ornamentation, as well as ornaments favored by people from that region or elsewhere. This encourages the growth of craft industries that make use of artistic elements. The motifs obtained are: Ayam Berlaga, Bungo Matahari, Kuntum Bersanding, Lancang Kuning, Encong Kerinci, Durian Pecah, Bungo Bintang, Bungo Pauh Kecil, Riang-riang, Bungo Nagaro. From these designs, the three best products were selected by five assessors expert in batik design: the Durian Pecah, Ayam Berlaga, and Bungo Matahari motifs. Design diversification drew on the arts and skills of the Malay ethnic group, selecting Malay ornamentation and batik motifs to apply to clothing fabric with attractive color compositions so that the products meet consumer tastes. Batik diversity was improved by enhancing product design, among other things by rendering Malay ornamentation in a batik process that uses a variety of colors so that the color composition is adequate. The result is quality batik products with Malay ornamentation and color compositions suited to its character. Product design engineering to obtain a design formulation and a feasible process, with emphasis on environmentally friendly technology, was carried out through an alternative approach, namely the creation of new design forms. Keywords: design, batik, design engineering, ornamentation, Malay.

  7. Transnationalism as a motif in family stories.

    Science.gov (United States)

    Stone, Elizabeth; Gomez, Erica; Hotzoglou, Despina; Lipnitsky, Jane Y

    2005-12-01

    Family stories have long been recognized as a vehicle for assessing components of a family's emotional and social life, including the degree to which an immigrant family has been willing to assimilate. Transnationalism, defined as living in one or more cultures and maintaining connections to both, is now increasingly common. A qualitative study of family stories in the family of those who appear completely "American" suggests that an affiliation with one's home country is nevertheless detectable in the stories via motifs such as (1) positively connotated home remedies, (2) continuing denigration of home country "enemies," (3) extensive knowledge of the home country history and politics, (4) praise of endogamy and negative assessment of exogamy, (5) superiority of home country to America, and (6) beauty of home country. Furthermore, an awareness of which model--assimilationist or transnational--governs a family's experience may help clarify a clinician's understanding of a family's strengths, vulnerabilities, and mode of framing their cultural experiences.

  8. Inference rule and problem solving

    Energy Technology Data Exchange (ETDEWEB)

    Goto, S

    1982-04-01

    Intelligent information processing means having human intellectual activity carried out on a computer, with inference, rather than ordinary calculation, as the basic operational mechanism. Many inference rules are derived from syllogisms in formal logic. The problem of programming this inference function is referred to as problem solving. Although inference and problem solving are logically closely related, the ability of current computers to perform inference is low. To clarify the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.

  9. Motif decomposition of the phosphotyrosine proteome reveals a new N-terminal binding motif for SHIP2

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Hanke, S.; Hinsby, A. M.

    2008-01-01

    set of 481 unique phosphotyrosine (Tyr(P)) peptides by sequence similarity to known ligands of the Src homology 2 (SH2) and the phosphotyrosine binding (PTB) domains. From 20 clusters we extracted 16 known and four new interaction motifs. Using quantitative mass spectrometry we pulled down Tyr(P)-specific binding partners for peptides corresponding to the extracted motifs. We confirmed numerous previously known interaction motifs and found 15 new interactions mediated by phosphosites not previously known to bind SH2 or PTB. Remarkably, a novel hydrophobic N-terminal motif ((L/V/I)(L/V/I)pY) was identified...

  10. Assessing local structure motifs using order parameters for motif recognition, interstitial identification, and diffusion path characterization

    Science.gov (United States)

    Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav; Haranczyk, Maciej

    2017-11-01

    Structure-property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.
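    To make the notion of a local order parameter concrete, here is a minimal sketch that scores how tetrahedral a four-neighbour environment is. It uses the commonly cited rescaled tetrahedral order parameter, q = 1 - (3/8) * sum over neighbour pairs of (cos(psi) + 1/3)^2, which is swapped in for illustration and is not necessarily the definition used in the paper or in pymatgen. The function name and the coordinates are made up.

```python
import math
from itertools import combinations


def tetrahedral_order(center, neighbors):
    """Tetrahedral order parameter of a site with exactly four neighbours.

    Returns 1.0 for a perfect tetrahedron; lower values mean less
    tetrahedral. Illustrative only, not the pymatgen implementation.
    """
    if len(neighbors) != 4:
        raise ValueError("this sketch expects exactly four neighbours")
    units = []
    for point in neighbors:
        vec = [point[i] - center[i] for i in range(3)]
        norm = math.sqrt(sum(c * c for c in vec))
        units.append([c / norm for c in vec])  # unit bond vector
    total = 0.0
    for a, b in combinations(units, 2):  # the six neighbour pairs
        cos_psi = sum(x * y for x, y in zip(a, b))
        total += (cos_psi + 1.0 / 3.0) ** 2
    return 1.0 - 0.375 * total


origin = (0.0, 0.0, 0.0)
# Alternating cube corners form a perfect tetrahedron around the origin.
perfect = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
# Four coplanar neighbours at right angles (square-planar environment).
square = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

q_perfect = tetrahedral_order(origin, perfect)
q_square = tetrahedral_order(origin, square)
```

    A classification criterion in the spirit of the abstract would threshold such a value; the threshold itself would have to be calibrated, for example against distorted reference motifs.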

  11. Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

    Directory of Open Access Journals (Sweden)

    Nils E. R. Zimmermann

    2017-11-01

    Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic, as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.

  12. Stochastic processes inference theory

    CERN Document Server

    Rao, Malempati M

    2014-01-01

    This is the revised and enlarged 2nd edition of the authors’ original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on Ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.

  13. Making Type Inference Practical

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff; Oxhøj, Nicholas; Palsberg, Jens

    1992-01-01

    We present the implementation of a type inference algorithm for untyped object-oriented programs with inheritance, assignments, and late binding. The algorithm significantly improves our previous one, presented at OOPSLA'91, since it can handle collection classes, such as List, in a useful way. Abo......, the complexity has been dramatically improved, from exponential time to low polynomial time. The implementation uses the techniques of incremental graph construction and constraint template instantiation to avoid representing intermediate results, doing superfluous work, and recomputing type information. Experiments indicate that the implementation type-checks as many as 100 lines per second. This results in a mature product, on which a number of tools can be based, for example a safety tool, an image compression tool, a code optimization tool, and an annotation tool. This may make type inference for object...

  14. Russell and Humean Inferences

    Directory of Open Access Journals (Sweden)

    João Paulo Monteiro

    2001-12-01

    Russell's The Problems of Philosophy tries to establish a new theory of induction, while at the same time accusing Hume of an irrational "scepticism about induction". But a careful analysis of the theory of knowledge explicitly acknowledged by Hume reveals that, contrary to the standard interpretation in the twentieth century, possibly influenced by Russell, Hume deals exclusively with causal inference (which he never classifies as "causal induction", although now we are entitled to do so), never with inductive inference in general, mainly generalizations about sensible qualities of objects (whether, e.g., "all crows are black" or not is not among Hume's concerns). Russell's theories are thus only false alternatives to Hume's, in (1912) or in his (1948).

  15. Causal inference in econometrics

    CERN Document Server

    Kreinovich, Vladik; Sriboonchitta, Songsak

    2016-01-01

    This book is devoted to the analysis of causal inference which is one of the most difficult tasks in data analysis: when two phenomena are observed to be related, it is often difficult to decide whether one of them causally influences the other one, or whether these two phenomena have a common cause. This analysis is the main focus of this volume. To get a good understanding of the causal inference, it is important to have models of economic phenomena which are as accurate as possible. Because of this need, this volume also contains papers that use non-traditional economic models, such as fuzzy models and models obtained by using neural networks and data mining techniques. It also contains papers that apply different econometric models to analyze real-life economic dependencies.

  16. Probing structural changes of self assembled i-motif DNA

    KAUST Repository

    Lee, Iljoon; Patil, Sachin; Fhayli, Karim; Alsaiari, Shahad K.; Khashab, Niveen M.

    2015-01-01

    We report an i-motif structural probing system based on Thioflavin T (ThT) as a fluorescent sensor. This probe can discriminate the structural changes of RET and Rb i-motif sequences according to pH change.

  17. RNA recognition motif (RRM)-containing proteins in Bombyx mori

    African Journals Online (AJOL)

    2009-03-20

    Mar 20, 2009 ... Recognition Motif (RRM), sometimes referred to as RNP1, is one of the first identified domains for RNA interaction. RRM is very common ... Apart from the RRM motif, eIF3-S9 has a Trp-Asp (WD) repeat domain, Poly (A) ...

  18. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  19. Fingerprint motifs of phytases | Fan | African Journal of Biotechnology

    African Journals Online (AJOL)

    Of the 173 potential phytases identified in 11 plant genomes with MAST, PAPhys form the major group and HAPhys the minor group; other phytase groups were not found in planta. Keywords: Phytase, fingerprint motif, multiple EM for motif elicitation (MEME), MAST. African Journal of Biotechnology Vol.

  20. Active inference and learning.

    Science.gov (United States)

    Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; O Doherty, John; Pezzulo, Giovanni

    2016-09-01

    This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. Identification of sequence motifs significantly associated with antisense activity

    Directory of Open Access Journals (Sweden)

    Peek Andrew S

    2007-06-01

    Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper discusses the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, an analysis of their properties, and the results of support vector machine modelling using these significant motifs as features. Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs previously reported as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic

  2. Learning Convex Inference of Marginals

    OpenAIRE

    Domke, Justin

    2012-01-01

    Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...

  3. Probabilistic inductive inference: a survey

    OpenAIRE

    Ambainis, Andris

    2001-01-01

    Inductive inference is a recursion-theoretic theory of learning, first developed by E. M. Gold (1967). This paper surveys developments in probabilistic inductive inference. We mainly focus on finite inference of recursive functions, since this simple paradigm has produced the most interesting (and most complex) results.

  4. Multimodel inference and adaptive management

    Science.gov (United States)

    Rehme, S.E.; Powell, L.A.; Allen, Craig R.

    2011-01-01

    Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
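
    One common way to weigh competing hypotheses in multimodel inference is through Akaike weights. The abstract above does not say which criterion the reviewed papers used, so the sketch below is an illustration under that assumption only; the function name `akaike_weights` and the input AIC values are hypothetical.

```python
from math import exp


def akaike_weights(aic_values):
    """Convert a list of AIC scores into normalized model weights.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's AIC and the smallest AIC in the set.
    """
    best = min(aic_values)
    relative = [exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical AIC scores for three candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])
for index, weight in enumerate(weights):
    print(f"model {index}: weight {weight:.3f}")
```

    When no single weight dominates, the model set gives only weak inference in the sense discussed above, which is the situation where model selection uncertainty should be carried into any recommendation.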

  5. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
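
    The mutual information feature mentioned above can be illustrated with a 2x2 contingency table that counts sequences by whether they match a motif and whether they carry a GO term. This is a minimal sketch using the textbook definition of mutual information; the abstract does not give the paper's exact formulation, and the function name and counts below are hypothetical.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between two binary events.

    n11: sequences that match the motif and carry the GO term
    n10: sequences that match the motif but lack the GO term
    n01: sequences that lack the motif but carry the GO term
    n00: sequences with neither
    """
    table = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    total = sum(table.values())
    mi = 0.0
    for (motif, term), count in table.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_joint = count / total
        p_motif = (table[(motif, 0)] + table[(motif, 1)]) / total
        p_term = (table[(0, term)] + table[(1, term)]) / total
        mi += p_joint * log2(p_joint / (p_motif * p_term))
    return mi


# Hypothetical counts: the motif and the GO term mostly co-occur.
print(mutual_information(40, 10, 5, 45))
```

    A value near zero means that matching the motif says little about the annotation, while larger values indicate a stronger association that a classifier could combine with other frequency-based features.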

  6. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2010-01-01

    Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. -Eugenia Stoimenova, Journal of Applied Statistics, June 2012. … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. -Biometrics, 67, September 2011. This excellently presente

  7. Emotional inferences by pragmatics

    OpenAIRE

    Iza-Miqueleiz, Mauricio

    2017-01-01

    It has long been taken for granted that, in the course of reading a text, world knowledge is often required in order to establish coherent links between sentences (McKoon & Ratcliff 1992, Iza & Ezquerro 2000). The content grasped from a text turns out to be strongly dependent upon the reader’s additional knowledge that allows a coherent interpretation of the text as a whole. The world knowledge directing the inference may be of different kinds. Gygax et al. (2007) showed that m...

  8. Generic patch inference

    DEFF Research Database (Denmark)

    Andersen, Jesper; Lawall, Julia

    2010-01-01

    A key issue in maintaining Linux device drivers is the need to keep them up to date with respect to evolutions in Linux internal libraries. Currently, there is little tool support for performing and documenting such changes. In this paper we present a tool, spdiff, that identifies common changes...... developers can use it to extract an abstract representation of the set of changes that others have made. Our experiments on recent changes in Linux show that the inferred generic patches are more concise than the corresponding patches found in commits to the Linux source tree while being safe with respect...

  9. Motif statistics and spike correlations in neuronal networks

    International Nuclear Information System (INIS)

    Hu, Yu; Shea-Brown, Eric; Trousdale, James; Josić, Krešimir

    2013-01-01

    Motifs are patterns of subgraphs of complex networks. We studied the impact of such patterns of connectivity on the level of correlated, or synchronized, spiking activity among pairs of cells in a recurrent network of integrate-and-fire neurons. For a range of network architectures, we find that the pairwise correlation coefficients, averaged across the network, can be closely approximated using only three statistics of network connectivity. These are the overall network connection probability and the frequencies of two second-order motifs: diverging motifs, in which one cell provides input to two others, and chain motifs, in which two cells are connected via a third intermediary cell. Specifically, the prevalence of diverging and chain motifs tends to increase correlation. Our method is based on linear response theory, which enables us to express spiking statistics using linear algebra, and a resumming technique, which extrapolates from second-order motifs to predict the overall effect of coupling on network correlation. Our motif-based results seek to isolate the effect of network architecture perturbatively from a known network state.
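
    The three connectivity statistics named above can be illustrated on a small directed graph. The sketch below returns the connection probability together with raw counts of diverging and chain motifs; it does not reproduce the normalized motif frequencies or the linear response calculation of the paper, and the function name and the convention that `adj[i][j] == 1` means cell i projects to cell j are assumptions made for illustration.

```python
def motif_counts(adj):
    """Connection probability and second-order motif counts.

    adj is a square 0/1 matrix (list of lists); adj[i][j] == 1 means
    cell i sends a connection to cell j. Self-connections are ignored.
    """
    n = len(adj)
    out_deg = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]

    # Fraction of the n * (n - 1) possible directed connections present.
    probability = sum(out_deg) / (n * (n - 1))

    # Diverging motif: one cell projecting to two distinct targets.
    diverging = sum(k * (k - 1) // 2 for k in out_deg)

    # Chain motif: i -> j -> k with i != k. Two-step paths that return
    # to their origin (reciprocal pairs) are subtracted.
    reciprocal = sum(
        adj[i][j] * adj[j][i] for i in range(n) for j in range(n) if i != j
    )
    chain = sum(a * b for a, b in zip(in_deg, out_deg)) - reciprocal

    return probability, diverging, chain


# Three cells: 0 -> 1, 0 -> 2 and 1 -> 2.
example = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(motif_counts(example))
```

    In this example, cell 0 forms one diverging motif (targets 1 and 2) and the path 0 -> 1 -> 2 forms one chain motif.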

  10. Computational analyses of synergism in small molecular network motifs.

    Directory of Open Access Journals (Sweden)

    Yili Zhang

    2014-03-01

    Cellular functions and responses to stimuli are controlled by complex regulatory networks that comprise a large diversity of molecular components and their interactions. However, achieving an intuitive understanding of the dynamical properties and responses to stimuli of these networks is hampered by their large scale and complexity. To address this issue, analyses of regulatory networks often focus on reduced models that depict distinct, reoccurring connectivity patterns referred to as motifs. Previous modeling studies have begun to characterize the dynamics of small motifs, and to describe ways in which variations in parameters affect their responses to stimuli. The present study investigates how variations in pairs of parameters affect responses in a series of ten common network motifs, identifying concurrent variations that act synergistically (or antagonistically) to alter the responses of the motifs to stimuli. Synergism (or antagonism) was quantified using degrees of nonlinear blending and additive synergism. Simulations identified concurrent variations that maximized synergism, and examined the ways in which it was affected by stimulus protocols and the architecture of a motif. Only a subset of architectures exhibited synergism following paired changes in parameters. The approach was then applied to a model describing interlocked feedback loops governing the synthesis of the CREB1 and CREB2 transcription factors. The effects of motifs on synergism for this biologically realistic model were consistent with those for the abstract models of single motifs. These results have implications for the rational design of combination drug therapies with the potential for synergistic interactions.

  11. Triadic motifs in the dependence networks of virtual societies

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-01

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  12. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo.

  13. Triadic motifs in the dependence networks of virtual societies.

    Science.gov (United States)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-10

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  14. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions: We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
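
    For orientation, the (l, d)-motif (PMS) problem itself can be stated in a few lines of code: find every string of length l that occurs in each input sequence with at most d mismatches. The sketch below is a naive exact baseline written for illustration; it is not the speedup technique proposed in the paper, and the function names and the DNA alphabet are assumptions.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d):
    """All strings within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(lmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def occurs(motif, seq, d):
    """True if some window of seq is within distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d):
    """Exact (l, d)-motif search by exhaustive candidate checking.

    Candidates are the d-neighbors of every l-mer in the first sequence;
    a candidate is kept if it occurs in all remaining sequences.
    """
    first, rest = seqs[0], seqs[1:]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates if all(occurs(m, s, d) for s in rest))


# ACGT is planted with at most one mismatch in each sequence.
sequences = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
print(planted_motifs(sequences, l=4, d=1))
```

    The number of candidates grows exponentially with d, which reflects the exponential running time of exact PMS algorithms noted in the abstract.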

  15. A Comparative Study on the Origin and Variety of Motifs in Shahsavan Salt Bags and Caucasian Textiles

    Directory of Open Access Journals (Sweden)

    Siamak Egharloo

    2017-12-01

    The Shahsavan tribes of Iran and the peoples of the Caucasus region have had considerable, and often inevitable, contact throughout history because of their common borders and particular geographical locations. The result is visible in many intermingled elements, specifically in the textiles of the tribes and ethnic groups. These interactions in textile production appear most clearly in the patterns, motifs, colors, compositions, and weaving of hand-woven textiles, of which Shahsavan "salt bags" (NAMAKDᾹN) are a case in point. Given the importance of the subject, the following questions are posed: What influences has weaving in these two regions undergone as a result of their interactions and historical background? What are the motifs of the two regions and their classifications, which motifs are shared, and which are most abundant? Using an analytical and comparative method, the present research examines weaving among the Shahsavan, with emphasis on their salt bags, together with other Caucasian textiles (salt bags, etc.). The objectives of the research are to study the influences and interactions between the two regions and to identify the patterns and motifs on their textiles. Finally, we can infer that the location of Iran and its common borders with the Caucasus, together with the distribution of tribal groups in the northern and southern areas, can be considered the reasons for the cultural influences in these regions. The dominant motifs are dragons (S shape), diamonds and stars, crab-like and cross motifs, as well as negative and positive spaces.

  16. Targeting functional motifs of a protein family

    Science.gov (United States)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on random matrix theory (RMT), which uses the physicochemical properties of the amino acids together with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as those of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  17. ROMANIAN FOLKLORE MOTIFS IN FASHION DESIGN

    Directory of Open Access Journals (Sweden)

    MOCENCO Alexandra

    2014-05-01

    The traditional Romanian costume, like the whole of popular art (architecture, woodcarving, pottery, etc.), was born in our country in ancient times and has endured ever since. Closely related to human existence, the traditional costume has reflected over the years, as it does today, the mentality and artistic conception of the people. Today the traditional Romanian costume has become a source of inspiration for designers in the wholesale fashion production industry, both Romanian and international. Although contemporary designers work in accordance with a vision, using a wide range of styles, methods and current technology, they usually return to traditional techniques and ethnic folklore motifs, which they convert and resize, integrating them into their contemporary space. Adrian Oianu is a highly appreciated Romanian designer who launched two collections inspired by his native country's traditional costumes: “Suflecata pan’ la brau” (“Turned up ‘til the belt”) and “Bucurie” (“Joy”). Dorin Negrau took the traditional costume of the Bihor region as inspiration for his “Lost” collection. Yves Saint Laurent had a collection inspired by the traditional Romanian flax blouses, called “La blouse roumaine”. The paper presents traditional Romanian values through fashion collections. The research activity will create innovative concepts to support the garment industry in developing its own brands and to bring design activities in Romania to an international level. The research was conducted during the initial stage of a project, financed through national funds, consisting of a documentary study on the ethnographic characteristics of the popular costume from different regions of the country.

  18. Review article: The mountain motif in the plot of Matthew

    Directory of Open Access Journals (Sweden)

    Gert J. Volschenk

    2010-09-01

    This article reviewed T.L. Donaldson’s book, Jesus on the mountain: A study in Matthean theology, published in 1985 by JSOT Press, Sheffield, and focused on the mountain motif in the structure and plot of the Gospel of Matthew, in addition to the work of Donaldson on the mountain motif as a literary motif and as a theological symbol. The mountain is a primary theological setting for Jesus’ ministry and thus is an important setting, serving as one of the literary devices by which Matthew structured and progressed his narrative. The theological and eschatological significance of Zion and Second Temple Judaism serve as the historical and theological background for the mountain motif. The last mountain setting (Mt 28:16–20) is the culmination of the three theological themes in the plot of Matthew, namely Christology, ecclesiology and salvation history.

  19. Methods and statistics for combining motif match scores.

    Science.gov (United States)

    Bailey, T L; Gribskov, M

    1998-01-01

    Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores, and we evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
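
    The third method, the product of score p-values, can be illustrated with a minimal sketch. It assumes the per-motif p-values are independent and uniformly distributed under the null hypothesis, in which case the probability that a product of n such p-values is at most p equals p · Σ_{i=0}^{n-1} (−ln p)^i / i!. The function name `combined_p_value` is hypothetical and is not taken from MAST.

```python
import math

def combined_p_value(p_values):
    """Significance of the product of independent, uniform p-values.

    Hypothetical helper for illustration; not part of MAST.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0
    n = len(p_values)
    log_product = sum(math.log(p) for p in p_values)
    x = -log_product
    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term
    return min(1.0, math.exp(log_product) * total)
```

    For a single motif the function returns the p-value unchanged; for several motifs the result is larger than the raw product, which corrects for the fact that a product of many p-values is small even for unrelated sequences.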

  20. DNA regulatory motif selection based on support vector machine ...

    African Journals Online (AJOL)

    ... machine (SVM) and its application in microarray experiment of Kashin-Beck disease. ... speed and amount of the corresponding mRNA in gene replication process. ... and revealed that some motifs may be related to the immune reactions.

  1. BEAM web server: a tool for structural RNA motif discovery.

    Science.gov (United States)

    Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2018-03-15

    RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of input RNAs with a motif discovery procedure that is limited only by current secondary structure prediction accuracy. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs together with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding, which transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice of folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, a graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and is implemented in NodeJS and Python, with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.

  2. Characterizing Motif Dynamics of Electric Brain Activity Using Symbolic Analysis

    Directory of Open Access Journals (Sweden)

    Massimiliano Zanin

    2014-10-01

    Motifs are small recurring circuits of interactions which constitute the backbone of networked systems. Characterizing motif dynamics is therefore key to understanding the functioning of such systems. Here we propose a method to define and quantify the temporal variability and time scales of electroencephalogram (EEG) motifs of resting brain activity. Given a triplet of EEG sensors, links between them are calculated by means of linear correlation; each pattern of links (i.e., each motif) is then associated with a symbol, and its appearance frequency is analyzed by means of Shannon entropy. Our results show that each motif becomes observable with different coupling thresholds and evolves at its own time scale, with fronto-temporal sensors emerging at high thresholds and changing at fast time scales, and parietal ones at low thresholds and changing at slower rates. Finally, while motif dynamics differed across individuals, for each subject it showed robustness across experimental conditions, indicating that it could represent an individual dynamical signature.
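
    The pipeline described here (correlation links within a sensor triplet, one symbol per link pattern, Shannon entropy of the symbol frequencies) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: links are taken as undirected, a link exists when the absolute Pearson correlation reaches a threshold, and windows do not overlap. All function names, the window length and the threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Linear (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant signal: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def motif_symbols(signals, window, threshold):
    """Map each non-overlapping window of a sensor triplet to a symbol 0..7.

    Bit k of the symbol is set when the k-th sensor pair, in the order
    (0,1), (0,2), (1,2), is linked, i.e. |r| >= threshold (assumed rule).
    """
    length = len(signals[0])
    symbols = []
    for start in range(0, length - window + 1, window):
        symbol = 0
        for bit, (i, j) in enumerate(combinations(range(3), 2)):
            r = pearson(signals[i][start:start + window],
                        signals[j][start:start + window])
            if abs(r) >= threshold:
                symbol |= 1 << bit
        symbols.append(symbol)
    return symbols

def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

    Sweeping `threshold` and `window` in such a sketch would correspond to the coupling thresholds and time scales discussed in the abstract.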

  3. Efficient motif finding algorithms for large-alphabet inputs

    Directory of Open Access Journals (Sweden)

    Pavlovic Vladimir

    2010-10-01

    Background: We consider the problem of identifying motifs, recurring or conserved patterns, in biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results: The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin and Zinc metallopeptidase, and supersecondary structure motifs for the Cadherin and Immunoglobulin families. Conclusions: Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running time improvements over existing exact algorithms, especially in the important and difficult cases of large-alphabet sequences.

  4. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
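
    The (l, d) problem statement can be made concrete with a brute-force sketch. This is deliberately not the NFA-based algorithm of the paper: it enumerates every string of length l over the alphabet, so it is only practical for very small l. The function names and the default DNA alphabet are assumptions made for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif, sequence, d):
    """True if some window of the sequence is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, sequence[i:i + l]) <= d
               for i in range(len(sequence) - l + 1))

def ld_motif_search(sequences, l, d, q=None, alphabet="ACGT"):
    """Exhaustive (l, d) motif search; cost grows as len(alphabet) ** l.

    Returns every length-l string that occurs, with at most d substitutions,
    in at least q of the sequences (q defaults to all of them).
    """
    if q is None:
        q = len(sequences)
    found = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        support = sum(occurs_within(motif, s, d) for s in sequences)
        if support >= q:
            found.append(motif)
    return found
```

    The exponential candidate space in this sketch is the reason instances such as (26, 11) are hard and why specialised algorithms and hardware are of interest.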

  5. An experimental test of a fundamental food web motif.

    Science.gov (United States)

    Rip, Jason M K; McCann, Kevin S; Lynn, Denis H; Fawcett, Sonia

    2010-06-07

    Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure, the complex web of species interactions that make up ecological communities. A difficult, yet crucial task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths is one such motif. This motif occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and, through clear experimental manipulation, identifies the fundamental stabilizing consequences of this structure for ecological communities.

  6. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs in DNA sequences, counting them and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like scheme, a gene-map-like scheme, and a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable on any type of hardware and on diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.
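
    The idea of locating a short motif and drawing its distribution as a bar code can be sketched in a few lines. This is an illustrative sketch only, not PISMA's code (PISMA itself is written in Java); the function names and the text rendering are assumptions, and the 2 to 10 base limit simply mirrors the range stated in the abstract.

```python
def motif_positions(sequence, motif):
    """0-based start positions of every (possibly overlapping) exact match."""
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if sequence[i:i + k] == motif]

def barcode(sequence, motif):
    """Text bar code: '|' where a motif occurrence starts, '.' elsewhere."""
    marks = ["."] * len(sequence)
    for pos in motif_positions(sequence, motif):
        marks[pos] = "|"
    return "".join(marks)
```

    For example, counting CpG sites amounts to `len(motif_positions(seq, "CG"))`, and `barcode(seq, "CG")` gives a one-line picture of where they cluster.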

  7. Finding cis-regulatory modules in Drosophila using phylogenetic hidden Markov models

    DEFF Research Database (Denmark)

    Wong, Wendy S W; Nielsen, Rasmus

    2007-01-01

    MOTIVATION: Finding the regulatory modules for transcription factors binding is an important step in elucidating the complex molecular mechanisms underlying regulation of gene expression. There are numerous methods available for solving this problem, however, very few of them take advantage of th...

  8. Piecing together cis-regulatory networks: insights from epigenomics studies in plants.

    Science.gov (United States)

    Huang, Shao-Shan C; Ecker, Joseph R

    2018-05-01

    5-Methylcytosine, a chemical modification of DNA, is a covalent modification found in the genomes of both plants and animals. Epigenetic inheritance of phenotypes mediated by DNA methylation is well established in plants. Most of the known mechanisms of establishing, maintaining and modifying DNA methylation have been worked out in the reference plant Arabidopsis thaliana. Major functions of DNA methylation in plants include regulation of gene expression and silencing of transposable elements (TEs) and repetitive sequences, both of which have parallels in mammalian biology, involve interaction with the transcriptional machinery, and may have profound effects on the regulatory networks in the cell. Methylome and transcriptome dynamics have been investigated in development and environmental responses in Arabidopsis and agriculturally and ecologically important plants, revealing the interdependent relationship among genomic context, methylation patterns, and expression of TE and protein coding genes. Analyses of methylome variation among plant natural populations and species have begun to quantify the extent of genetic control of methylome variation vs. true epimutation, and model the evolutionary forces driving methylome evolution in both short and long time scales. The ability of DNA methylation to positively or negatively modulate binding affinity of transcription factors (TFs) provides a natural link from genome sequence and methylation changes to transcription. Technologies that allow systematic determination of methylation sensitivities of TFs, in native genomic and methylation context without confounding factors such as histone modifications, will provide baseline datasets for building cell-type- and individual-specific regulatory networks that underlie the establishment and inheritance of complex traits. This article is categorized under: Laboratory Methods and Technologies > Genetic/Genomic Methods Biological Mechanisms > Regulatory Biology. 

  9. Confocal quantification of cis-regulatory reporter gene expression in living sea urchin.

    Science.gov (United States)

    Damle, Sagar; Hanser, Bridget; Davidson, Eric H; Fraser, Scott E

    2006-11-15

    Quantification of GFP reporter gene expression at single cell level in living sea urchin embryos can now be accomplished by a new method of confocal laser scanning microscopy (CLSM). Eggs injected with a tissue-specific GFP reporter DNA construct were grown to gastrula stage and their fluorescence recorded as a series of contiguous Z-section slices that spanned the entire embryo. To measure the depth-dependent signal decay seen in the successive slices of an image stack, the eggs were coinjected with a freely diffusible internal fluorescent standard, rhodamine dextran. The measured rhodamine fluorescence was used to generate a computational correction for the depth-dependent loss of GFP fluorescence per slice. The intensity of GFP fluorescence was converted to the number of GFP molecules using a conversion constant derived from CLSM imaging of eggs injected with a measured quantity of GFP protein. The outcome is a validated method for accurately counting GFP molecules in given cells in reporter gene transfer experiments, as we demonstrate by use of an expression construct expressed exclusively in skeletogenic cells.

  10. Mapping cis-Regulatory Domains in the Human Genome Using Multi-Species Conservation of Synteny

    Energy Technology Data Exchange (ETDEWEB)

    Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis; Rubin, Edward M.; Couronne, Olivier

    2005-06-13

    Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A total of 2,116 and 1,942 CBS >200 kb were assembled for HMC and HMF respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a gene's regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide a genome-wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.

  11. Feature Inference Learning and Eyetracking

    Science.gov (United States)

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  12. An Inference Language for Imaging

    DEFF Research Database (Denmark)

    Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen

    2014-01-01

    We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor...

  13. Verification of the MOTIF code version 3.0

    International Nuclear Information System (INIS)

    Chan, T.; Guvanasen, V.; Nakka, B.W.; Reid, J.A.K.; Scheier, N.W.; Stanchell, F.W.

    1996-12-01

    As part of the Canadian Nuclear Fuel Waste Management Program (CNFWMP), AECL has developed a three-dimensional finite-element code, MOTIF (Model Of Transport In Fractured/porous media), for detailed modelling of groundwater flow, heat transport and solute transport in a fractured rock mass. The code solves the transient and steady-state equations of groundwater flow, solute (including one-species radionuclide) transport, and heat transport in variably saturated fractured/porous media. The initial development was completed in 1985 (Guvanasen 1985) and version 3.0 was completed in 1986. This version is documented in detail in Guvanasen and Chan (in preparation). This report describes a series of fourteen verification cases which has been used to test the numerical solution techniques and coding of MOTIF, as well as to demonstrate some of the MOTIF analysis capabilities. For each case the MOTIF solution has been compared with a corresponding analytical or independently developed alternate numerical solution. Several of the verification cases were included in Level 1 of the International Hydrologic Code Intercomparison Project (HYDROCOIN). The MOTIF results for these cases were also described in the HYDROCOIN Secretariat's compilation and comparison of results submitted by the various project teams (Swedish Nuclear Power Inspectorate 1988). It is evident from the graphical comparisons presented that the MOTIF solutions for the fourteen verification cases are generally in excellent agreement with known analytical or numerical solutions obtained from independent sources. This series of verification studies has established the ability of the MOTIF finite-element code to accurately model the groundwater flow and solute and heat transport phenomena for which it is intended. (author). 20 refs., 14 tabs., 32 figs

  14. Mechanisms of zero-lag synchronization in cortical motifs.

    Directory of Open Access Journals (Sweden)

    Leonardo L Gollo

    2014-04-01

    Zero-lag synchronization between distant cortical areas has been observed in a diversity of experimental data sets and between many different regions of the brain. Several computational mechanisms have been proposed to account for such isochronous synchronization in the presence of long conduction delays. Of these, the phenomenon of "dynamical relaying", a mechanism that relies on a specific network motif, has proven to be the most robust with respect to parameter mismatch and system noise. Surprisingly, despite a contrary belief in the community, the common driving motif is an unreliable means of establishing zero-lag synchrony. Although dynamical relaying has been validated in empirical and computational studies, an account of the deeper dynamical mechanisms and a comparison with dynamics on other motifs are lacking. By systematically comparing synchronization on a variety of small motifs, we establish that the presence of a single reciprocally connected pair, a "resonance pair", plays a crucial role in disambiguating those motifs that foster zero-lag synchrony in the presence of conduction delays (such as dynamical relaying) from those that do not (such as the common driving triad). Remarkably, minor structural changes to the common driving motif that incorporate a reciprocal pair recover robust zero-lag synchrony. The findings are observed in computational models of spiking neurons, populations of spiking neurons and neural mass models, and arise whether the oscillatory systems are periodic, chaotic, noise-free or driven by stochastic inputs. The influence of the resonance pair is also robust to parameter mismatch and asymmetrical time delays amongst the elements of the motif. We call this manner of facilitating zero-lag synchrony resonance-induced synchronization, outline the conditions for its occurrence, and propose that it may be a general mechanism to promote zero-lag synchrony in the brain.

  15. Phyloproteomic Analysis of 11780 Six-Residue-Long Motifs Occurrences

    Directory of Open Access Journals (Sweden)

    O. V. Galzitskaya

    2015-01-01

    How is it possible to find good traits for phylogenetic reconstructions? Here, we present a new phyloproteomic criterion: the occurrence of simple motifs, which can be imprints of evolutionary history. We studied the occurrences of 11780 six-residue-long motifs consisting of two randomly located amino acids in 97 eukaryotic and 25 bacterial proteomes. For all eukaryotic proteomes, with the exception of the Amoebozoa, Stramenopiles, and Diplomonadida kingdoms, the number of proteins containing motifs from the first group (one of the two amino acids occurs once at the terminal position) was about 20%; for motifs from the second group (one of the two amino acids occurs once within the pattern) and the third group (the two amino acids occur randomly), it was 30% and 50%, respectively. For bacterial proteomes, this relationship was 10%, 27%, and 63%, respectively. The matrices of correlation coefficients between the numbers of proteins in which a motif from the set of 11780 motifs appears at least once were calculated for 9 kingdoms and 5 phyla of bacteria. Among the correlation coefficients for eukaryotic proteomes, the correlation between the animal and fungi kingdoms (0.62) is higher than that between fungi and plants (0.54). Our study provides support for animals and fungi being sibling kingdoms. Comparison of the frequencies of six-residue-long motifs in different proteomes allows phylogenetic relationships to be obtained from similarities between these frequencies: the Diplomonadida kingdom is closer to Bacteria than to Eukaryota; Stramenopiles and Amoebozoa are closer to each other than to other kingdoms of Eukaryota.

  16. Gauging Variational Inference

    Energy Technology Data Exchange (ETDEWEB)

    Chertkov, Michael [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahn, Sungsoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of); Shin, Jinwoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)

    2017-05-25

    Computing the partition function is the most important statistical inference task arising in applications of Graphical Models (GM). Since it is computationally intractable, approximate methods have been used to resolve the issue in practice, where mean-field (MF) and belief propagation (BP) are arguably the most popular and successful approaches of a variational type. In this paper, we propose two new variational schemes, coined Gauged-MF (G-MF) and Gauged-BP (G-BP), improving MF and BP, respectively. Both provide lower bounds for the partition function by utilizing the so-called gauge transformation, which modifies factors of the GM while keeping the partition function invariant. Moreover, we prove that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case. Our extensive experiments, on complete GMs of relatively small size and on large GMs (up to 300 variables), confirm that the newly proposed algorithms outperform and generalize MF and BP.
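
    As background for the variational bounds mentioned here, the sketch below computes the plain (ungauged) mean-field lower bound on the log partition function of a small Ising-type model and compares it with brute-force enumeration. It does not implement G-MF or G-BP. The model form, p(s) ∝ exp(Σ_{i<j} J_ij s_i s_j + Σ_i h_i s_i) with s_i ∈ {−1, +1} and a symmetric coupling matrix J, as well as all function names, are assumptions chosen for illustration.

```python
import math
from itertools import product

def log_partition_exact(J, h):
    """log Z by enumerating all 2**n spin configurations (small n only)."""
    n = len(h)
    scores = []
    for s in product((-1, 1), repeat=n):
        score = sum(h[i] * s[i] for i in range(n))
        score += sum(J[i][j] * s[i] * s[j]
                     for i in range(n) for j in range(i + 1, n))
        scores.append(score)
    top = max(scores)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(x - top) for x in scores))

def spin_entropy(m):
    """Entropy (nats) of a +/-1 spin with mean m."""
    total = 0.0
    for p in ((1 + m) / 2, (1 - m) / 2):
        if p > 0:
            total -= p * math.log(p)
    return total

def mean_field_bound(J, h, sweeps=200):
    """Naive mean-field lower bound on log Z (J assumed symmetric)."""
    n = len(h)
    m = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)  # coordinate-ascent update
    energy = sum(h[i] * m[i] for i in range(n))
    energy += sum(J[i][j] * m[i] * m[j]
                  for i in range(n) for j in range(i + 1, n))
    return energy + sum(spin_entropy(mi) for mi in m)
```

    By Jensen's inequality the returned value never exceeds the exact log Z for any magnetizations in [−1, 1], and the two coincide when all couplings are zero; the gap on coupled models is what improved schemes aim to reduce.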

  17. Social Inference Through Technology

    Science.gov (United States)

    Oulasvirta, Antti

    Awareness cues are computer-mediated, real-time indicators of people’s undertakings, whereabouts, and intentions. Already in the mid-1970s, UNIX users could use commands such as “finger” and “talk” to find out who was online and to chat. The small icons in instant messaging (IM) applications that indicate coconversants’ presence in the discussion space are the successors of “finger” output. Similar indicators can be found in online communities, media-sharing services, Internet relay chat (IRC), and location-based messaging applications. But presence and availability indicators are only the tip of the iceberg. Technological progress has enabled richer, more accurate, and more intimate indicators. For example, there are mobile services that allow friends to query and follow each other’s locations. Remote monitoring systems developed for health care allow relatives and doctors to assess the wellbeing of homebound patients (see, e.g., Tang and Venables 2000). But users also utilize cues that have not been deliberately designed for this purpose. For example, online gamers pay attention to other characters’ behavior to infer what the other players are like “in real life.” There is a common denominator underlying these examples: shared activities rely on the technology’s representation of the remote person. The other human being is not physically present but present only through a narrow technological channel.

  18. Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity.

    Science.gov (United States)

    Pecevski, Dejan; Maass, Wolfgang

    2016-01-01

    Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates, in ensembles of pyramidal cells with lateral inhibition, a fundamental building block for that: probabilistic associations between neurons that represent through their firing the current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner, the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference.

  19. Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity

    Science.gov (United States)

    Pecevski, Dejan

    2016-01-01

    Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such a neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates, in ensembles of pyramidal cells with lateral inhibition, a fundamental building block for that: probabilistic associations between neurons that represent through their firing the current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner, the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214

  20. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    Science.gov (United States)

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences of four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues, supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode depends more strongly on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help to improve the prediction of SIMs and their binding properties in different organisms and to facilitate the reconstruction of the SUMO interactome.

  1. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    OpenAIRE

    Huang, Hsi-Yuan; Chien, Chia-Hung; Jen, Kuan-Hua; Huang, Hsien-Da

    2006-01-01

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untra...

  2. Optimization methods for logical inference

    CERN Document Server

    Chandru, Vijay

    2011-01-01

    Merging logic and mathematics in deductive inference: an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though "solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs." Presenting powerful, proven optimization techniques for logic in

  3. How pathogens use linear motifs to perturb host cell networks

    KAUST Repository

    Via, Allegra; Uyar, Bora; Brun, Christine; Zanzoni, Andreas

    2015-01-01

    Molecular mimicry is one of the powerful stratagems that pathogens employ to colonise their hosts and take advantage of host cell functions to guarantee their replication and dissemination. In particular, several viruses have evolved the ability to interact with host cell components through protein short linear motifs (SLiMs) that mimic host SLiMs, thus facilitating their internalisation and the manipulation of a wide range of cellular networks. Here we present convincing evidence from the literature that motif mimicry also represents an effective, widespread hijacking strategy in prokaryotic and eukaryotic parasites. Further insights into host motif mimicry would be of great help in the elucidation of the molecular mechanisms behind host cell invasion and the development of anti-infective therapeutic strategies.

  4. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    Science.gov (United States)

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.

  5. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ⁷⁰ subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.

  6. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).

  7. BayesMD: flexible biological modeling for motif discovery

    DEFF Research Database (Denmark)

    Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole

    2008-01-01

    We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on trans...

  8. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

    Science.gov (United States)

    Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

    2011-12-01

    An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.

  9. On principles of inductive inference

    OpenAIRE

    Kostecki, Ryszard Paweł

    2011-01-01

    We propose an intersubjective epistemic approach to foundations of probability theory and statistical inference, based on relative entropy and category theory, and aimed to bypass the mathematical and conceptual problems of existing foundational approaches.

  10. Statistical inference via fiducial methods

    OpenAIRE

    Salomé, Diemer

    1998-01-01

    In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ...

  11. Statistical inference for stochastic processes

    National Research Council Canada - National Science Library

    Basawa, Ishwar V; Prakasa Rao, B. L. S

    1980-01-01

    The aim of this monograph is to attempt to reduce the gap between theory and applications in the area of stochastic modelling, by directing the interest of future researchers to the inference aspects...

  12. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  13. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks.

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A; Kellis, Manolis

    2012-07-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.

  14. Active inference, communication and hermeneutics.

    Science.gov (United States)

    Friston, Karl J; Frith, Christopher D

    2015-07-01

    Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters

    Directory of Open Access Journals (Sweden)

    Farré Domènec

    2007-12-01

    Background: The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes. Results: We observe that promoters driving housekeeping gene expression are enriched in particular motifs with strong positional bias, such as YY1, which are of little relevance in promoters driving tissue-specific expression. We also identify a large number of motifs that show positional bias in genes expressed in a highly tissue-specific manner. They include well-known tissue-specific motifs, such as HNF1 and HNF4 motifs in liver, kidney and small intestine, or RFX motifs in testis, as well as many potentially novel regulatory motifs. Based on this analysis, we provide predictions for 559 tissue-specific motifs in mouse gene promoters. Conclusion: The study shows that motif positional bias is an important feature of mammalian proximal promoters and that it affects both general and tissue-specific motifs. Motif positional constraints define very distinct promoter architectures depending on breadth of expression and type of tissue.

  16. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    Science.gov (United States)

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. Contact: dario.floreano@epfl.ch.
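
    The precision-recall evaluation mentioned in this record can be illustrated with a minimal sketch. It is not GNW's implementation: the function name, the edge representation as (regulator, target) pairs, and the toy data are all assumptions made here for illustration.

```python
def precision_recall(ranked_edges, true_edges, k):
    """Precision and recall of the top-k predicted edges against a gold standard.

    ranked_edges: predicted (regulator, target) pairs, most confident first.
    true_edges:   set of gold-standard (regulator, target) pairs.
    """
    top = set(ranked_edges[:k])
    hits = len(top & true_edges)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall


# Hypothetical toy benchmark, invented for this example.
gold = {("A", "B"), ("A", "C"), ("B", "D")}
ranked = [("A", "B"), ("C", "D"), ("A", "C"), ("D", "A")]

# One (precision, recall) point per cutoff traces out a precision-recall curve.
curve = [precision_recall(ranked, gold, k) for k in range(1, len(ranked) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

    Sweeping the cutoff k over the ranked list gives one curve point per prediction, which is the usual way such a curve is drawn when a method outputs confidence-ranked edges.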

  17. Optimal inference with suboptimal models: Addiction and active Bayesian inference

    Science.gov (United States)

    Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

    2015-01-01

    When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321

  18. SSTRAP: A computational model for genomic motif discovery ...

    African Journals Online (AJOL)

    Computational methods can potentially provide high-quality prediction of biological molecules such as DNA binding sites and transcription factors and therefore reduce the time needed for experimental verification and challenges associated with experimental methods. These biological molecules or motifs have significant ...

  19. Identification of a Baeyer-Villiger monooxygenase sequence motif

    NARCIS (Netherlands)

    Fraaije, MW; Kamerbeek, NM; van Berkel, WJH; Janssen, DB; Kamerbeek, Nanne M.; Berkel, Willem J.H. van

    2002-01-01

    Baeyer-Villiger monooxygenases (BVMOs) form a distinct class of flavoproteins that catalyze the insertion of an oxygen atom in a C-C bond using dioxygen and NAD(P)H. Using newly characterized BVMO sequences, we have uncovered a BVMO-identifying sequence motif: FXGXXXRXXXW(P/D). Studies with
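
    A consensus motif written in this notation can be searched for with a regular expression. The sketch below assumes that X stands for any residue and that (P/D) means "P or D"; the helper names and the toy sequence are hypothetical and are not taken from the record.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a consensus motif such as 'FXGXXXRXXXW(P/D)' into a regex.

    Assumed notation: 'X' is any residue, '(A/B)' lists alternatives,
    and every other letter is a fixed residue.
    """
    # Turn '(P/D)' into the character class '[PD]'.
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # Turn every wildcard position into '.'.
    return pattern.replace("X", ".")


def find_motif(sequence: str, motif: str = "FXGXXXRXXXW(P/D)"):
    """Return (start, matched_text) for every occurrence, including overlaps."""
    # The lookahead lets the scan report overlapping matches.
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical toy sequence, made up for illustration (not a real BVMO).
toy = "MSTAFAGTWYRNHYWPGLKFSGAAARCCCWE"
print(motif_to_regex("FXGXXXRXXXW(P/D)"))
print(find_motif(toy))
```

    In the toy sequence only the first candidate ends in P, so it is the only hit; the second candidate ends in E and is rejected by the final position.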

  20. Linear motif atlas for phosphorylation-dependent signaling

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Jensen, LJ; Diella, F

    2008-01-01

    bind to them remains a challenge. NetPhorest is an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains [Src homology 2 (SH2), phosphotyrosine binding (PTB), BRCA1 C-terminal (BRCT), WW, and 14-3-3]. The atlas reveals new aspects of signaling...

  1. [Cover motifs of the Tidsskrift. A 14-year cavalcade].

    Science.gov (United States)

    Nylenna, M

    1998-12-10

    In 1985 the Journal of the Norwegian Medical Association changed its cover policy, moving the table of contents inside the Journal and introducing cover illustrations. This article provides an analysis of all cover illustrations published over this 14-year period, 420 covers in all. There is a great variation in cover motifs and designs and a development towards more general motifs. The initial emphasis on historical and medical aspects is now less pronounced, while the use of works of art and nature motifs has increased, and the cover now more often has a direct bearing on the specific contents of the issue. Professor of medical history Oivind Larsen has photographed two thirds of the covers and contributed 95% of the inside essay-style reflections on the cover motif. Over the years, he has expanded the role of the historian of medicine disseminating knowledge to include that of the raconteur with a personal tone of voice. The Journal's covers are now one of its most characteristic features, emblematic of the Journal's ambition of standing for quality and timelessness vis-à-vis the news media, and of its aim of bridging the gap between medicine and the humanities.

  2. Abraham Maslow's Humanistic Psychology Perspective in Examining the Motives of Murder Perpetrators

    OpenAIRE

    Nurwatie, Azrina; Fauzia, Rahmi; Akbar, Sukma Noor

    2014-01-01

    This study focuses on the motives of murder perpetrators, examined through the perspective of Abraham Maslow's humanistic psychology. The subjects were two inmates convicted of murder at the Class IIA Juvenile Correctional Facility in Martapura (Lapas Kelas IIA Anak Martapura). The study used a qualitative research method. Data were collected through interviews, observation, documentation, and psychological examination (graphic tests). Based on the results of the data analysis ...

  3. Motifs in triadic random graphs based on Steiner triple systems

    Science.gov (United States)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
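
    The defining property of a Steiner triple system can be checked mechanically. The sketch below uses the standard definition (every unordered pair of nodes lies in exactly one triple) and the seven-node Fano plane as the smallest nontrivial example; the function name and node labelling are choices made here, not taken from the paper.

```python
from itertools import combinations


def is_steiner_triple_system(n_nodes, triples):
    """True if every unordered pair of nodes 0..n_nodes-1 lies in exactly one triple."""
    seen = set()
    for triple in triples:
        # Each triad needs three distinct, valid node labels.
        if len(set(triple)) != 3 or not all(0 <= v < n_nodes for v in triple):
            return False
        for pair in combinations(sorted(triple), 2):
            if pair in seen:  # this pair is already covered by another triad
                return False
            seen.add(pair)
    # Pair-disjoint triads form an STS only if they cover all pairs.
    return len(seen) == n_nodes * (n_nodes - 1) // 2


# The Fano plane: 7 nodes, 7 triads, each of the 21 node pairs covered once.
FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_steiner_triple_system(7, FANO))
```

    Because no two triads share a pair of nodes, the state of each triad can be chosen without constraining any other, which is the independence property the abstract relies on.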

  4. Genetic analysis of beta1 integrin "activation motifs" in mice

    DEFF Research Database (Denmark)

    Czuchra, Aleksandra; Meyer, Hannelore; Legate, Kyle R

    2006-01-01

    -null phenotype in vivo. Surprisingly, neither the substitution of the tyrosines with phenylalanine nor the aspartic acid with alanine resulted in an obvious defect. These data suggest that the NPXY motifs of the beta1 integrin tail are essential for beta1 integrin function, whereas tyrosine phosphorylation...

  5. Insights into the motif preference of APOBEC3 enzymes.

    Directory of Open Access Journals (Sweden)

    Diako Ebrahimi

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions) were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much less than what is expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3' polypurine tracts (PPTs), which remain double stranded during the HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome.
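
    A much simpler tally than the multivariate analysis described here can still show what a context-dependent G-to-A count looks like. The sketch below assumes two pre-aligned, equal-length sequences and keys each G-to-A change on the reference bases immediately 3' of the changed G; the function name and both toy sequences are hypothetical.

```python
from collections import Counter


def g_to_a_contexts(reference, mutated, width=1):
    """Count G-to-A changes by the `width` reference bases 3' of the changed G."""
    if len(reference) != len(mutated):
        raise ValueError("sequences must be aligned and equal in length")
    counts = Counter()
    for i, (ref, mut) in enumerate(zip(reference, mutated)):
        if ref == "G" and mut == "A":
            downstream = reference[i + 1 : i + 1 + width]
            # Skip changes too close to the 3' end to have a full context.
            if len(downstream) == width:
                counts["G" + downstream] += 1
    return counts


# Hypothetical aligned toy sequences, invented for this example.
reference = "TGGACGATGC"
mutated = "TAGACAATGC"
print(g_to_a_contexts(reference, mutated))
```

    Raising `width` to 2 or 3 keys the tally on longer 3' contexts, at the cost of sparser counts.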

  6. Sequence alignment reveals possible MAPK docking motifs on HIV proteins.

    Directory of Open Access Journals (Sweden)

    Perry Evans

    Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain dependent manner, but are absent from HIV Rev and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV specific targeting using small-molecule drugs.

  7. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data

    Directory of Open Access Journals (Sweden)

    de los Reyes Benildo G

    2008-04-01

    Background: Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NMs), facilitates the inference. Identifying NMs defined by specific transcription factors (TFs) establishes the framework structure of a TRN and allows the inference of TF-target gene relationships. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information. Results: The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs are obtained from literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) Gene set enrichment

  8. Interactive Instruction in Bayesian Inference

    DEFF Research Database (Denmark)

    Khan, Azam; Breslav, Simon; Hornbæk, Kasper

    2018-01-01

    An instructional approach is presented to improve human performance in solving Bayesian inference problems. Starting from the original text of the classic Mammography Problem, the textual expression is modified and visualizations are added according to Mayer’s principles of instruction. These principles concern coherence, personalization, signaling, segmenting, multimedia, spatial contiguity, and pretraining. Principles of self-explanation and interactivity are also applied. Four experiments on the Mammography Problem showed that these principles help participants answer the questions... that an instructional approach to improving human performance in Bayesian inference is a promising direction...
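
    The kind of calculation such a problem asks for can be sketched with Bayes' rule. The abstract does not give the problem's figures, so the prevalence, sensitivity and false-positive rate below are illustrative assumptions (values often quoted for this type of exercise), and the function name is hypothetical.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Bayes' rule: P(condition | positive test)."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed, illustrative inputs (not taken from the abstract above).
posterior = posterior_probability(
    prior=0.01,                 # 1% of those screened have the condition
    sensitivity=0.80,           # 80% of those with it test positive
    false_positive_rate=0.096,  # 9.6% of those without it test positive
)
print(f"P(condition | positive test) = {posterior:.3f}")
```

    With these assumed inputs the posterior comes out just under 8%, far below the test's sensitivity, which is the counter-intuitive step that instruction on such problems typically targets.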

  9. On Maximum Entropy and Inference

    Directory of Open Access Journals (Sweden)

    Luigi Gresele

    2017-11-01

    Full Text Available Maximum entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data, that affords predictions on all other (dependent) variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics) directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is not set by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method showing its ability to recover the correct model in a few prototype cases and discuss its application on a real dataset.

  10. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    Science.gov (United States)

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T.

    2018-01-01

    and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter

  12. Eight challenges in phylodynamic inference

    Directory of Open Access Journals (Sweden)

    Simon D.W. Frost

    2015-03-01

    Full Text Available The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data.

  13. Problem solving and inference mechanisms

    Energy Technology Data Exchange (ETDEWEB)

    Furukawa, K; Nakajima, R; Yonezawa, A; Goto, S; Aoyama, A

    1982-01-01

    The heart of the fifth generation computer will be powerful mechanisms for problem solving and inference. A deduction-oriented language is to be designed, which will form the core of the whole computing system. The language is based on predicate logic with the extended features of structuring facilities, meta structures and relational data base interfaces. Parallel computation mechanisms and specialized hardware architectures are being investigated to make possible efficient realization of the language features. The project includes research into an intelligent programming system, a knowledge representation language and system, and a meta inference system to be built on the core. 30 references.

  14. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-01

    LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  15. Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    LENUS (Irish Health Repository)

    Casey, Fergal

    2011-08-22

    Abstract Background: Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. Results: We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. Conclusion: We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
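
    The four triplet classes named above (NNN, NNP, NPP, PPP) can be illustrated with a minimal sketch that counts fully connected triplets in a signed interaction network. The edge representation, function name and toy gene labels are assumptions made for illustration; this is not the authors' pipeline and it does no significance testing.

```python
from itertools import combinations


def count_triplet_motifs(edges):
    """Count closed triplets by sign class.

    edges: dict mapping frozenset({gene_a, gene_b}) -> "N" (negative) or "P" (positive).
    Only triplets in which all three pairs have a scored interaction are counted.
    """
    nodes = sorted({gene for pair in edges for gene in pair})
    counts = {"NNN": 0, "NNP": 0, "NPP": 0, "PPP": 0}
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(pair in edges for pair in pairs):
            # Sorting the three signs gives a position-independent class label.
            label = "".join(sorted(edges[pair] for pair in pairs))
            counts[label] += 1
    return counts


# Hypothetical toy network: g1-g2-g3 is all negative; g2-g3-g4 has one negative, two positives.
toy_edges = {
    frozenset(("g1", "g2")): "N",
    frozenset(("g1", "g3")): "N",
    frozenset(("g2", "g3")): "N",
    frozenset(("g2", "g4")): "P",
    frozenset(("g3", "g4")): "P",
}
motif_counts = count_triplet_motifs(toy_edges)
print(motif_counts)
```

    Enumerating all node triples is cubic in the number of genes, which is acceptable for a toy example but would need an adjacency-based triangle listing for genome-scale maps.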

  16. Gene regulatory and signaling networks exhibit distinct topological distributions of motifs

    Science.gov (United States)

    Ferreira, Gustavo Rodrigues; Nakaya, Helder Imoto; Costa, Luciano da Fontoura

    2018-04-01

    The biological processes of cellular decision making and differentiation involve a plethora of signaling pathways and gene regulatory circuits. These networks in turn exhibit a multitude of motifs playing crucial parts in regulating network activity. Here we compare the topological placement of motifs in gene regulatory and signaling networks and observe that it suggests different evolutionary strategies in motif distribution for distinct cellular subnetworks.

  17. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  18. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    Full Text Available Abstract Background: Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode
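
    To make the shape of a W-pattern concrete, the sketch below builds a regular expression from short pattern blocks separated by bounded gaps and matches it against a sequence. It only illustrates matching a pattern that is already known; it is not the WildSpan mining algorithm, and the block syntax, gap bounds, function name and toy sequences are all assumptions.

```python
import re


def w_pattern_regex(blocks, gaps):
    """Build a regex for blocks interleaved with bounded gaps.

    blocks: list of short regex fragments, "." standing for one wildcard residue.
    gaps:   list of (min_len, max_len) tuples, one per pair of adjacent blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between each pair of adjacent blocks")
    parts = [blocks[0]]
    for (min_len, max_len), block in zip(gaps, blocks[1:]):
        parts.append(f".{{{min_len},{max_len}}}")  # the large irregular gap
        parts.append(block)
    return re.compile("".join(parts))


# Hypothetical pattern: block "C.C", then a gap of 5 to 12 residues, then block "H..H".
pattern = w_pattern_regex(["C.C", "H..H"], [(5, 12)])
hit = pattern.search("MKCACLLLLLLLHAAHT")   # gap of 7 residues: within bounds
miss = pattern.search("MKCACLLHAAHT")       # gap of 2 residues: too short
print(hit.span() if hit else None, miss)
```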

  19. Core signalling motif displaying multistability through multi-state enzymes

    DEFF Research Database (Denmark)

    Feng, Song; Saez Cornellana, Meritxell; Wiuf, Carsten Henrik

    2016-01-01

    Bistability, and more generally multistability, is a key system dynamics feature enabling decision-making and memory in cells. Deciphering the molecular determinants of multistability is thus crucial for a better understanding of cellular pathways and their (re)engineering in synthetic biology. Here, we show that a key motif found predominantly in eukaryotic signalling systems, namely a futile signalling cycle, can display bistability when featuring a two-state kinase. We provide necessary and sufficient mathematical conditions on the kinetic parameters of this motif that guarantee the existence of multiple steady states. These conditions foster the intuition that bistability arises as a consequence of competition between the two states of the kinase. Extending from this result, we find that increasing the number of kinase states linearly translates into an increase in the number...

  20. Factoring local sequence composition in motif significance analysis.

    Science.gov (United States)

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  1. Object-Oriented Type Inference

    DEFF Research Database (Denmark)

    Schwartzbach, Michael Ignatieff; Palsberg, Jens

    1991-01-01

    We present a new approach to inferring types in untyped object-oriented programs with inheritance, assignments, and late binding. It guarantees that all messages are understood, annotates the program with type information, allows polymorphic methods, and can be used as the basis of an op...

  2. Inference in hybrid Bayesian networks

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael

    2009-01-01

    Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....

  3. Mixed normal inference on multicointegration

    NARCIS (Netherlands)

    Boswijk, H.P.

    2009-01-01

    Asymptotic likelihood analysis of cointegration in I(2) models, see Johansen (1997, 2006), Boswijk (2000) and Paruolo (2000), has shown that inference on most parameters is mixed normal, implying hypothesis test statistics with an asymptotic χ² null distribution. The asymptotic distribution of the

  4. Statistical inference and Aristotle's Rhetoric.

    Science.gov (United States)

    Macdonald, Ranald R

    2004-11-01

    Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.

  5. Neoanalysis, Orality, and Intertextuality: An Examination of Homeric Motif Transference

    Directory of Open Access Journals (Sweden)

    Jonathan Burgess

    2006-03-01

    Full Text Available In Homeric studies scholars have speculated on the influence of (non-surviving) pre-Homeric material on the Iliad. This article expands this line of argument from an oralist perspective, with reference to modern intertextual theory. It concludes that pre-Homeric and non-Homeric motifs from oral traditions were transferred into the epic poem, creating an intertextually allusive poetics that would have been recognizable to an early Greek audience informed of mythological traditions.

  6. Motif Subscriber Menonton Channel YouTube Raditya Dika

    OpenAIRE

    Mellyaningsih, Adinda

    2016-01-01

    This study was conducted to identify subscribers' motives for watching Raditya Dika's YouTube channel. Raditya Dika is the Indonesian YouTuber with the largest number of subscribers and the first person in Indonesia to receive the Certifies Award from YouTube. The researcher used Uses and Gratification theory with four indicators: entertainment and relaxation, interpersonal relationships, information seeking, and companionship. The method used in this study is an online survey...

  7. Perception Enhancement using Visual Attributes in Sequence Motif Visualization

    OpenAIRE

    Oon, Yin; Lee, Nung; Kok, Wei

    2016-01-01

    The sequence logo is a well-accepted scientific method for visualizing the conservation characteristics of biological sequence motifs. Previous studies found that using the sequence logo graphical representation in scientific evidence reports or arguments could cause serious bias and misinterpretation by users. This study investigates how well the visual attributes of a sequence logo help users perceive and interpret the information, based on preattentive theories and Gestalt principl...

  8. Exon silencing by UAGG motifs in response to neuronal excitation.

    Directory of Open Access Journals (Sweden)

    Ping An

    2007-02-01

    Full Text Available Alternative pre-mRNA splicing plays fundamental roles in neurons by generating functional diversity in proteins associated with the communication and connectivity of the synapse. The CI cassette of the NMDA R1 receptor is one of a variety of exons that show an increase in exon skipping in response to cell excitation, but the molecular nature of this splicing responsiveness is not yet understood. Here we investigate the molecular basis for the induced changes in splicing of the CI cassette exon in primary rat cortical cultures in response to KCl-induced depolarization using an expression assay with a tight neuron-specific readout. In this system, exon silencing in response to neuronal excitation was mediated by multiple UAGG-type silencing motifs, and transfer of the motifs to a constitutive exon conferred a similar responsiveness by gain of function. Biochemical analysis of protein binding to UAGG motifs in extracts prepared from treated and mock-treated cortical cultures showed an increase in nuclear hnRNP A1-RNA binding activity in parallel with excitation. Evidence for the role of the NMDA receptor and calcium signaling in the induced splicing response was shown by the use of specific antagonists, as well as cell-permeable inhibitors of signaling pathways. Finally, a wider role for exon-skipping responsiveness is shown to involve additional exons with UAGG-related silencing motifs, and transcripts involved in synaptic functions. These results suggest that, at the post-transcriptional level, excitable exons such as the CI cassette may be involved in strategies by which neurons mount adaptive responses to hyperstimulation.

  9. Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

    Science.gov (United States)

    Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

    2004-01-05

    The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.

  10. Efficient sequential and parallel algorithms for planted motif search.

    Science.gov (United States)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2014-01-31

    Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
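
    The (l,d)-motif definition above can be written down directly as a brute-force search: enumerate every candidate l-mer and keep those that occur in each input sequence with at most d mismatches. This is only a reference sketch of the problem statement, exponential in l; it is not PMS8 and uses none of its neighborhood-generation ideas. Function names and the toy sequences are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some l-mer of `sequence` is within d mismatches of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motif_search(sequences, l, d, alphabet="ACGT"):
    """Return all l-mers that appear in every sequence with at most d mismatches."""
    motifs = []
    for letters in product(alphabet, repeat=l):  # |alphabet|**l candidates
        candidate = "".join(letters)
        if all(occurs_within(candidate, seq, d) for seq in sequences):
            motifs.append(candidate)
    return motifs


# Toy DNA input (hypothetical): the only 3-mer shared exactly by all three is ACG.
toy_sequences = ["ACGTAC", "TTACGA", "GGACGT"]
exact_motifs = planted_motif_search(toy_sequences, l=3, d=0)
print(exact_motifs)
```

    Because the candidate space grows as 4^l for DNA, this form is unusable for the challenging instances mentioned above such as (25,10); it is useful mainly as a correctness check for faster algorithms on small inputs.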

  11. Aplikasi Ornamen Khas Maluku untuk Pengembangan Desain Motif Batik

    OpenAIRE

    Masiswo Masiswo; Vivin Atika

    2016-01-01

    ABSTRACT Maluku has a rich variety of decorative cultural heritage carrying ancestral values, in the form of ethnic ornaments that embody art and craft skills. This heritage survives to this day and can be enjoyed as spiritual nourishment that satisfies people. To sustain the values of ethnic tradition embodied in the ornaments of the Maluku region, these ornaments were developed to meet human needs in the form of batik motifs on cloth. This ornament development places greater emphasis...

  12. ROMANIAN TRADITIONAL MOTIF ELEMENT OF MODERNITY IN CLOTHING

    Directory of Open Access Journals (Sweden)

    ŞUTEU Marius Darius

    2017-05-01

    Full Text Available This paper presents the phases for aesthetically improving a clothing item, the women's T-shirt, using pattern design software, computerised graphics and various modern textile technologies, including industrial embroidery, digital printing and sublimation. In the first phase, documentation was prepared at the University of Oradea, and a traditional motif was selected from a collection of Romanian traditional motifs from different parts of the country, then reinterpreted and stylized while preserving the symbolism and color range specific to the area. For the styling phase, the CorelDraw vector graphics program was used, which allows the shape, size and color of the drawings to be changed without affecting the identity of the pattern. The embroidery was done using BERNINA Embroidery Software Designer Plus. This software can export the model to any domestic or industrial embroidery machine regardless of brand. Finally, we assessed the resistance of the printed and embroidered model under various tests: elasticity, resistance to abrasion, and a sensory analysis of color preservation. After testing, we noted the resistance of the imprint applied to the fabric, a quality that makes it possible to keep the Romanian traditional motif from generation to generation.

  13. Insertion of tetracysteine motifs into dopamine transporter extracellular domains.

    Directory of Open Access Journals (Sweden)

    Deanna M Navaroli

    Full Text Available The neuronal dopamine transporter (DAT) is a major determinant of extracellular dopamine (DA) levels and is the primary target for a variety of addictive and therapeutic psychoactive drugs. DAT is acutely regulated by protein kinase C (PKC) activation and amphetamine exposure, both of which modulate DAT surface expression by endocytic trafficking. In order to use live imaging approaches to study DAT endocytosis, methods are needed to exclusively label the DAT surface pool. The use of membrane impermeant, sulfonated biarsenic dyes holds potential as one such approach, and requires introduction of an extracellular tetracysteine motif (tetraCys; CCPGCC) to facilitate dye binding. In the current study, we took advantage of intrinsic proline-glycine (Pro-Gly) dipeptides encoded in predicted DAT extracellular domains to introduce tetraCys motifs into DAT extracellular loops 2, 3, and 4. [3H]DA uptake studies, surface biotinylation and fluorescence microscopy in PC12 cells indicate that tetraCys insertion into the DAT second extracellular loop results in a functional transporter that maintains PKC-mediated downregulation. Introduction of tetraCys into extracellular loops 3 and 4 yielded DATs with severely compromised function that failed to mature and traffic to the cell surface. This is the first demonstration of successful introduction of a tetracysteine motif into a DAT extracellular domain, and may hold promise for use of biarsenic dyes in live DAT imaging studies.

  14. Codon based co-occurrence network motifs in human mitochondria

    Directory of Open Access Journals (Sweden)

    Pramod Shinde

    2017-10-01

    Full Text Available The nucleotide polymorphism in human mitochondrial genome (mtDNA) tolled by codon position bias plays an indispensable role in human population dispersion and expansion. Herein, we constructed genome-wide nucleotide co-occurrence networks using a massive data consisting of five different geographical regions and around 3000 samples for each region. We developed a powerful network model to describe complex mitochondrial evolutionary patterns between codon and non-codon positions. It was interesting to report a different evolution of Asian genomes than those of the rest which is divulged by network motifs. We found evidence that mtDNA undergoes substantial amounts of adaptive evolution, a finding which was supported by a number of previous studies. The dominance of higher order motifs indicated the importance of long-range nucleotide co-occurrence in genomic diversity. Most notably, codon motifs apparently underpinned the preferences among codon positions for co-evolution which is probably highly biased during the origin of the genetic code. Our analyses manifested that codon position co-evolution is very well conserved across human sub-populations and independently maintained within human sub-populations implying the selective role of evolutionary processes on codon position co-evolution. Ergo, this study provided a framework to investigate cooperative genomic interactions which are critical in underlying complex mitochondrial evolution.

  15. Organofluorine chemistry: synthesis and conformation of vicinal fluoromethylene motifs.

    Science.gov (United States)

    O'Hagan, David

    2012-04-20

    The C-F bond is the most polar bond in organic chemistry, and thus the bond has a relatively large dipole moment with a significant -ve charge density on the fluorine atom and correspondingly a +ve charge density on carbon. The electrostatic nature of the bond renders it the strongest one in organic chemistry. However, the fluorine atom itself is nonpolarizable, and thus, despite the charge localization on fluorine, it is a poor hydrogen-bonding acceptor. These properties of the C-F bond make it attractive in the design of nonviscous but polar organic compounds, with a polarity limited to influencing the intramolecular nature of the molecule and less so intermolecular interactions with the immediate environment. In this Perspective, the synthesis of aliphatic chains carrying multivicinal fluoromethylene motifs is described. It emerges that the dipoles of adjacent C-F bonds orientate relative to each other, and thus, individual diastereoisomers display different backbone carbon chain conformations. These conformational preferences recognize the influence of the well-known gauche effect associated with 1,2-difluoroethane but extend to considering 1,3-fluorine-fluorine dipolar repulsions. The synthesis of carbon chains carrying two, three, four, five, and six vicinal fluoromethylene motifs is described, with an emphasis on our own research contributions. These motifs obey almost predictable conformational behavior, and they emerge as candidates for inclusion in the design of performance organic molecules. © 2012 American Chemical Society

  16. iFORM: Incorporating Find Occurrence of Regulatory Motifs.

    Science.gov (United States)

    Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

    2016-01-01

    Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Both performance assessment with a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source and binary codes for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The identified TF binding sites across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
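
    The abstract above says that iFORM integrates five motif discovery programs using Fisher's combined probability test. As a minimal sketch of that test only, the Python function below combines independent p-values into one. The function name `fisher_combined_p` is hypothetical, the p-values are assumed independent, and how iFORM itself derives per-program p-values is not stated in the abstract, so this is not iFORM's code.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis (k = number of
    p-values). For even degrees of freedom the survival function has the
    closed form exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    term = 1.0   # (X/2)**i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

    With a single p-value the function returns that value unchanged, which is a quick sanity check on the closed form.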

  17. THE MOTIF OF THE PRODIGAL SON IN IVAN TURGENEV'S NOVELS

    Directory of Open Access Journals (Sweden)

    Valentina Ivanovna Gabdullina

    2013-11-01

    The author questions the perception of Ivan Turgenev as a “non-Christian writer” and studies how the motif of the prodigal son functions in a series of his novels. In his novels, Turgenev pictured different phases of the archetypal story originating from the Gospel parable of the prodigal son. In the novel Rudin he depicted the phase of spiritual wanderings of the hero who had lost touch with his native land, Russia. In his next novels (Home of the Gentry, Fathers and Sons and Smoke), after leading his hero in circles and sending him back to his paternal home, Turgenev reconstructs the model of human behavior represented in the parable, thereby recognizing the immutability of the idea formalized in the Gospel. The motif of the return to Russian land reaches its completion in Turgenev's last novel, Virgin Soil, in which the author paradoxically connects the Westernist idea with the Gospel imperative. Solomin, the son of a deacon, sent by his wise father out to Europe “to get education”, studies in England, masters European knowledge and returns “to his native land” to establish his own business in inland Russia. Thus, a series of Turgenev's novels, in which he portrayed different phases of social life, are interlinked by the motif of the prodigal son, who is represented by the novels' main characters.

  18. The city as a motif in Slovene youth literature

    Directory of Open Access Journals (Sweden)

    Milena Mileva Blažić

    2003-01-01

    The article presents the city as a motif of Slovenian youth literature in four periods: the first, the beginnings of original Slovenian youth literature in the second half of the 19th century; the second, the first half of the 20th century; the third, the second half of the 20th century, after 1950, when significant books were produced in the field of short modern stories, with an emphasis on picture books and realistic narrative prose; and the fourth, after 1990. A discernible shift can be observed in the thirties of the 20th century, during the times of socialist realism. The most significant change occurred after 1960, when massive migration from rural to urban environments caused by industrialisation began. The motif of the urban environment especially marked modern realistic narrative, termed problematic narrative after 1990, with its focus on issues of growing up in such environments. The city as motif or theme does not appear only in realistic narrative, but since the early 20th century also in fantastic narrative; thus it dichotomically presents the image of the real world in Slovenian youth realistic narrative.

  19. TOPDOM: database of conservatively located domains and motifs in proteins.

    Science.gov (United States)

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins, has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  20. Biophysical properties of regions flanking the bHLH-Zip motif in the p22 Max protein

    International Nuclear Information System (INIS)

    Pursglove, Sharon E.; Fladvad, Malin; Bellanda, Massimo; Moshref, Ahmad; Henriksson, Marie; Carey, Jannette; Sunnerhagen, Maria

    2004-01-01

    The Max protein is the central dimerization partner in the Myc-Max-Mad network of transcriptional regulators, and a founding structural member of the family of basic-helix-loop-helix (bHLH)-leucine zipper (Zip) proteins. Biologically important regions flanking its bHLH-Zip motif have been disordered or absent in crystal structures. The present study shows that these regions are resistant to proteolysis in both the presence and absence of DNA, and that Max dimers containing both flanking regions have significantly higher helix content as measured by circular dichroism than that predicted from the crystal structures. Nuclear magnetic resonance measurements in the absence of DNA also support the inferred structural order. Deletion of both flanking regions is required to achieve maximal DNA affinity as measured by EMSA. Thus, the previously observed functionalities of these Max regions in DNA binding, phosphorylation, and apoptosis are suggested to be linked to structural properties

  1. Leucine-based receptor sorting motifs are dependent on the spacing relative to the plasma membrane

    DEFF Research Database (Denmark)

    Geisler, C; Dietrich, J; Nielsen, B L

    1998-01-01

    Many integral membrane proteins contain leucine-based motifs within their cytoplasmic domains that mediate internalization and intracellular sorting. Two types of leucine-based motifs have been identified. One type is dependent on phosphorylation, whereas the other type, which includes an acidic...... amino acid, is constitutively active. In this study, we have investigated how the spacing relative to the plasma membrane affects the function of both types of leucine-based motifs. For phosphorylation-dependent leucine-based motifs, a minimal spacing of 7 residues between the plasma membrane...... and the phospho-acceptor was required for phosphorylation and thereby activation of the motifs. For constitutively active leucine-based motifs, a minimal spacing of 6 residues between the plasma membrane and the acidic residue was required for optimal activity of the motifs. In addition, we found that the acidic...

  2. Organization of feed-forward loop motifs reveals architectural principles in natural and engineered networks.

    Science.gov (United States)

    Gorochowski, Thomas E; Grierson, Claire S; di Bernardo, Mario

    2018-03-01

    Network motifs are significantly overrepresented subgraphs that have been proposed as building blocks for natural and engineered networks. Detailed functional analysis has been performed for many types of motif in isolation, but less is known about how motifs work together to perform complex tasks. To address this issue, we measure the aggregation of network motifs via methods that extract precisely how these structures are connected. Applying this approach to a broad spectrum of networked systems and focusing on the widespread feed-forward loop motif, we uncover striking differences in motif organization. The types of connection are often highly constrained, differ between domains, and clearly capture architectural principles. We show how this information can be used to effectively predict functionally important nodes in the metabolic network of Escherichia coli. Our findings have implications for understanding how networked systems are constructed from motif parts and elucidate constraints that guide their evolution.
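
    To make the feed-forward loop (FFL) motif concrete, the sketch below enumerates FFLs in a directed graph, that is, triples (x, y, z) with edges x→y, y→z and x→z, and counts how many loops each node takes part in. The function names and the edge-list input format are assumptions for illustration. This is plain motif enumeration, not the authors' aggregation method, and it does not test for statistical overrepresentation.

```python
from collections import Counter


def feed_forward_loops(edges):
    """Return all feed-forward loops (x, y, z) with x->y, y->z and x->z."""
    successors = {}
    for source, target in edges:
        if source != target:  # self-loops cannot be part of an FFL
            successors.setdefault(source, set()).add(target)

    loops = []
    for x, x_targets in successors.items():
        for y in x_targets:
            # z must be reachable from y directly and from x directly
            for z in successors.get(y, ()):
                if z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


def node_participation(loops):
    """Count how many feed-forward loops each node belongs to."""
    return Counter(node for loop in loops for node in loop)
```

    Nodes with a high participation count are shared by several loops, which is the simplest way to see that motifs in a network overlap rather than occur in isolation.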

  3. Evolutionarily conserved bias of amino-acid usage refines the definition of PDZ-binding motif

    Directory of Open Access Journals (Sweden)

    Launey Thomas

    2011-06-01

    Background: The interactions between PDZ (PSD-95, Dlg, ZO-1) domains and PDZ-binding motifs play central roles in signal transduction within cells. Proteins with PDZ domains bind to PDZ-binding motifs almost exclusively when the motifs are located at the carboxyl (C-) terminal ends of their binding partners. However, it remains little explored whether PDZ-binding motifs show any preferential location at the C-terminal ends of proteins at the genome level. Results: Here, we examined the distribution of the type-I (x-x-S/T-x-I/L/V) or type-II (x-x-V-x-I/V) PDZ-binding motifs in proteins encoded in the genomes of five different species (human, mouse, zebrafish, fruit fly and nematode). We first established that these PDZ-binding motifs are indeed preferentially present at their C-terminal ends. Moreover, we found specific amino acid (AA) bias for the 'x' positions in the motifs at the C-terminal ends. In general, hydrophilic AAs were favored. Our genomics-based findings confirm and largely extend the results of previous interaction-based studies, allowing us to propose refined consensus sequences for all of the examined PDZ-binding motifs. An ontological analysis revealed that the refined motifs are functionally relevant since a large fraction of the proteins bearing the motif appear to be involved in signal transduction. Furthermore, co-precipitation experiments confirmed two new protein interactions predicted by our genomics-based approach. Finally, we show that influenza virus pathogenicity can be correlated with the PDZ-binding motif, with high-virulence viral proteins bearing a refined PDZ-binding motif. Conclusions: Our refined definition of PDZ-binding motifs should provide important clues for identifying functional PDZ-binding motifs and proteins involved in signal transduction.
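
    The two consensus patterns quoted in the abstract, type-I (x-x-S/T-x-I/L/V) and type-II (x-x-V-x-I/V), can be written as regular expressions anchored at the C-terminus. The sketch below assumes one-letter amino-acid codes and treats 'x' as any letter. The function name is hypothetical, and the refined consensus sequences proposed in the paper are not reproduced here.

```python
import re

# Last five residues of the protein; 'x' positions accept any letter.
TYPE_I = re.compile(r"[A-Z]{2}[ST][A-Z][ILV]$")   # x-x-S/T-x-I/L/V
TYPE_II = re.compile(r"[A-Z]{2}V[A-Z][IV]$")      # x-x-V-x-I/V


def classify_c_terminus(sequence):
    """Return 'type-I', 'type-II' or None for a protein's C-terminal end."""
    sequence = sequence.strip().upper()
    if TYPE_I.search(sequence):
        return "type-I"
    if TYPE_II.search(sequence):
        return "type-II"
    return None
```

    The two patterns cannot both match the same sequence, because the third-from-last residue must be S or T for type-I and V for type-II.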

  4. Statistical learning and selective inference.

    Science.gov (United States)

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  5. Bayesian inference with ecological applications

    CERN Document Server

    Link, William A

    2009-01-01

    This text is written to provide a mathematically sound but accessible and engaging introduction to Bayesian inference specifically for environmental scientists, ecologists and wildlife biologists. It emphasizes the power and usefulness of Bayesian methods in an ecological context. The advent of fast personal computers and easily available software has simplified the use of Bayesian and hierarchical models. One obstacle remains for ecologists and wildlife biologists, namely the near absence of Bayesian texts written specifically for them. The book includes many relevant examples, is supported by software and examples on a companion website and will become an essential grounding in this approach for students and research ecologists. Engagingly written text specifically designed to demystify a complex subject. Examples drawn from ecology and wildlife research. An essential grounding for graduate and research ecologists in the increasingly prevalent Bayesian approach to inference. Companion website with analyt...

  6. Statistical inference an integrated approach

    CERN Document Server

    Migon, Helio S; Louzada, Francisco

    2014-01-01

    Introduction Information The concept of probability Assessing subjective probabilities An example Linear algebra and probability Notation Outline of the book Elements of Inference Common statistical models Likelihood-based functions Bayes theorem Exchangeability Sufficiency and exponential family Parameter elimination Prior Distribution Entirely subjective specification Specification through functional forms Conjugacy with the exponential family Non-informative priors Hierarchical priors Estimation Introduction to decision theory Bayesian point estimation Classical point estimation Empirical Bayes estimation Comparison of estimators Interval estimation Estimation in the Normal model Approximating Methods The general problem of inference Optimization techniques Asymptotic theory Other analytical approximations Numerical integration methods Simulation methods Hypothesis Testing Introduction Classical hypothesis testing Bayesian hypothesis testing Hypothesis testing and confidence intervals Asymptotic tests Prediction...

  7. Bayesian inference on proportional elections.

    Directory of Open Access Journals (Sweden)

    Gabriel Hideki Vatanabe Brunello

    Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee that the candidate with the highest percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to estimate the probability that a given party will have representation in the chamber of deputies was developed. Inferences were made in a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied to data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.
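
    The kind of calculation described above can be sketched as follows: draw vote shares from a posterior, allocate seats for each draw, and report how often each party wins at least one seat. Everything specific here is an assumption for illustration, not the paper's method: a uniform Dirichlet prior over vote shares, plain D'Hondt highest-averages allocation standing in for the Brazilian seat distribution rules, and the function names. The paper itself used R, while this sketch is in Python.

```python
import random


def dhondt(votes, seats):
    """Allocate seats by the D'Hondt highest-averages rule."""
    allocation = [0] * len(votes)
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient.
        winner = max(range(len(votes)),
                     key=lambda j: votes[j] / (allocation[j] + 1))
        allocation[winner] += 1
    return allocation


def seat_probabilities(poll_counts, seats, draws=2000, seed=0):
    """Estimate P(party wins at least one seat) by Monte Carlo simulation."""
    rng = random.Random(seed)
    # Posterior Dirichlet parameters under a uniform prior (assumption).
    alphas = [count + 1 for count in poll_counts]
    hits = [0] * len(poll_counts)
    for _ in range(draws):
        # A Dirichlet draw is a vector of normalised gamma variates.
        gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
        total = sum(gammas)
        shares = [g / total for g in gammas]
        for party, won in enumerate(dhondt(shares, seats)):
            if won > 0:
                hits[party] += 1
    return [h / draws for h in hits]
```

    Replacing `dhondt` with a function that implements the actual seat distribution rules of interest would leave the simulation loop unchanged.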

  8. Causal inference based on counterfactuals

    Directory of Open Access Journals (Sweden)

    Höfler M

    2005-09-01

    Background: The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion: This paper provides an overview of the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary: Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences poses several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept.

  9. System Support for Forensic Inference

    Science.gov (United States)

    Gehani, Ashish; Kirchner, Florent; Shankar, Natarajan

    Digital evidence is playing an increasingly important role in prosecuting crimes. The reasons are manifold: financially lucrative targets are now connected online, systems are so complex that vulnerabilities abound and strong digital identities are being adopted, making audit trails more useful. If the discoveries of forensic analysts are to hold up to scrutiny in court, they must meet the standard for scientific evidence. Software systems are currently developed without consideration of this fact. This paper argues for the development of a formal framework for constructing “digital artifacts” that can serve as proxies for physical evidence; a system so imbued would facilitate sound digital forensic inference. A case study involving a filesystem augmentation that provides transparent support for forensic inference is described.

  10. Probability biases as Bayesian inference

    Directory of Open Access Journals (Sweden)

    Andre C. R. Martins

    2006-11-01

    In this article, I will show how several observed biases in human probabilistic reasoning can be partially explained as good heuristics for making inferences in an environment where probabilities have uncertainties associated with them. Previous results show that the weight functions and the observed violations of coalescing and stochastic dominance can be understood from a Bayesian point of view. We will review those results and see that Bayesian methods should also be used as part of the explanation behind other known biases. That means that, although the observed errors are still errors, they can be understood as adaptations to the solution of real life problems. Heuristics that allow fast evaluations and mimic a Bayesian inference would be an evolutionary advantage, since they would give us an efficient way of making decisions.

  11. Statistical inference on residual life

    CERN Document Server

    Jeong, Jong-Hyeon

    2014-01-01

    This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.

  12. Nonparametric Bayesian inference in biostatistics

    CERN Document Server

    Müller, Peter

    2015-01-01

    As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...

  13. Statistical inference a short course

    CERN Document Server

    Panik, Michael J

    2012-01-01

    A concise, easily accessible introduction to descriptive and inferential techniques. Statistical Inference: A Short Course offers a concise presentation of the essentials of basic statistics for readers seeking to acquire a working knowledge of statistical concepts, measures, and procedures. The author conducts tests on the assumption of randomness and normality, and provides nonparametric methods when parametric approaches might not work. The book also explores how to determine a confidence interval for a population median while also providing coverage of ratio estimation, randomness, and causal

  14. On Quantum Statistical Inference, II

    OpenAIRE

    Barndorff-Nielsen, O. E.; Gill, R. D.; Jupp, P. E.

    2003-01-01

    Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, theoretical developments in the theory of quantum measurements have brought the basic mathematical framework for the probability calculations much closer to that of classical probability theory. The present paper reviews this field and proposes and inte...

  15. Nonparametric predictive inference in reliability

    International Nuclear Information System (INIS)

    Coolen, F.P.A.; Coolen-Schrijner, P.; Yan, K.J.

    2002-01-01

    We introduce a recently developed statistical approach, called nonparametric predictive inference (NPI), to reliability. Bounds for the survival function for a future observation are presented. We illustrate how NPI can deal with right-censored data, and discuss aspects of competing risks. We present possible applications of NPI for Bernoulli data, and we briefly outline applications of NPI for replacement decisions. The emphasis is on introduction and illustration of NPI in reliability contexts; detailed mathematical justifications are presented elsewhere.
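
    As an illustration of what bounds for the survival function of a future observation can look like, the sketch below covers only the simplest case: n fully observed, distinct failure times, with the next observation assumed equally likely (probability 1/(n+1)) to fall in each of the n+1 open intervals between consecutive ordered observations. That assumption (Hill's A(n)), the function name, and the restriction to uncensored data without ties are stated here for the sketch. The right-censored and competing-risks cases mentioned in the abstract are not covered.

```python
from bisect import bisect_right


def npi_survival_bounds(observations, t):
    """Lower and upper probability that the next observation exceeds t.

    Assumes distinct, fully observed failure times and a time t that does
    not coincide with any of them.
    """
    xs = sorted(observations)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one observation is required")
    if len(set(xs)) != n:
        raise ValueError("tied observations are not handled in this sketch")
    if t in xs:
        raise ValueError("t must not coincide with an observation")

    j = bisect_right(xs, t)        # number of observations below t
    lower = (n - j) / (n + 1)      # intervals lying entirely above t
    upper = (n - j + 1) / (n + 1)  # plus the interval that contains t
    return lower, upper
```

    The gap between the two bounds is always 1/(n+1), so it narrows as more failure times are observed.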

  16. Variational inference & deep learning : A new synthesis

    NARCIS (Netherlands)

    Kingma, D.P.

    2017-01-01

    In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.

  17. Variational inference & deep learning: A new synthesis

    OpenAIRE

    Kingma, D.P.

    2017-01-01

    In this thesis, Variational Inference and Deep Learning: A New Synthesis, we propose novel solutions to the problems of variational (Bayesian) inference, generative modeling, representation learning, semi-supervised learning, and stochastic optimization.

  18. Continuous Integrated Invariant Inference, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The proposed project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an...

  19. Insights into the evolution and diversification of the AT-hook Motif Nuclear Localized gene family in land plants.

    Science.gov (United States)

    Zhao, Jianfei; Favero, David S; Qiu, Jiwen; Roalson, Eric H; Neff, Michael M

    2014-10-14

    Members of the ancient land-plant-specific transcription factor AT-Hook Motif Nuclear Localized (AHL) gene family regulate various biological processes. However, the relationships among the AHL genes, as well as their evolutionary history, still remain unexplored. We analyzed over 500 AHL genes from 19 land plant species, ranging from the early diverging Physcomitrella patens and Selaginella to a variety of monocot and dicot flowering plants. We classified the AHL proteins into three types (Type-I/-II/-III) based on the number and composition of their functional domains, the AT-hook motif(s) and PPC domain. We further inferred their phylogenies via Bayesian inference analysis and predicted gene gain/loss events throughout their diversification. Our analyses suggested that the AHL gene family emerged in embryophytes and further evolved into two distinct clades, with Type-I AHLs forming one clade (Clade-A), and the other two types together diversifying in another (Clade-B). The two AHL clades likely diverged before the separation of Physcomitrella patens from the vascular plant lineage. In angiosperms, Clade-A AHLs expanded into 5 subfamilies, while the ones in Clade-B expanded into 4 subfamilies. Examination of their expression patterns suggests that the AHLs within each clade share similar expression patterns with each other; however, AHLs in one monophyletic clade exhibit distinct expression patterns from the ones in the other clade. Over-expression of a Glycine max AHL PPC domain in Arabidopsis thaliana recapitulates the phenotype observed when over-expressing its Arabidopsis thaliana counterpart. This result suggests that the AHL genes from different land plant species may share conserved functions in regulating plant growth and development. Our study further suggests that such functional conservation may be due to conserved physical interactions among the PPC domains of AHL proteins. Our analyses reveal a possible evolutionary scenario for the AHL gene family

  20. Variations on Bayesian Prediction and Inference

    Science.gov (United States)

    2016-05-09

    There are a number of statistical inference problems that are not generally formulated via a full probability model... For the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle

  1. Adaptive Inference on General Graphical Models

    OpenAIRE

    Acar, Umut A.; Ihler, Alexander T.; Mettu, Ramgopal; Sumer, Ozgur

    2012-01-01

    Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional ...

  2. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
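
    The three-partition workflow described above (discover motifs, train a classifier on them, validate) can be sketched with deliberately simple stand-ins. Both components below are assumptions for illustration, not the authors' pipeline: the "motif finder" ranks k-mers by the difference in presence rate between classes, and the "classifier" is a single threshold on a signed motif score. Every partition is assumed to contain both classes, and all function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discover_motifs(seqs, labels, k=4, top=1):
    """Discovery partition: pick the k-mers that best separate the classes."""
    pos, neg = Counter(), Counter()
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    for seq, y in zip(seqs, labels):
        (pos if y else neg).update(kmers(seq, k))

    def diff(motif):
        return pos[motif] / n_pos - neg[motif] / n_neg

    ranked = sorted(set(pos) | set(neg), key=lambda m: (-abs(diff(m)), m))
    # Each feature is (motif, +1 if enriched in positives else -1).
    return [(m, 1 if diff(m) > 0 else -1) for m in ranked[:top]]


def score(seq, motifs):
    """Signed count of the selected motifs present in a sequence."""
    return sum(sign for motif, sign in motifs if motif in seq)


def train_threshold(seqs, labels, motifs):
    """Training partition: midpoint between the class mean scores."""
    pos = [score(s, motifs) for s, y in zip(seqs, labels) if y]
    neg = [score(s, motifs) for s, y in zip(seqs, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(seqs, labels, motifs, threshold):
    """Validation partition: fraction of sequences classified correctly."""
    predictions = [score(s, motifs) > threshold for s in seqs]
    return sum(p == bool(y) for p, y in zip(predictions, labels)) / len(seqs)
```

    Keeping the discovery data separate from the training data means the features are chosen without looking at the sequences later used to fit or assess the classifier, which is the point of the third partition.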

  3. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
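
    Comparing the relative abundance of short motifs between a foreground set and a background set, as the abstract describes for CNSs, can be sketched as a ratio of k-mer frequencies. The pseudocount, the function names and the use of overlapping k-mer counts are assumptions for illustration. The paper's actual statistics and background construction are not reproduced.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def relative_abundance(foreground, background, k, pseudocount=1.0):
    """Ratio of k-mer frequency in foreground vs. background sequences.

    A pseudocount is added to every k-mer seen in either set so that
    k-mers absent from one set do not produce a division by zero.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    vocabulary = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocabulary)
    bg_total = sum(bg.values()) + pseudocount * len(vocabulary)
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total)
        / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in vocabulary
    }
```

    Ratios above 1 indicate k-mers that are more frequent in the foreground. As the abstract notes, such excesses can simply reflect overall base composition, which is why the choice of background set matters.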

  4. Indonesian Traditional Toys and the Development of Batik Motifs

    Directory of Open Access Journals (Sweden)

    Bagus Indrayana

    2016-06-01

    There is a wide array of traditional toys in Indonesia. In the past, traditional toys played an important role in the development of children’s skills and creativity. Today, the position of traditional toys in society has been displaced by toys from large-scale manufacturers. Given the critical role of traditional toys in children’s motor and social development, there is a need to develop media that can be used to promote these traditional products and strengthen their position with the public. We propose to use Batik as a way to effectively disseminate and promote traditional toys to the general public. Apart from this, using traditional toys to create new Batik motifs can have economic value for the producers of Batik, promote Indonesian products and enrich Indonesian Batik. This study aims to explore the variety of traditional toys, mainly from Klaten and Magelang, in the Central Java province of Indonesia, and use them as the basis for the development of Batik motifs. This study used the Trilogi Keseimbangan (Harmony Trilogy) aesthetic theory, an analytical approach that explains the creation of craft as consisting of the following phases: exploration, design, and materialization. The creation method in this study adopts the Tiga Tahap Enam Langkah (Three Phases, Six Steps) method offered in the theory. Fieldwork found that the traditional toys in Klaten and Magelang are mostly made from waste wood, plywood, and zinc. The manufacturing process is done manually by two or three craftsmen using simple technology. The traditional toys are mostly designed by the artisans, although there may be designs from clients. In addition, we found that traditional toys have never been used as a Batik motif. The traditional toys Batik motif presented in this work is the researcher’s design. For the purposes of this study, we first research the variety of traditional toys available in the market today in Indonesia. We look

  5. Dimensionality of social networks using motifs and eigenvalues.

    Directory of Open Access Journals (Sweden)

    Anthony Bonato

    We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an m-dimensional metric space with power-law distributed influence regions best fits samples from real-world networks when m scales logarithmically with the number of nodes of the network. This supports a logarithmic dimension hypothesis, and we provide evidence with two different social networks, Facebook and LinkedIn. Further, we employ two different methods for confirming the hypothesis: the first uses the distribution of motif counts, and the second exploits the eigenvalue distribution.

  6. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

    Science.gov (United States)

    Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2017-07-27

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures

    Directory of Open Access Journals (Sweden)

    Hamed Bostan

    2012-01-01

    Computational approaches to the disulphide bonding state and its connectivity pattern prediction are based on various descriptors. One descriptor is the amino acid sequence motifs flanking the cysteine residue motifs. Despite the existence of disulphide bonding information in many databases and applications, there is no complete reference and motif query available at the moment. Cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values assignment derived from the laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, and frequency of occurrence and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries are based on the protein ID, FASTA sequence, sequence motif, and secondary structure individually or in batch format using the provided APIs that allow remote users to query our database via third party software and/or high throughput screening/querying. The CMD offers extensive information about the bonded, free cysteine residues, and their motifs that allows in-depth characterization of the sequence motif composition.
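
    As an illustration of what a "flanking motif" around a cysteine could look like, the following is a minimal Python sketch. It is not the CMD pipeline: the function name `cysteine_flanking_motifs`, the window size `flank`, and the `-` padding character are assumptions made here for illustration, since the abstract does not state how CMD defines its windows.

```python
def cysteine_flanking_motifs(sequence, flank=3, pad="-"):
    """Return (1-based position, window) for every cysteine in a protein sequence.

    Hypothetical helper: `flank` residues are taken on each side of the
    cysteine; windows that run past a terminus are padded with `pad`.
    """
    seq = sequence.upper()
    motifs = []
    for i, residue in enumerate(seq):
        if residue != "C":
            continue
        # Residues immediately before the cysteine, left-padded at the N-terminus.
        left = seq[max(0, i - flank):i].rjust(flank, pad)
        # Residues immediately after the cysteine, right-padded at the C-terminus.
        right = seq[i + 1:i + 1 + flank].ljust(flank, pad)
        motifs.append((i + 1, left + "C" + right))
    return motifs


if __name__ == "__main__":
    for position, window in cysteine_flanking_motifs("MACDEFC", flank=2):
        print(position, window)
```

    Counting how often each window occurs in bonded versus free cysteines would then give the kind of per-motif frequencies the abstract mentions, given a source of bonding-state annotations.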

  8. Finding a Leucine in a Haystack: Searching the Proteome for ambiguous Leucine-Aspartic Acid motifs

    KAUST Repository

    Arold, Stefan T.

    2016-01-25

    Leucine-aspartic acid (LD) motifs are short helical protein-protein interaction motifs involved in cell motility, survival and communication. LD motif interactions are also implicated in cancer metastasis and are targeted by several viruses. LD motifs are notoriously difficult to detect because sequence pattern searches lead to an excessively high number of false positives. Hence, despite 20 years of research, only six LD motif–containing proteins are known in humans, three of which are close homologues of the paxillin family. To enable the proteome-wide discovery of LD motifs, we developed LD Motif Finder (LDMF), a web tool based on machine learning that combines sequence information with structural predictions to detect LD motifs with high accuracy. LDMF predicted 13 new LD motifs in humans. Using biophysical assays, we experimentally confirmed in vitro interactions for four novel LD motif proteins. Thus, LDMF allows proteome-wide discovery of LD motifs, despite a highly ambiguous sequence pattern. Functional implications will be discussed.

  9. Transduction motif analysis of gastric cancer based on a human signaling network

    Energy Technology Data Exchange (ETDEWEB)

    Liu, G.; Li, D.Z.; Jiang, C.S.; Wang, W. [Department of Gastroenterology, Fuzhou General Hospital of Nanjing Command, Fuzhou (China)]

    2014-04-04

    To investigate signal regulation models of gastric cancer, databases and literature were used to construct the signaling network in humans. Topological characteristics of the network were analyzed by CytoScape. After marking gastric cancer-related genes extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used for the mining of gastric cancer-related motifs in a network with three vertices. The significant motif difference method was adopted to identify significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly different motifs, including gene ontology, function annotation of genes, and model classification. A human signaling network was constructed, with 1643 nodes and 5089 regulating interactions. The network was configured to have the characteristics of other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of cell death, amino acid phosphorylation of proteins, and intracellular signaling cascades. The top five significantly different motifs were mainly cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed by inhibiting signal transductions upstream and downstream of the selected motifs.

  10. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides comparative genomics tools for reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  11. CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs.

    Science.gov (United States)

    Arenillas, David J; Forrest, Alistair R R; Kawaji, Hideya; Lassmann, Timo; Wasserman, Wyeth W; Mathelier, Anthony

    2016-09-15

    With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) datasets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ∼1300 mammalian samples from the FANTOM5 project with pre-computed TFBS predicted with JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions. The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM. Contacts: anthony.mathelier@ncmm.uio.no or wyeth@cmmt.ubc.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  12. More than one kind of inference: re-examining what's learned in feature inference and classification.

    Science.gov (United States)

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  13. Generative inference for cultural evolution.

    Science.gov (United States)

    Kandler, Anne; Powell, Adam

    2018-04-05

    One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation of data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks, often using coalescent-based simulation coupled with approximate Bayesian computation (ABC), have provided robust inferences on the human past, with no reliance on a priori assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies. This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'. © 2018 The Author(s).

  14. Romanian traditional motif - element of modernity in clothing

    Science.gov (United States)

    Doble, L.; Stan, O.; Suteu, M. D.; Albu, A.; Bohm, G.; Tsatsarou-Michalaki, A.; Gialinou, E.

    2017-10-01

    This paper presents the phases for aesthetically improving a clothing item, namely a straight-cut women's jacket, using pattern design software, computerised graphics and various modern textile technologies, including industrial embroidery, digital printing and sublimation. In the first phase, documentation was carried out at the Ethnographic Museum of Transylvania in Cluj Napoca, where several traditional motifs specific to the Transylvania ethnographic region were selected, then reinterpreted and stylized whilst preserving the symbolism and color range specific to the area. For the styling phase, the CorelDraw vector graphics program was used, which allows changing the shape, size and color of the drawings without affecting the identity of the pattern. In the pattern design phase Gemini CAD software was used, and for modeling and model development Optitex software was used. The garnishing of the model was performed using embroidery machine software, reproducing the stylized motif identically. In order to obtain a significantly improved aesthetic look and added artistic value, the pattern chosen for the jacket was made using a combination of modern textile technologies. This allowed the realization of a particular texture on the surface of the designed product, demonstrating that traditional patterns can be reinterpreted in modern clothing.

  15. THE MOTIF OF THE SECOND COMING IN RUSSIAN FANTASTIC FICTION

    Directory of Open Access Journals (Sweden)

    Tatyana I. Khoruzhenko

    2017-06-01

    The motif of the Second Coming of Christ takes a special place in Russian fantastic fiction at the turn of the millennium. In recent decades, allusions to the Gospel topic appear with increasing frequency in the genre of fantasy. The aim of this article is to analyze the peculiarities of the depiction of the subject of Advent in Russian fantastic fiction. The research is based on the novels of Y. Voznesenskaya, N. Perumov, V. Khlumov, S. Lukyanenko and T. Ustimenko. The Advent motif appears in the story line of each of the novels in question. However, the attitude of the authors to the image of the Savior and his second coming to the world varies: from respectful expectation (Y. Voznesenskaya, T. Ustimenko, S. Lukyanenko) to the depiction of the Savior as a monster (N. Perumov). The possibility of an ambivalent interpretation of the Savior is eloquent evidence of the desacralization of this image. The profaning of the sacred is one of the tendencies of modern popular culture. The genre of fantastic fiction, as a product of mass culture, has caught this trend quite precisely.

  16. Regulation of amyloid precursor protein processing by its KFERQ motif.

    Science.gov (United States)

    Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

    2016-06-01

    Understanding the trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer's disease (AD). Here, we found that APP contains a KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy, which are other types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that the KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs, although it may not be important for CMA or microautophagy. [BMB Reports 2016; 49(6): 337-342].

  17. Network motif frequency vectors reveal evolving metabolic network organisation.

    Science.gov (United States)

    Pearcy, Nicole; Crofts, Jonathan J; Chuzhanova, Nadia

    2015-01-01

    At the systems level many organisms of interest may be described by their patterns of interaction, and as such, are perhaps best characterised via network or graph models. Metabolic networks, in particular, are fundamental to the proper functioning of many important biological processes, and thus, have been widely studied over the past decade or so. Such investigations have revealed a number of shared topological features, such as a short characteristic path-length, large clustering coefficient and hierarchical modular structure. However, the extent to which evolutionary and functional properties of metabolism manifest via this underlying network architecture remains unclear. In this paper, we employ a novel graph embedding technique, based upon low-order network motifs, to compare metabolic network structure for 383 bacterial species categorised according to a number of biological features. In particular, we introduce a new global significance score which enables us to quantify important evolutionary relationships that exist between organisms and their physical environments. Using this new approach, we demonstrate a number of significant correlations between environmental factors, such as growth conditions and habitat variability, and network motif structure, providing evidence that organism adaptability leads to increased complexities in the resultant metabolic networks.
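
    To make the idea of a motif frequency vector concrete, here is a minimal Python sketch that counts the two connected three-node subgraphs (open paths and triangles) of an undirected graph and normalises the counts. It is an assumption-laden illustration, not the embedding or the global significance score of the paper: the function name `three_node_motif_frequencies` and the restriction to undirected three-node motifs are choices made here, and metabolic networks are often modelled as directed graphs.

```python
from itertools import combinations


def three_node_motif_frequencies(edges):
    """Relative frequencies of open paths and triangles in an undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    Hypothetical helper for illustration only.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    triangles = 0
    open_paths = 0
    for center, neighbours in adjacency.items():
        # Every pair of neighbours of `center` forms a connected triple.
        for a, b in combinations(list(neighbours), 2):
            if b in adjacency[a]:
                triangles += 1   # seen once from each of the three corners
            else:
                open_paths += 1  # seen only from its middle node
    triangles //= 3

    total = triangles + open_paths
    if total == 0:
        return {"path": 0.0, "triangle": 0.0}
    return {"path": open_paths / total, "triangle": triangles / total}


if __name__ == "__main__":
    print(three_node_motif_frequencies([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

    Computing such a vector for each organism's network gives points in a common low-dimensional space, which is the general setting in which networks can then be compared across species.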

  18. sick: The Spectroscopic Inference Crank

    Science.gov (United States)

    Casey, Andrew R.

    2016-03-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  19. Inferring network structure from cascades

    Science.gov (United States)

    Ghonge, Sushrut; Vural, Dervis Can

    2017-07-01

    Many physical, biological, and social phenomena can be described by cascades taking place on a network. Often, the activity can be empirically observed, but not the underlying network of interactions. In this paper we offer three topological methods to infer the structure of any directed network given a set of cascade arrival times. Our formulas hold for a very general class of models where the activation probability of a node is a generic function of its degree and the number of its active neighbors. We report high success rates for synthetic and real networks, for several different cascade models.

  20. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    Energy Technology Data Exchange (ETDEWEB)

    Casey, Andrew R., E-mail: arc@ast.cam.ac.uk [Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA (United Kingdom)]

    2016-03-15

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  1. Bayesian inference for Hawkes processes

    DEFF Research Database (Denmark)

    Rasmussen, Jakob Gulddahl

    The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional...... intensity function, while the second approach is based on an underlying clustering and branching structure in the Hawkes process. For practical use, MCMC (Markov chain Monte Carlo) methods are employed. The two approaches are compared numerically using three examples of the Hawkes process....

  2. Bayesian inference for Hawkes processes

    DEFF Research Database (Denmark)

    Rasmussen, Jakob Gulddahl

    2013-01-01

    The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional...... intensity function, while the second approach is based on an underlying clustering and branching structure in the Hawkes process. For practical use, MCMC (Markov chain Monte Carlo) methods are employed. The two approaches are compared numerically using three examples of the Hawkes process....
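
    The conditional intensity mentioned above can be written down explicitly once a kernel is chosen. The sketch below assumes an exponential kernel, \lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}, which is a common textbook choice and not necessarily the one used in the paper; the function names and parameter names (`mu`, `alpha`, `beta`) are likewise assumptions. The log-likelihood shown is the kind of quantity an MCMC sampler would combine with a prior.

```python
import math


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t for an exponential-kernel Hawkes process."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of event times observed on [0, horizon].

    Assumes all event times lie in [0, horizon] and mu > 0, beta > 0.
    """
    # Sum of log-intensities evaluated at the observed events.
    log_term = sum(math.log(intensity(t, events, mu, alpha, beta)) for t in events)
    # Integral of the intensity over [0, horizon] (the compensator).
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - s)) for s in events
    )
    return log_term - compensator


if __name__ == "__main__":
    times = [0.5, 1.2, 1.3, 2.8]
    print(log_likelihood(times, horizon=3.0, mu=0.5, alpha=0.8, beta=1.5))
```

    With `alpha = 0` the expression reduces to the log-likelihood of a homogeneous Poisson process with rate `mu`, which is a convenient sanity check.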

  3. Inference in hybrid Bayesian networks

    International Nuclear Information System (INIS)

    Langseth, Helge; Nielsen, Thomas D.; Rumi, Rafael; Salmeron, Antonio

    2009-01-01

    Since the 1980s, Bayesian networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (the so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.

  4. SICK: THE SPECTROSCOPIC INFERENCE CRANK

    International Nuclear Information System (INIS)

    Casey, Andrew R.

    2016-01-01

    There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal

  5. Anion induced conformational preference of Cα NN motif residues in functional proteins.

    Science.gov (United States)

    Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

    2017-12-01

    Among different ligand binding motifs, the anion binding Cα NN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions, like sulphate or biphosphate, and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the Cα NN motif present in different proteins. Instead of the total protein, a peptide fragment has been studied, keeping the Cα NN motif flanked by other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.

  6. I-Ad-binding peptides derived from unrelated protein antigens share a common structural motif

    DEFF Research Database (Denmark)

    Sette, A; Buus, S; Colon, S

    1988-01-01

    on the I-Ad binding of the immunogenic peptide OVA 323-339. The results obtained demonstrated the very permissive nature of Ag-Ia interaction. We also showed that unrelated peptides that are good I-Ad binders share a common structural motif and speculated that recognition of such motifs could represent...... that I-Ad molecules recognize a large library of Ag by virtue of common structural motifs present in peptides derived from phylogenetically unrelated proteins....

  7. Lucky Motifs in Chinese Folk Art: Interpreting Paper-cut from Chinese Shaanxi

    OpenAIRE

    Xuxiao WANG

    2013-01-01

    Paper-cut is not simply a form of traditional Chinese folk art. Lucky motifs developed in paper-cut certainly acquired profound cultural connotations. As paper-cut is a time-honoured skill across the nation, interpreting those motifs requires cultural receptiveness and anthropological sensitivity. The author of this article analyzes examples of paper-cut from Northern Shaanxi, China, to identify the cohesive motifs and explore the auspiciousness of the specific concepts of Fu, Lu, Shou, Xi. T...

  8. Low-dimensional morphospace of topological motifs in human fMRI brain networks

    Directory of Open Access Journals (Sweden)

    Sarah E. Morgan

    2018-06-01

    We present a low-dimensional morphospace of fMRI brain networks, where axes are defined in a data-driven manner based on the network motifs. The morphospace allows us to identify the key variations in healthy fMRI networks in terms of their underlying motifs, and we observe that two principal components (PCs) can account for 97% of the motif variability. The first PC of the motif distribution is correlated with efficiency and inversely correlated with transitivity. Hence this axis approximately conforms to the well-known economical small-world trade-off between integration and segregation in brain networks. Finally, we show that the economical clustering generative model proposed by Vértes et al. (2012) can approximately reproduce the motif morphospace of the real fMRI brain networks, in contrast to other generative models. Overall, the motif morphospace provides a powerful way to visualize the relationships between network properties and to investigate generative or constraining factors in the formation of complex human brain functional networks. Motifs have been described as the building blocks of complex networks. Meanwhile, a morphospace allows networks to be placed in a common space and can reveal the relationships between different network properties and elucidate the driving forces behind network topology. We combine the concepts of motifs and morphospaces to create the first motif morphospace of fMRI brain networks. Crucially, the morphospace axes are defined by the motifs, in a data-driven manner. We observe strong correlations between the networks’ positions in morphospace and their global topological properties, suggesting that motif morphospaces are a powerful way to capture the topology of networks in a low-dimensional space and to compare generative models of brain networks. Motif morphospaces could also be used to study other complex networks’ topologies.

  9. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

  10. Subjective randomness as statistical inference.

    Science.gov (United States)

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
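
    The inference described in this abstract can be illustrated with a minimal sketch. It scores a coin-flip sequence by the log-likelihood ratio between a fair-coin ("random") process and a "regular" alternative. The alternative chosen here, a coin of unknown bias with a uniform prior, is an assumption made for illustration and is not the model defined in the paper; the function name randomness_score is likewise hypothetical.

```python
from math import comb, log2

def randomness_score(seq: str) -> float:
    """Log2 likelihood ratio of a fair coin versus a coin of unknown bias.

    Positive values favour the random (fair-coin) process; negative values
    favour the regular process. The regular hypothesis is an illustrative
    assumption, not the model used by the authors.
    """
    n = len(seq)
    k = seq.count("H")
    # P(seq | random): every specific sequence of n fair flips has probability 2^-n.
    log_p_random = -float(n)
    # P(seq | regular): integrate theta^k (1 - theta)^(n - k) over a uniform
    # prior on theta, which gives 1 / ((n + 1) * C(n, k)).
    log_p_regular = -log2((n + 1) * comb(n, k))
    return log_p_random - log_p_regular

eight_heads = randomness_score("HHHHHHHH")   # negative: looks non-random
mixed = randomness_score("HTHHTHTT")         # positive: looks random
```

    Because this toy alternative only captures bias, a strictly alternating sequence such as HTHTHTHT would still score as random; capturing that kind of regularity needs a richer set of regular hypotheses, which is the direction the abstract describes.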

  11. Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhance the stability of DNA (dC5) i-motif structure.

    Science.gov (United States)

    Gade, Chandrasekhar Reddy; Sharma, Nagendra K

    2017-12-15

    This report describes the synthesis of a C-rich sequence, the cytosine pentamer, of aep-PNA and its biophysical studies for the formation of a hybrid DNA:aep-PNA i-motif structure with the DNA cytosine pentamer (dC5) under acidic pH conditions. Herein, the CD/UV/NMR/ESI-Mass studies strongly support the formation of a stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence the aep-PNA C-rich cytosine sequence could be considered as a potential DNA i-motif stabilizing agent under in vivo conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Christian J. Michel

    2017-12-01

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together

  13. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

    2017-12-03

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first
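
    The frame comparison described in this record can be sketched as follows: count, for each of the three frames of a sequence, how many trinucleotides belong to a given code. The abstract does not list the 20 trinucleotides of X; the set below is reproduced from the circular-code literature (Arquès and Michel, 1996) and should be checked against that source before use. The function names and the toy sequence are hypothetical, and the sketch counts single trinucleotides rather than the longer X motifs analysed in the paper.

```python
# Trinucleotides of the circular code X (assumed from the literature, not from this abstract).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return word[::-1].translate(str.maketrans("ACGT", "TGCA"))

def frame_counts(seq: str, code=X) -> list:
    """Count trinucleotides of `code` in frames 0, 1 and 2 of `seq`."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Step through non-overlapping trinucleotides starting at this frame.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in code for codon in codons))
    return counts

# Toy sequence built from trinucleotides of X in frame 0 (not real yeast data).
toy = "GAAGATGTTCTGAAC"
counts = frame_counts(toy)
```

    On real genes, the same per-frame counts would be compared against counts obtained with random codes R, as the abstract describes.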

  14. Purification and functional motifs of the recombinant ATPase of orf virus.

    Science.gov (United States)

    Lin, Fong-Yuan; Chan, Kun-Wei; Wang, Chi-Young; Wong, Min-Liang; Hsu, Wei-Li

    2011-10-01

    Our previous study showed that the recombinant ATPase encoded by the A32L gene of orf virus displayed ATP hydrolysis activity as predicted from its amino acid sequence. This viral ATPase contains four known functional motifs (motifs I-IV) and a novel AYDG motif; they are essential for the ATP hydrolysis reaction by binding ATP and magnesium ions. Motifs I and II correspond to the Walker A and B motifs of the typical ATPase, respectively. To examine the biochemical roles of these five conserved motifs, recombinant ATPases of five deletion mutants derived from the Taiping strain were expressed and purified. Their ATPase functions were assayed and compared with those of two wild type strains, Taiping and Nantou, isolated in Taiwan. Our results showed that deletions at motifs I-III or IV exhibited lower activity than that of the wild type. Interestingly, deletion of the AYDG motif decreased the ATPase activity more significantly than deletions of motifs I-IV. Divalent ions such as magnesium and calcium were essential for ATPase activity. Moreover, our recombinant proteins of orf virus also demonstrated GTPase activity, though weaker than the original ATPase activity. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Systematic comparison of the response properties of protein and RNA mediated gene regulatory motifs.

    Science.gov (United States)

    Iyengar, Bharat Ravi; Pillai, Beena; Venkatesh, K V; Gadgil, Chetan J

    2017-05-30

    We present a framework enabling the dissection of the effects of motif structure (feedback or feedforward), the nature of the controller (RNA or protein), and the regulation mode (transcriptional, post-transcriptional or translational) on the response to a step change in the input. We have used a common model framework for gene expression where both motif structures have an activating input and repressing regulator, with the same set of parameters, to enable a comparison of the responses. We studied the global sensitivity of the system properties, such as steady-state gain, overshoot, peak time, and peak duration, to parameters. We find that, in all motifs, overshoot correlated negatively whereas peak duration varied concavely with peak time. Differences in the other system properties were found to be mainly dependent on the nature of the controller rather than the motif structure. Protein mediated motifs showed a higher degree of adaptation i.e. a tendency to return to baseline levels; in particular, feedforward motifs exhibited perfect adaptation. RNA mediated motifs had a mild regulatory effect; they also exhibited a lower peaking tendency and mean overshoot. Protein mediated feedforward motifs showed higher overshoot and lower peak time compared to the corresponding feedback motifs.

  16. Sulfur-induced structural motifs on copper and gold surfaces

    Energy Technology Data Exchange (ETDEWEB)

    Walen, Holly [Iowa State Univ., Ames, IA (United States)

    2016-01-01

    The interaction of sulfur with copper and gold surfaces plays a fundamental role in important phenomena that include coarsening of surface nanostructures and self-assembly of alkanethiols. Here, we identify and analyze unique sulfur-induced structural motifs observed on the low-index surfaces of these two metals. We seek out these structures in an effort to better understand the fundamental interactions between these metals and sulfur that lend to the stability and favorability of metal-sulfur complexes vs. chemisorbed atomic sulfur. The experimental observations presented here, made under identical conditions, together with extensive DFT analyses, allow comparisons and insights into factors that favor the existence of metal-sulfur complexes, vs. chemisorbed atomic sulfur, on metal terraces. We believe these data will be instrumental in better understanding the complex phenomena occurring between the surfaces of coinage metals and sulfur.

  17. Sequential dynamics in the motif of excitatory coupled elements

    Science.gov (United States)

    Korotkov, Alexander G.; Kazakov, Alexey O.; Osipov, Grigory V.

    2015-11-01

    In this article a new model of motif (small ensemble) of neuron-like elements is proposed. It is built with the use of the generalized Lotka-Volterra model with excitatory couplings. The main motivation for this work comes from the problems of neuroscience where excitatory couplings are proved to be the predominant type of interaction between neurons of the brain. In this paper it is shown that there are two modes depending on the type of coupling between the elements: the mode with a stable heteroclinic cycle and the mode with a stable limit cycle. Our second goal is to examine the chaotic dynamics of the generalized three-dimensional Lotka-Volterra model.

  18. Study on online community user motif using web usage mining

    Science.gov (United States)

    Alphy, Meera; Sharma, Ajay

    2016-04-01

    Web usage mining is the application of data mining to extract useful information from the online community. The World Wide Web contained at least 4.73 billion pages according to the Indexed Web, and at least 228.52 million pages according to the Dutch Indexed Web, on Thursday, 6 August 2015. It is difficult to retrieve the needed data from these billions of web pages, and this is where web usage mining becomes important. Personalizing the search engine helps the web user to identify the most used data easily. It reduces time consumption and enables automatic site search and automatic restoration of useful sites. This study surveys the techniques, from the oldest to the latest, used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motifs helps in the improvement of business, e-commerce, personalisation and websites.

  19. μXRF analysis of decoration motifs on Majolica pottery

    International Nuclear Information System (INIS)

    Padilla Lavarez, Roman; Van Espen, Pierr M.; Janssens, K; Schalm, O.

    2001-01-01

    μXRF analysis of decoration motifs on Majolica pottery in fragments corresponding to several Majolica types was carried out using a spectrometer comprising a low power Mo X-ray tube and an elliptic-shape concentration lens with a 60 μm spot. Both surface scanning and spot measurements were carried out, allowing the qualitative identification of the inorganic pigments used for the surface painting decoration and the quantitative analysis of the main glaze composition. The absence of an interference signal arising from excitation of the underlying paste when analysing thin lead glazing was evaluated, confirming the suitability of the analytical procedures. A distinction was found between different types of majolica by the composition of the lead tin glaze enamel and by the presence of other elements in the blue, black and orange decoration.

  20. Lower complexity bounds for lifted inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred

    2015-01-01

    instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...

  1. Statistical inference for financial engineering

    CERN Document Server

    Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki

    2014-01-01

    This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.

  2. Type inference for correspondence types

    DEFF Research Database (Denmark)

    Hüttel, Hans; Gordon, Andy; Hansen, Rene Rydhof

    2009-01-01

    We present a correspondence type/effect system for authenticity in a π-calculus with polarized channels, dependent pair types and effect terms and show how one may, given a process P and an a priori type environment E, generate constraints that are formulae in the Alternating Least Fixed......-Point (ALFP) logic. We then show how a reasonable model of the generated constraints yields a type/effect assignment such that P becomes well-typed with respect to E if and only if this is possible. The formulae generated satisfy a finite model property; a system of constraints is satisfiable if and only...... if it has a finite model. As a consequence, we obtain the result that type/effect inference in our system is polynomial-time decidable....

  3. Causal inference in public health.

    Science.gov (United States)

    Glass, Thomas A; Goodman, Steven N; Hernán, Miguel A; Samet, Jonathan M

    2013-01-01

    Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action's consequences rather than the less precise notion of a risk factor's causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world.

  4. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin

    2015-01-01

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human K562 cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found to be statistically enriched in the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.
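
    The definition of motif pairing multiplicity given in this abstract can be sketched in a few lines. The example takes a list of coupled motif pairs and counts, for each motif, how many distinct motifs it is paired with. The motif names and the pair list are hypothetical placeholders, and treating repeated pairs as a single pairing is an assumption; the abstract does not say how the authors handle duplicates.

```python
from collections import defaultdict

def motif_pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with."""
    partners = defaultdict(set)
    for a, b in pairs:
        # Pairing is treated as symmetric: each motif is a partner of the other.
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}

# Hypothetical coupled motif pairs; ("M1", "M2") is listed twice on purpose.
pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M3"), ("M1", "M2"), ("M4", "M1")]
multiplicity = motif_pairing_multiplicity(pairs)
```

    Using sets keeps the count at distinct partners, so a pair observed on many chromatin interactions still contributes one to each motif's multiplicity.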

  5. Identification of coupling DNA motif pairs on long-range chromatin interactions in human K562 cells

    KAUST Repository

    Wong, Ka-Chun

    2015-09-27

    Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in human K562 cells. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found to be statistically enriched in the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. © The Author 2015. Published by Oxford University Press.

  6. Structural fragment clustering reveals novel structural and functional motifs in α-helical transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Vassilev Boris

    2010-04-01

    Background: A large proportion of an organism's genome encodes membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology. Results: We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to α-helical transmembrane proteins. Of these 213 motifs, 58 were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b. Conclusions: Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

  7. Inference Attacks and Control on Database Structures

    Directory of Open Access Journals (Sweden)

    Muhamed Turkanovic

    2015-02-01

    Today’s databases store information with sensitivity levels that range from public to highly sensitive; hence ensuring confidentiality can be highly important, but also requires costly control. This paper focuses on the inference problem on different database structures. It presents possible threats to privacy in relation to inference, and control methods for mitigating these threats. The paper shows that using only access control, without any inference control, is inadequate, since these models are unable to protect against indirect data access. Furthermore, it covers new inference problems which arise from the dimensions of new technologies like XML, semantics, etc.

  8. Wayward Warriors: The Viking Motif in Swedish and English Children's Literature

    Science.gov (United States)

    Sundmark, Björn

    2014-01-01

    In this article the Viking motif in children's literature is explored--from its roots in (adult) nationalist and antiquarian discourse, over pedagogical and historical texts for children, to the eventual diversification (or dissolution) of the motif into different genres and forms. The focus is on Swedish Viking narratives, but points of…

  9. Physical-chemical property based sequence motifs and methods regarding same

    Science.gov (United States)

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  10. Gene Isolation Using Degenerate Primers Targeting Protein Motif: A Laboratory Exercise

    Science.gov (United States)

    Yeo, Brandon Pei Hui; Foong, Lian Chee; Tam, Sheh May; Lee, Vivian; Hwang, Siaw San

    2018-01-01

    Structures and functions of protein motifs are widely included in many biology-based course syllabi. However, little emphasis is placed on linking this knowledge to applications in biotechnology to enhance the learning experience. Here, the conserved motifs of nucleotide binding site-leucine rich repeats (NBS-LRR) proteins, successfully used for the…

  11. MOMFER: A Search Engine of Thompson's Motif-Index of Folk Literature

    NARCIS (Netherlands)

    Karsdorp, F.B.; van der Meulen, Marten; Meder, Theo; van den Bosch, Antal

    2015-01-01

    More than fifty years after the first edition of Thompson's seminal Motif-Index of Folk Literature, we present an online search engine tailored to fully disclose the index digitally. This search engine, called MOMFER, greatly enhances the searchability of the Motif-Index and provides exciting new

  12. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network

    Directory of Open Access Journals (Sweden)

    Barabási Albert-László

    2004-01-01

    Full Text Available. Abstract. Background: Transcriptional regulation of cellular functions is carried out through a complex network of interactions among transcription factors and the promoter regions of genes and operons regulated by them. To better understand the system-level function of such networks, simplification of their architecture was previously achieved by identifying the motifs present in the network, which are small, overrepresented, topologically distinct regulatory interaction patterns (subgraphs). However, the interaction of such motifs with each other, and their form of integration into the full network, has not been previously examined. Results: By studying the transcriptional regulatory network of the bacterium Escherichia coli, we demonstrate that the two previously identified motif types in the network (i.e., feed-forward loops and bi-fan motifs) do not exist in isolation, but rather aggregate into homologous motif clusters that largely overlap with known biological functions. Moreover, these clusters further coalesce into a supercluster, thus establishing distinct topological hierarchies that show global statistical properties similar to the whole network. Targeted removal of motif links disintegrates the network into small, isolated clusters, while random disruptions of an equal number of links do not cause such an effect. Conclusion: Individual motifs aggregate into homologous motif clusters and a supercluster forming the backbone of the E. coli transcriptional regulatory network and play a central role in defining its global topological organization.
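
    The two motif types named in this abstract can be illustrated on a toy directed graph. The sketch below is not the authors' method; it is a brute-force count under the usual definitions (a feed-forward loop is X->Y, Y->Z plus X->Z; a bi-fan is two regulators sharing the same two targets). The function names and the toy edge list are hypothetical.

```python
from itertools import combinations, permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
            count += 1
    return count


def count_bi_fans(edges):
    """Count bi-fans: pairs of regulators that both regulate the same two targets."""
    targets = {}
    for src, dst in set(edges):
        if src != dst:  # ignore self-regulation
            targets.setdefault(src, set()).add(dst)
    count = 0
    for a, b in combinations(sorted(targets), 2):
        shared = (targets[a] & targets[b]) - {a, b}
        # every unordered pair of shared targets forms one bi-fan
        count += len(shared) * (len(shared) - 1) // 2
    return count


# Toy network (hypothetical node names): one feed-forward loop and one bi-fan.
toy_edges = [
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),                  # feed-forward loop
    ("A", "T1"), ("A", "T2"), ("B", "T1"), ("B", "T2"),  # bi-fan
]
ffl_count = count_feed_forward_loops(toy_edges)
bifan_count = count_bi_fans(toy_edges)
```

    The brute-force enumeration is cubic in the number of nodes, which is acceptable for a toy graph; a genome-scale network would call for a dedicated subgraph-counting library.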

  13. High affinity recognition of a Phytophthora protein by Arabidopsis via an RGD motif

    NARCIS (Netherlands)

    Senchou, V.; Weide, R.L.; Carrasco, A.; Bouyssou, H.; Pont-Lezica, R.; Govers, F.; Canut, H.

    2004-01-01

    The RGD tripeptide sequence, a cell adhesion motif present in several extracellular matrix proteins of mammalians, is involved in numerous plant processes. In plant-pathogen interactions, the RGD motif is believed to reduce plant defence responses by disrupting adhesions between the cell wall and

  14. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    Directory of Open Access Journals (Sweden)

    Lynch Michael

    2010-05-01

    Full Text Available. Abstract. Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  15. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    Science.gov (United States)

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
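
    The 8-mer screen described in this abstract can be pictured as a simple presence count across a set of 3' UTR sequences. The following is a minimal sketch, not the authors' pipeline; the function name and the three toy sequences are made up for illustration.

```python
from collections import Counter


def kmer_sequence_counts(sequences, k=8):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # a set, so a k-mer repeated within one sequence is counted once
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


# Hypothetical toy "UTRs"; two of them share the 8-mer ACCCGGGT.
toy_utrs = ["AAACCCGGGTTT", "TTACCCGGGTAA", "ACGTACGTACGT"]
presence = kmer_sequence_counts(toy_utrs, k=8)
shared_count = presence["ACCCGGGT"]
```

    Counting presence per sequence rather than total occurrences keeps a single repeat-rich UTR from dominating the tally; comparing such counts between gene sets (for example ribosomal versus other genes) would be a separate statistical step not shown here.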

  16. Mechanism for activation of the growth factor-activated AGC kinases by turn motif phosphorylation

    DEFF Research Database (Denmark)

    Hauge, Camilla; Antal, Torben L; Hirschberg, Daniel

    2007-01-01

    investigated the role of the third, so-called turn motif phosphate, also located in the tail, in the AGC kinases PKB, S6K, RSK, MSK, PRK and PKC. We report cooperative action of the HM phosphate and the turn motif phosphate, because it binds a phosphoSer/Thr-binding site above the glycine-rich loop within...

  17. Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2018-03-11

    Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.

  18. A proposed vestigial translation initiation motif in VP1 of hepatitis A virus.

    Science.gov (United States)

    Kang, Jeong-Ah; Funkhouser, Ann W

    2002-07-01

    The internal ribosome entry site (IRES) of picornaviruses has a 3' polypyrimidine tract (PPT) 16-24 bases upstream of an AUG triplet (PPT/AUG motif). This motif is critical in determining the efficiency of cap-independent translation. HAV has a conserved PPT/AUG motif consisting of a nine base sequence (AGGUUUUUC) 23 bases upstream of the preferred AUG start codon. This HAV-specific PPT/AUG motif is repeated and conserved in VP1 of HAV, but not of other picornaviruses. We proposed that the PPT/AUG motif in the open reading frame initiated translation and/or had an impact on the life cycle of the virus. In vitro translation of mutant bicistronic mRNAs and growth in cell culture of mutant viruses provided no evidence that the VP1 PPT/AUG motif had any impact on either translation or growth. HAV differs from other picornaviruses in its inefficient growth in cell culture. Since the HAV-specific PPT/AUG motif is found in only 1 in 300,000 reported viral sequences outside the hepatovirus genus, this motif may be a vestigial translation initiation element and may have played a role in determining the unusual phenotype of HAV.
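
    A scan for the PPT/AUG arrangement described above (the nine-base sequence AGGUUUUUC followed at a fixed distance by an AUG) can be sketched as follows. This is an illustration only: the abstract does not say whether "23 bases upstream" is measured from the end or the start of the motif, so the sketch assumes 23 bases lie between the end of the motif and the A of AUG and exposes that as a parameter. The function name and the test sequence are hypothetical.

```python
def find_ppt_aug(rna, motif="AGGUUUUUC", gap=23):
    """Return (motif_start, aug_start) pairs where AUG begins `gap` bases after the motif ends.

    Assumption: `gap` counts the bases between the last base of the motif
    and the first base of the AUG triplet.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    start = rna.find(motif)
    while start != -1:
        aug_pos = start + len(motif) + gap
        if rna[aug_pos:aug_pos + 3] == "AUG":
            hits.append((start, aug_pos))
        start = rna.find(motif, start + 1)
    return hits


# Synthetic sequence built to contain exactly one PPT/AUG arrangement.
toy_rna = "CC" + "AGGUUUUUC" + "C" * 23 + "AUG" + "GG"
toy_hits = find_ppt_aug(toy_rna)
```

    Positions are 0-based. Changing `gap` makes it easy to check how sensitive a hit list is to the spacing convention.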

  19. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    Directory of Open Access Journals (Sweden)

    Down Thomas A

    2010-09-01

    Full Text Available. Abstract. Background: DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS"), but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results: Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or that their binding is not regulated by DNA methylation. Conclusions: Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.

  20. Thermal Stability of Modified i-Motif Oligonucleotides with Naphthalimide Intercalating Nucleic Acids

    DEFF Research Database (Denmark)

    El-Sayed, Ahmed Ali; Pedersen, Erik B.; Khaireldin, Nahid Y.

    2016-01-01

    In continuation of our investigation of characteristics and thermodynamic properties of the i-motif 5′-d[(CCCTAA)3CCCT] upon insertion of intercalating nucleotides into the cytosine-rich oligonucleotide, this article evaluates the stabilities of i-motif oligonucleotides upon insertion of naphthalimide (1H-benzo[de]isoquinoline-1,3(2H)-dione) as the intercalating nucleic acid. The stabilities of i-motif structures with inserted naphthalimide intercalating nucleotides were studied using UV melting temperatures (Tm) and circular dichroism spectra at different pH values and conditions (crowding...

  1. I-motif DNA structures are formed in the nuclei of human cells

    Science.gov (United States)

    Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

    2018-06-01

    Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.

  2. LAIT: a local ancestry inference toolkit.

    Science.gov (United States)

    Hui, Daniel; Fang, Zhou; Lin, Jerome; Duan, Qing; Li, Yun; Hu, Ming; Chen, Wei

    2017-09-06

    Inferring local ancestry in individuals of mixed ancestry has many applications, most notably in identifying disease-susceptible loci that vary among different ethnic groups. Many software packages are available for inferring local ancestry in admixed individuals. However, most of these existing software packages require specific formatted input files and generate output files in various types, yielding practical inconvenience. We developed a tool set, Local Ancestry Inference Toolkit (LAIT), which can convert standardized files into software-specific input file formats as well as standardize and summarize inference results for four popular local ancestry inference software: HAPMIX, LAMP, LAMP-LD, and ELAI. We tested LAIT using both simulated and real data sets and demonstrated that LAIT provides convenience to run multiple local ancestry inference software. In addition, we evaluated the performance of local ancestry software among different supported software packages, mainly focusing on inference accuracy and computational resources used. We provided a toolkit to facilitate the use of local ancestry inference software, especially for users with limited bioinformatics background.

  3. Forward and backward inference in spatial cognition.

    Directory of Open Access Journals (Sweden)

    Will D Penny

    Full Text Available This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
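
    The "forward inference" step described above, combining a motion-based prediction with sensory evidence, has the same shape as the forward (filtering) recursion of a hidden Markov model. The sketch below is a discrete-state simplification offered for illustration, not the paper's model; the function name, the two-location example and all numbers are assumptions.

```python
def forward_filter(prior, transition, likelihoods):
    """Return the filtered belief over states after each observation.

    prior[i]          : P(s_0 = i)
    transition[i][j]  : P(s_t = j | s_{t-1} = i), e.g. from path integration
    likelihoods[t][j] : P(o_t | s_t = j), e.g. from sensory input
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for lik in likelihoods:
        # predict: propagate the previous belief through the transition model
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # update: weight by the observation likelihood, then normalize
        unnorm = [predicted[j] * lik[j] for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        history.append(belief)
    return history


# Two hypothetical locations, no movement, one observation favouring location 0.
beliefs = forward_filter(
    prior=[0.5, 0.5],
    transition=[[1.0, 0.0], [0.0, 1.0]],
    likelihoods=[[0.9, 0.1]],
)
```

    Backward inference would add a second pass from the final observation towards the first; it is omitted here to keep the sketch short.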

  4. Generative Inferences Based on Learned Relations

    Science.gov (United States)

    Chen, Dawn; Lu, Hongjing; Holyoak, Keith J.

    2017-01-01

    A key property of relational representations is their "generativity": From partial descriptions of relations between entities, additional inferences can be drawn about other entities. A major theoretical challenge is to demonstrate how the capacity to make generative inferences could arise as a result of learning relations from…

  5. Inference in models with adaptive learning

    NARCIS (Netherlands)

    Chevillon, G.; Massmann, M.; Mavroeidis, S.

    2010-01-01

    Identification of structural parameters in models with adaptive learning can be weak, causing standard inference procedures to become unreliable. Learning also induces persistent dynamics, and this makes the distribution of estimators and test statistics non-standard. Valid inference can be

  6. Fiducial inference - A Neyman-Pearson interpretation

    NARCIS (Netherlands)

    Salome, D; VonderLinden, W; Dose; Fischer, R; Preuss, R

    1999-01-01

    Fisher's fiducial argument is a tool for deriving inferences in the form of a probability distribution on the parameter space, not based on Bayes's Theorem. Lindley established that in exceptional situations fiducial inferences coincide with posterior distributions; in the other situations fiducial

  7. Uncertainty in prediction and in inference

    NARCIS (Netherlands)

    Hilgevoord, J.; Uffink, J.

    1991-01-01

    The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in

  8. Causal inference in economics and marketing.

    Science.gov (United States)

    Varian, Hal R

    2016-07-05

    This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual: a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.

  9. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2000-01-01

    New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities based on

  10. The Impact of Disablers on Predictive Inference

    Science.gov (United States)

    Cummins, Denise Dellarosa

    2014-01-01

    People consider alternative causes when deciding whether a cause is responsible for an effect (diagnostic inference) but appear to neglect them when deciding whether an effect will occur (predictive inference). Five experiments were conducted to test a 2-part explanation of this phenomenon: namely, (a) that people interpret standard predictive…

  11. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark

    2006-01-01

    We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...

  12. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan

    2004-01-01

    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...

  13. Extended likelihood inference in reliability

    International Nuclear Information System (INIS)

    Martz, H.F. Jr.; Beckman, R.J.; Waller, R.A.

    1978-10-01

    Extended likelihood methods of inference are developed in which subjective information in the form of a prior distribution is combined with sampling results by means of an extended likelihood function. The extended likelihood function is standardized for use in obtaining extended likelihood intervals. Extended likelihood intervals are derived for the mean of a normal distribution with known variance, the failure-rate of an exponential distribution, and the parameter of a binomial distribution. Extended second-order likelihood methods are developed and used to solve several prediction problems associated with the exponential and binomial distributions. In particular, such quantities as the next failure-time, the number of failures in a given time period, and the time required to observe a given number of failures are predicted for the exponential model with a gamma prior distribution on the failure-rate. In addition, six types of life testing experiments are considered. For the binomial model with a beta prior distribution on the probability of nonsurvival, methods are obtained for predicting the number of nonsurvivors in a given sample size and for predicting the required sample size for observing a specified number of nonsurvivors. Examples illustrate each of the methods developed. Finally, comparisons are made with Bayesian intervals in those cases where these are known to exist.

  14. Reinforcement learning or active inference?

    Science.gov (United States)

    Friston, Karl J; Daunizeau, Jean; Kiebel, Stefan J

    2009-07-29

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  15. Reinforcement learning or active inference?

    Directory of Open Access Journals (Sweden)

    Karl J Friston

    2009-07-01

    Full Text Available This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

  16. Active inference and epistemic value.

    Science.gov (United States)

    Friston, Karl; Rigoli, Francesco; Ognibene, Dimitri; Mathys, Christoph; Fitzgerald, Thomas; Pezzulo, Giovanni

    2015-01-01

    We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms.

  17. Ancient Biomolecules and Evolutionary Inference.

    Science.gov (United States)

    Cappellini, Enrico; Prohaska, Ana; Racimo, Fernando; Welker, Frido; Pedersen, Mikkel Winther; Allentoft, Morten E; de Barros Damgaard, Peter; Gutenbrunner, Petra; Dunne, Julie; Hammann, Simon; Roffet-Salque, Mélanie; Ilardo, Melissa; Moreno-Mayar, J Víctor; Wang, Yucheng; Sikora, Martin; Vinner, Lasse; Cox, Jürgen; Evershed, Richard P; Willerslev, Eske

    2018-04-25

    Over the last decade, studies of ancient biomolecules, particularly ancient DNA, proteins, and lipids, have revolutionized our understanding of evolutionary history. Though initially fraught with many challenges, the field now stands on firm foundations. Researchers now successfully retrieve nucleotide and amino acid sequences, as well as lipid signatures, from progressively older samples, originating from geographic areas and depositional environments that, until recently, were regarded as hostile to long-term preservation of biomolecules. Sampling frequencies and the spatial and temporal scope of studies have also increased markedly, and with them the size and quality of the data sets generated. This progress has been made possible by continuous technical innovations in analytical methods, enhanced criteria for the selection of ancient samples, integrated experimental methods, and advanced computational approaches. Here, we discuss the history and current state of ancient biomolecule research, its applications to evolutionary inference, and future directions for this young and exciting field. Expected final online publication date for the Annual Review of Biochemistry Volume 87 is June 20, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

  18. A Repeating Sulfated Galactan Motif Resuscitates Dormant Micrococcus luteus Bacteria.

    Science.gov (United States)

    Böttcher, Thomas; Szamosvári, Dávid; Clardy, Jon

    2018-07-01

    Only a small fraction of bacteria can autonomously initiate growth on agar plates. Nongrowing bacteria typically enter a metabolically inactive dormant state and require specific chemical trigger factors or signals to exit this state and to resume growth. Micrococcus luteus has become a model organism for this important yet poorly understood phenomenon. Only a few resuscitation signals have been described to date, and all of them are produced endogenously by bacterial species. We report the discovery of a novel type of resuscitation signal that allows M. luteus to grow on agar but not agarose plates. Fractionation of the agar polysaccharide complex and sulfation of agarose allowed us to identify the signal as highly sulfated saccharides found in agar or carrageenans. Purification of hydrolyzed κ-carrageenan ultimately led to the identification of the signal as a small fragment of a large linear polysaccharide, i.e., an oligosaccharide of five or more sugars with a repeating disaccharide motif containing d-galactose-4-sulfate (G4S) 1,4-linked to 3,6-anhydro-α-d-galactose (DA), G4S-(DA-G4S)n with n ≥ 2. IMPORTANCE: Most environmental bacteria cannot initiate growth on agar plates, but they can flourish on the same plates once growth is initiated. While there are a number of names for and manifestations of this phenomenon, the underlying cause appears to be the requirement for a molecular signal indicating safe growing conditions. Micrococcus luteus has become a model organism for studying this growth initiation process, often called resuscitation, because of its apparent connection with the persistent or dormant form of Mycobacterium tuberculosis, an important human pathogen. In this report, we identify a highly sulfated saccharide from agar or carrageenans that robustly resuscitates dormant M. luteus on agarose plates. We identified and characterized the signal as a small repeating disaccharide motif. Our results indicate that signals inherent in or absent from the

  19. Bayesian Inference Methods for Sparse Channel Estimation

    DEFF Research Database (Denmark)

    Pedersen, Niels Lovmand

    2013-01-01

    This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation ... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...

  20. EI: A Program for Ecological Inference

    Directory of Open Access Journals (Sweden)

    Gary King

    2004-09-01

    Full Text Available The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). They are also required in numerous areas of major significance in public policy (e.g., for applying the Voting Rights Act) and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.
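
    To make the ecological inference problem concrete, the sketch below computes the classical deterministic bounds (the "method of bounds" usually credited to Duncan and Davis) on an unobserved group-level rate from two aggregate quantities. It is an illustration of the problem setting only and is not presented as the EI program's interface or algorithm; the function name, parameter names and example numbers are hypothetical.

```python
def method_of_bounds(group_share, overall_rate):
    """Deterministic bounds on a group's rate given only aggregate data.

    Accounting identity for one areal unit:
        overall_rate = rate_group * group_share + rate_rest * (1 - group_share)
    with both unknown rates constrained to [0, 1].
    """
    if not 0.0 < group_share <= 1.0:
        raise ValueError("group_share must be in (0, 1]")
    if not 0.0 <= overall_rate <= 1.0:
        raise ValueError("overall_rate must be in [0, 1]")
    # lowest group rate: everyone outside the group is at rate 1
    lower = max(0.0, (overall_rate - (1.0 - group_share)) / group_share)
    # highest group rate: everyone outside the group is at rate 0
    upper = min(1.0, overall_rate / group_share)
    return lower, upper


# Hypothetical precinct: the group is 60% of the population, overall rate is 50%.
lower, upper = method_of_bounds(group_share=0.6, overall_rate=0.5)
```

    For this example the aggregate data alone only pin the group's rate to roughly the interval from 0.17 to 0.83, which is why statistical models are layered on top of such bounds.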

  1. One motif to bind them: A small-XXX-small motif affects transmembrane domain 1 oligomerization, function, localization, and cross-talk between two yeast GPCRs.

    Science.gov (United States)

    Lock, Antonia; Forfar, Rachel; Weston, Cathryn; Bowsher, Leo; Upton, Graham J G; Reynolds, Christopher A; Ladds, Graham; Dixon, Ann M

    2014-12-01

    G protein-coupled receptors (GPCRs) are the largest family of cell-surface receptors in mammals and facilitate a range of physiological responses triggered by a variety of ligands. GPCRs were thought to function as monomers, however it is now accepted that GPCR homo- and hetero-oligomers also exist and influence receptor properties. The Schizosaccharomyces pombe GPCR Mam2 is a pheromone-sensing receptor involved in mating and has previously been shown to form oligomers in vivo. The first transmembrane domain (TMD) of Mam2 contains a small-XXX-small motif, overrepresented in membrane proteins and well-known for promoting helix-helix interactions. An ortholog of Mam2 in Saccharomyces cerevisiae, Ste2, contains an analogous small-XXX-small motif which has been shown to contribute to receptor homo-oligomerization, localization and function. Here we have used experimental and computational techniques to characterize the role of the small-XXX-small motif in function and assembly of Mam2 for the first time. We find that disruption of the motif via mutagenesis leads to reduction of Mam2 TMD1 homo-oligomerization and pheromone-responsive cellular signaling of the full-length protein. It also impairs correct targeting to the plasma membrane. Mutation of the analogous motif in Ste2 yielded similar results, suggesting a conserved mechanism for assembly. Using co-expression of the two fungal receptors in conjunction with computational models, we demonstrate a functional change in G protein specificity and propose that this is brought about through hetero-dimeric interactions of Mam2 with Ste2 via the complementary small-XXX-small motifs. This highlights the potential of these motifs to affect a range of properties that can be investigated in other GPCRs. Copyright © 2014. Published by Elsevier B.V.

  2. Probabilistic Inference in General Graphical Models through Sampling in Stochastic Networks of Spiking Neurons

    Science.gov (United States)

    Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

    2011-01-01

    An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows (“explaining away”) and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons. PMID:22219717

  3. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

    Directory of Open Access Journals (Sweden)

    Dejan Pecevski

    2011-12-01

    Full Text Available An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.

  4. Distinct cis regulatory elements govern the expression of TAG1 in embryonic sensory ganglia and spinal cord.

    Directory of Open Access Journals (Sweden)

    Yoav Hadas

    Full Text Available Cell fate commitment of spinal progenitor neurons is initiated by long-range, midline-derived morphogens that regulate an array of transcription factors that, in turn, act sequentially or in parallel to control neuronal differentiation. Included among these are transcription factors that regulate the expression of receptors for guidance cues, thereby determining axonal trajectories. The Ig/FNIII superfamily molecules TAG1/Axonin1/CNTN2 (TAG1) and Neurofascin (Nfasc) are co-expressed in numerous neuronal cell types in the CNS and PNS (for example motor, DRG and interneurons); both promote neurite outgrowth and both are required for the architecture and function of nodes of Ranvier. The genes encoding TAG1 and Nfasc are adjacent in the genome, an arrangement which is evolutionarily conserved. To study the transcriptional network that governs TAG1 and Nfasc expression in spinal motor and commissural neurons, we set out to identify cis elements that regulate their expression. Two evolutionarily conserved DNA modules, one located between the Nfasc and TAG1 genes and the second directly 5' to the first exon and encompassing the first intron of TAG1, were identified that direct complementary expression to the CNS and PNS, respectively, of the embryonic hindbrain and spinal cord. Sequential deletions and point mutations of the CNS enhancer element revealed a 130 bp element containing three conserved E-boxes required for motor neuron expression. In combination, these two elements appear to recapitulate a major part of the pattern of TAG1 expression in the embryonic nervous system.

  5. Changes in cis-regulatory elements of a key floral regulator are associated with divergence of inflorescence architectures

    NARCIS (Netherlands)

    Kusters, E.; Della Pina, S.; Castel, R.; Souer, E.; Koes, R.

    2015-01-01

    Higher plant species diverged extensively with regard to the moment (flowering time) and position (inflorescence architecture) at which flowers are formed. This seems largely caused by variation in the expression patterns of conserved genes that specify floral meristem identity (FMI), rather than

  6. Changes in cis-regulatory elements of a key floral regulator are associated with divergence of inflorescence architectures.

    NARCIS (Netherlands)

    Kusters, E.; Della Pina, S.; Castel, R.; Souer, E.J.; Koes, R.E.

    2015-01-01

    Higher plant species diverged extensively with regard to the moment (flowering time) and position (inflorescence architecture) at which flowers are formed. This seems largely caused by variation in the expression patterns of conserved genes that specify floral meristem identity (FMI), rather than

  7. Cis-regulatory control of the nuclear receptor Coup-TF gene in the sea urchin Paracentrotus lividus embryo.

    Directory of Open Access Journals (Sweden)

    Lamprini G Kalampoki

    Full Text Available Coup-TF, an orphan member of the nuclear receptor superfamily, has a fundamental role in the development of metazoan embryos. The study of the gene's regulatory circuit in the sea urchin embryo will facilitate the placement of this transcription factor in the well-studied embryonic Gene Regulatory Network (GRN). The Paracentrotus lividus Coup-TF gene (PlCoup-TF) is expressed throughout embryonic development, preferentially in the oral ectoderm of the gastrula and the ciliary band of the pluteus stage. Two overlapping λ genomic clones, containing three exons and upstream sequences of PlCoup-TF, were isolated from a genomic library. The transcription initiation site was determined, and 5' deletions and individual segments of a 1930 bp upstream region were placed ahead of a GFP reporter cassette and injected into fertilized P. lividus eggs. Module a (-532 to -232) was necessary and sufficient to confer ciliary band expression to the reporter. Comparison of P. lividus and Strongylocentrotus purpuratus upstream Coup-TF sequences revealed considerable conservation, but none within module a. 5' and internal deletions into module a defined a smaller region that confers ciliary band-specific expression. Putative regulatory cis-acting elements (RE1, RE2 and RE3) within module a were specifically bound by proteins in sea urchin embryonic nuclear extracts. Site-specific mutagenesis of these elements resulted in loss of reporter activity (RE1) or ectopic expression (RE2, RE3). It is proposed that sea urchin transcription factors, which bind these three regulatory sites, are necessary for spatial and quantitative regulation of the PlCoup-TF gene in pluteus-stage sea urchin embryos. These findings lead to the future identification of these factors and to the hierarchical positioning of PlCoup-TF within the embryonic GRN.

  8. Cis-regulatory PLETHORA promoter elements directing root and nodule expression are conserved between Arabidopsis thaliana and Medicago truncatula

    NARCIS (Netherlands)

    Franssen, H.G.J.M.; Kulikova, O.; Willemsen, V.A.; Heidstra, R.

    2017-01-01

    Nodules are unique organs formed on roots of legumes by soil-borne bacteria, collectively known as rhizobium. Recently, we have shown that orthologs of the AINTEGUMENTA-like (AIL) AP2 transcription factors PLETHORA (PLT) 1 to 4, that redundantly regulate Arabidopsis thaliana root development are

  9. Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes

    OpenAIRE

    Kreiman, Gabriel

    2004-01-01

    Sequence information and high‐throughput methods to measure gene expression levels open the door to explore transcriptional regulation using computational tools. Combinatorial regulation and sparseness of regulatory elements throughout the genome allow organisms to control the spatial and temporal patterns of gene expression. Here we study the organization of cis‐regulatory elements in sets of co‐regulated genes. We build an algorithm to search for combinations of transcription factor binding...

  10. Perkembangan Motif Sineas Film Indie dalam Menghadapi Industri Film Mainstream (The Development of Indie Filmmakers' Motives in Facing the Mainstream Film Industry)

    Directory of Open Access Journals (Sweden)

    Yoppy Ardiyono

    2016-03-01

    Full Text Available This research aims to determine the effect and impact of the motives that have arisen over the history of short film on filmmakers as its principal actors, especially filmmakers on the alternative (indie) path. The theoretical framework is adapted from Vincent Mosco's theory of media commodification. This theory helps to explain shifts in filmmakers' motives for working in terms of the political and economic pressures exerted under the demands of each regime. The method used is descriptive qualitative research. Data were collected through direct observation of the independent film community and in-depth interviews with informants, including people in direct contact with the field of research, supplemented by a review of the literature. The study concludes that the changes in motive among indie filmmakers are closely linked to the mainstream industry, the political constellation, and the orientation of capitalism. Clear and systematic regulation from the government is needed so that the sidestream (indie) movement can operate in a better organized and more professional way, and so that the contribution of indie cinema to the progress of national film benefits both indie filmmakers and the mainstream industry.

  11. Crammed signaling motifs in the T-cell receptor.

    Science.gov (United States)

    Borroto, Aldo; Abia, David; Alarcón, Balbino

    2014-09-01

    Although the T cell antigen receptor (TCR) is long known to contain multiple signaling subunits (CD3γ, CD3δ, CD3ɛ and CD3ζ), their role in signal transduction is still not well understood. The presence of at least one immunoreceptor tyrosine-based activation motif (ITAM) in each CD3 subunit has led to the idea that the multiplication of such elements essentially serves to amplify signals. However, the evolutionary conservation of non-ITAM sequences suggests that each CD3 subunit is likely to have specific non-redundant roles at some stage of development or in mature T cell function. The CD3ɛ subunit is paradigmatic because in a relatively short cytoplasmic sequence (∼55 amino acids) it contains several docking sites for proteins involved in intracellular trafficking and signaling, proteins whose relevance in T cell activation is slowly starting to be revealed. In this review we will summarize our current knowledge on the signaling effectors that bind directly to the TCR and we will propose a hierarchy in their response to TCR triggering. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. DNA mutation motifs in the genes associated with inherited diseases.

    Directory of Open Access Journals (Sweden)

    Michal Růžička

    Full Text Available Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects a particular population where an effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from the HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained a purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of a CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair as DNA with a mutation recognized by MutSα protein is noticeably bent.
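
    The 5-nucleotide analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the simple "fraction of occurrences overlapping a mutation" score are assumptions made for illustration, and no statistical test is included.

```python
from collections import Counter

PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def is_purine_tract(kmer):
    """True if every base in the segment is a purine."""
    return all(base in PURINES for base in kmer)


def is_alternating_purine_pyrimidine(kmer):
    """True if purines and pyrimidines strictly alternate along the segment."""
    flags = [base in PURINES for base in kmer]
    return all(a != b for a, b in zip(flags, flags[1:]))


def has_cpg(kmer):
    """True if the segment contains a CpG dinucleotide."""
    return "CG" in kmer


def mutation_rate_by_kmer(seq, mutated_positions, k=5):
    """Hypothetical score: for each k-mer, the fraction of its occurrences
    that overlap at least one mutated position (0-based indices)."""
    hits, totals = Counter(), Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        totals[kmer] += 1
        if any(i <= p < i + k for p in mutated_positions):
            hits[kmer] += 1
    return {kmer: hits[kmer] / totals[kmer] for kmer in totals}


# Toy input, not real HGMD data: one mutation at index 6.
rates = mutation_rate_by_kmer("AGGAACGTGC", {6})
```

    In this toy input the purine tract `AGGAA` never overlaps the mutated position, while the alternating, CpG-containing segment `ACGTG` always does. A real analysis would compare such counts against a chance expectation.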

  13. ARCHETYPES AND MYTHOLOGICAL MOTIFS: JOHN UPDIKE’S LEGACY REVISITED

    Directory of Open Access Journals (Sweden)

    Loreta Ulvydienė

    2018-04-01

    Full Text Available John Updike is widely considered to be one of the greatest, one of the most popular and sometimes most controversial writers concerned with the American small town and middle-class materialism. A lot of literary critics and researchers observe that Updike’s finest work came from his exploration of ordinary America and from his use of elegant prose, rich with metaphor, to portray the public and private feelings of Americans, their daily rounds of life. In addition, discussing Updike’s individual works, a lot of literary critics and researchers have observed the writer’s attempts to re-write myth in “the mythical age” of the twentieth century. Naturally enough, as the return to myth is assumed to be a certain feature of the Modernist movement, half a century after Updike’s famous novel Centaur was penned, it is indispensable to re-examine the writer’s fictional intentions in the usage of myth. More importantly, it is necessary to determine whether we can see the mythic elements and realistic details as a continuum or as contrasted opposites in his so-called “historical chronicles”. Updike’s novels and stories are filled with mythological motifs and character archetypes. Thus, the study aims at revisiting John Updike’s work, considering mythological elements and archetypal images of his heroes alongside heroic masculinity, war, terrorism and American perfectionism.

  14. Dystroglycan versatility in cell adhesion: a tale of multiple motifs

    Directory of Open Access Journals (Sweden)

    Winder Steve J

    2010-02-01

    Full Text Available Dystroglycan is a ubiquitously expressed heterodimeric adhesion receptor. The extracellular α-subunit makes connections with a number of laminin G domain ligands including laminins, agrin and perlecan in the extracellular matrix and the transmembrane β-subunit makes connections to the actin filament network via cytoskeletal linkers including dystrophin, utrophin, ezrin and plectin, depending on context. Originally discovered as part of the dystrophin glycoprotein complex of skeletal muscle, dystroglycan is an important adhesion molecule and signalling scaffold in a multitude of cell types and tissues and is involved in several diseases. Dystroglycan has emerged as a multifunctional adhesion platform with many interacting partners associating with its short unstructured cytoplasmic domain. Two particular hotspots are the cytoplasmic juxtamembrane region and the very carboxy terminus of dystroglycan, regions which between them have several overlapping functions: in the juxtamembrane region, a nuclear localisation signal and ezrin/radixin/moesin protein, rapsyn and ERK MAP kinase binding functions; and at the C terminus, a regulatory tyrosine governing WW, SH2 and SH3 domain interactions. We will discuss the binding partners for these motifs and how their interactions and regulation can modulate the involvement of dystroglycan in a range of different adhesion structures and functions depending on context. Thus dystroglycan presents as a multifunctional scaffold involved in adhesion and adhesion-mediated signalling with its functions under exquisite spatio-temporal regulation.

  15. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set size increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis has to be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs which correspond to known transcription factor binding sites.
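
    To make "degenerate oligonucleotide motifs written in the 15-letter IUPAC code" concrete, the sketch below matches such a motif against DNA in plain Python. The IUPAC table is standard. The function names and the CPU-only scan are illustrative assumptions and say nothing about how Argo_CUDA is implemented on the GPU.

```python
# Standard IUPAC nucleotide codes: 4 bases plus 11 degenerate symbols = 15 letters.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def motif_matches(motif, window):
    """True if every base of `window` is allowed by the IUPAC symbol at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def find_motif(motif, seq):
    """Return the 0-based start of every window of `seq` that matches `motif`."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if motif_matches(motif, seq[i:i + k])]


def count_sequences_with_motif(motif, seqs):
    """Number of sequences containing at least one match (hypothetical helper)."""
    return sum(1 for s in seqs if find_motif(motif, s))


# Example: the degenerate E-box pattern CANNTG in a toy sequence.
hits = find_motif("CANNTG", "TTCAGCTGAA")
```

    An exhaustive search in this spirit would score every candidate motif of a fixed length over the 15-letter alphabet. That candidate space grows as 15^L, which is presumably why a GPU implementation is attractive.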

  16. A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

    Directory of Open Access Journals (Sweden)

    Asita Elengoe

    2015-01-01

    Full Text Available Currently, the protein interaction of the Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with the p53 motif remains to be elucidated. The NBD-p53 motif complex enhances p53 stabilization, thereby increasing the tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using the STRING version 9.1 program. Then, we modeled the three-dimensional structure of the p53 motif through homology modeling and determined the binding affinity and stability of the NBD-p53 motif complex structure via molecular docking and molecular dynamics (MD) simulation. The human DNA binding domain of the p53 motif (SCMGGMNR) retrieved from UniProt (UniProtKB: P04637) was docked with the NBD protein, using the Autodock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 kcal/mol and −9.90 kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bond, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with the p53 motif and the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure-based drugs in cancer therapy.

  17. Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

    Science.gov (United States)

    Soufari, Heddy

    2017-01-01

    Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
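
    The reported binding preference (two GCAC motifs separated by more than 6 nt) can be turned into a simple sequence scan. The following Python sketch is illustrative only: the function name, the `min_spacer` parameter and the toy sequences are assumptions, and a match here indicates nothing about actual binding affinity.

```python
import re


def gcac_pairs(rna, min_spacer=7):
    """Find pairs of GCAC motifs whose spacer (nucleotides between the end of
    the first motif and the start of the second) is at least `min_spacer`.
    The default of 7 encodes "more than 6 nt". Returns (start1, start2, spacer)."""
    starts = [m.start() for m in re.finditer("GCAC", rna)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            spacer = second - (first + 4)  # the motif is 4 nt long
            if spacer >= min_spacer:
                pairs.append((first, second, spacer))
    return pairs


# Toy sequences (not from the paper): an 8-nt spacer and a 3-nt spacer.
far_apart = gcac_pairs("GCAC" + "U" * 8 + "GCAC")
too_close = gcac_pairs("GCACUUUGCAC")
```

    Only the first toy sequence yields a pair, because its spacer exceeds 6 nt.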

  18. Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

    Science.gov (United States)

    König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

    2013-01-01

    G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations do destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141

  19. MODA: an efficient algorithm for network motif discovery in biological networks.

    Science.gov (United States)

    Omidi, Saeed; Schreiber, Falk; Masoudi-Nejad, Ali

    2009-10-01

    In recent years, interest has been growing in the study of complex networks. Since Erdös and Rényi (1960) proposed their random graph model about 50 years ago, many researchers have investigated and shaped this field. Many indicators have been proposed to assess the global features of networks. Recently, an active research area has developed in studying local features named motifs as the building blocks of networks. Unfortunately, network motif discovery is a computationally hard problem and finding rather large motifs (larger than 8 nodes) by means of current algorithms is impractical as it demands too much computational effort. In this paper, we present a new algorithm (MODA) that incorporates techniques such as a pattern growth approach for extracting larger motifs efficiently. We have tested our algorithm and found it able to identify larger motifs with more than 8 nodes more efficiently than most of the current state-of-the-art motif discovery algorithms. While most of the algorithms rely on induced subgraphs as motifs of the networks, MODA is able to extract both induced and non-induced subgraphs simultaneously. The MODA source code is freely available at: http://LBB.ut.ac.ir/Download/LBBsoft/MODA/
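
    The distinction between induced and non-induced subgraphs drawn at the end of this abstract can be shown on the smallest interesting pattern, a three-node path. The sketch below is a brute-force illustration in Python, not the MODA pattern-growth algorithm. The function name and the toy graph are assumptions.

```python
from itertools import combinations


def count_three_node_paths(edges):
    """Count occurrences of the path a - center - b in an undirected graph.

    Returns (non_induced, induced):
      non_induced counts every pair of distinct neighbours of a centre node;
      induced additionally requires that a and b are NOT directly connected,
      so the three nodes form a path and nothing more.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    non_induced = induced = 0
    for center, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            non_induced += 1
            if b not in adjacency[a]:
                induced += 1
    return non_induced, induced


# Toy graph: a triangle (1, 2, 3) with a pendant node 4 attached to node 3.
counts = count_three_node_paths([(1, 2), (2, 3), (1, 3), (3, 4)])
```

    In the toy graph the triangle contributes three non-induced paths but no induced ones, so the two counts differ (5 versus 2). This is why a tool has to state which definition of a motif occurrence it uses.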

  20. Comparative analysis of evolutionarily conserved motifs of epidermal growth factor receptor 2 (HER2) predicts novel potential therapeutic epitopes

    DEFF Research Database (Denmark)

    Deng, Xiaohong; Zheng, Xuxu; Yang, Huanming

    2014-01-01

    druggable epitopes/targets. We employed the PROSITE Scan to detect structurally conserved motifs and PRINTS to search for linearly conserved motifs of ECD HER2. We found that the epitopes recognized by trastuzumab and pertuzumab are located in the predicted conserved motifs of ECD HER2, supporting our...

  1. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    International Nuclear Information System (INIS)

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-01-01

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (residues 125-134, MQELSNILNL) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and containing only the putative NES motif (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD containing only one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. In addition, the CRM1-specific nuclear export inhibitor LMB blocked the potent hNANOG NES-mediated export, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, a NES motif is present in the hNANOG HD and may be functionally involved in the CRM1-mediated nuclear export pathway.

  2. Role of the ERC motif in the proximal part of the second intracellular loop and the C-terminal domain of the human prostaglandin F2alpha receptor (hFP-R) in G-protein coupling control.

    Science.gov (United States)

    Pathe-Neuschäfer-Rube, Andrea; Neuschäfer-Rube, Frank; Püschel, Gerhard P

    2005-05-15

    The human FP-R (F2alpha prostaglandin receptor) is a Gq-coupled heptahelical ectoreceptor, which is of significant medical interest, since it is a potential target for the treatment of glaucoma and preterm labour. On agonist exposure, it mediates an increase in intracellular inositol phosphate formation. Little is known about the structures that govern the agonist-dependent receptor activation. In other prostanoid receptors, the C-terminal domain has been implicated in the control of agonist-dependent receptor activation. A DRY motif at the beginning of the second intracellular loop is highly conserved throughout the G-protein-coupled receptor family and appears to be crucial for controlling agonist-dependent receptor activation. It is replaced by an ERC motif in the FP-R and no evidence for the relevance of this motif in ligand-dependent activation of prostanoid receptors has been provided so far. The aim of the present study was to elucidate the potential role of the C-terminal domain and the ERC motif in agonist-controlled intracellular signalling in FP-R mutants generated by site-directed mutagenesis. It was found that substitution of the acidic Glu132 in the ERC motif by a threonine residue led to full constitutive activation, whereas truncation of the receptor's C-terminal domain led to partial constitutive activation of all three intracellular signal pathways that had previously been shown to be activated by the FP-R, i.e. inositol trisphosphate formation, focal adhesion kinase activation and T-cell factor signalling. Inositol trisphosphate formation and focal adhesion kinase phosphorylation were further enhanced by ligand binding in cells expressing the truncation mutant but not the E132T (Glu132→Thr) mutant. Thus C-terminal truncation appeared to result in a receptor with partial constitutive activation, whereas substitution of Glu132 by threonine apparently resulted in a receptor with full constitutive activity.

  3. Statistical inference: an integrated Bayesian/likelihood approach

    CERN Document Server

    Aitkin, Murray

    2010-01-01

    Filling a gap in current Bayesian theory, Statistical Inference: An Integrated Bayesian/Likelihood Approach presents a unified Bayesian treatment of parameter inference and model comparisons that can be used with simple diffuse prior specifications. This novel approach provides new solutions to difficult model comparison problems and offers direct Bayesian counterparts of frequentist t-tests and other standard statistical methods for hypothesis testing. After an overview of the competing theories of statistical inference, the book introduces the Bayes/likelihood approach used throughout. It pre

  4. Salt-bridge Swapping in the EXXERFXYY Motif of Proton Coupled Oligopeptide Transporters

    DEFF Research Database (Denmark)

    Aduri, Nanda G; Prabhala, Bala K; Ernst, Heidi A

    2015-01-01

    to as E1XXE2R), located on Helix I, in interactions with the proton. In this study we investigated the intracellular substrate accumulation by motif variants with all possible combinations of glutamate residues changed to glutamine and arginine changed to a tyrosine; the latter being a natural variant......-motif salt bridge, i.e. R-E2 to R-E1, which is consistent with previous structural studies. Molecular dynamics simulations of the motif variants E1XXE2R and E1XXQ2R support this mechanism. The simulations showed that upon changing conformation, arginine pushes Helix V, through interactions with the highly...

  5. Stochastic Resonance in Neuronal Network Motifs with Ornstein-Uhlenbeck Colored Noise

    Directory of Open Access Journals (Sweden)

    Xuyang Lou

    2014-01-01

    We consider here the effect of Ornstein-Uhlenbeck colored noise on the stochastic resonance of the feed-forward-loop (FFL) network motif. The FFL motif is modeled through the FitzHugh-Nagumo neuron model as well as the chemical coupling. Our results show that the noise intensity and the correlation time of the noise process serve as the control parameters, which strongly affect the stochastic dynamics of the FFL motif. We find that, with a proper choice of noise intensities and the correlation time of the noise process, the signal-to-noise ratio (SNR) can display more than one peak.
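
As a rough illustration of the two control parameters named in this abstract (noise intensity and correlation time), the following sketch simulates an Ornstein-Uhlenbeck process with an Euler-Maruyama step. The parameterization tau * dx/dt = -x + sqrt(2 D) * xi(t), the function name `simulate_ou`, and all numerical values are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def simulate_ou(n_steps, dt, tau, D, seed=0):
    """Euler-Maruyama path of tau * dx/dt = -x + sqrt(2 D) * xi(t).

    tau is the correlation time and D the noise intensity (assumed
    parameterization); the stationary variance of this process is D / tau.
    """
    rng = random.Random(seed)
    x = 0.0
    path = []
    for _ in range(n_steps):
        drift = -x / tau * dt
        kick = math.sqrt(2.0 * D * dt) / tau * rng.gauss(0.0, 1.0)
        x += drift + kick
        path.append(x)
    return path

def sample_variance(values):
    """Plain (biased) variance about the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical settings: correlation time 1.0, noise intensity 0.5.
ou_path = simulate_ou(n_steps=200_000, dt=0.01, tau=1.0, D=0.5, seed=1)
ou_var = sample_variance(ou_path)  # should be close to D / tau = 0.5
```

In a study of this kind, a path like `ou_path` would be fed as the noise term into the neuron model; that coupling is not shown here.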

  6. Identification of putative regulatory motifs in the upstream regions of co-expressed functional groups of genes in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Joshi NV

    2009-01-01

    Background: Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results: The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs – the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs – were identified, and zero to all four of these occur in the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion: The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.

  7. A structural study for the optimisation of functional motifs encoded in protein sequences

    Directory of Open Access Journals (Sweden)

    Helmer-Citterich Manuela

    2004-04-01

    Background: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. Results: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection. 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns. 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. Conclusion: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of

  8. Inferring Domain Plans in Question-Answering

    National Research Council Canada - National Science Library

    Pollack, Martha E

    1986-01-01

    The importance of plan inference in models of conversation has been widely noted in the computational-linguistics literature, and its incorporation in question-answering systems has enabled a range...

  9. Scalable inference for stochastic block models

    KAUST Repository

    Peng, Chengbin; Zhang, Zhihua; Wong, Ka-Chun; Zhang, Xiangliang; Keyes, David E.

    2017-01-01

    Community detection in graphs is widely used in social and biological networks, and the stochastic block model is a powerful probabilistic tool for describing graphs with community structures. However, in the era of "big data," traditional inference

  10. Efficient algorithms for conditional independence inference

    Czech Academy of Sciences Publication Activity Database

    Bouckaert, R.; Hemmecke, R.; Lindner, S.; Studený, Milan

    2010-01-01

    Vol. 11, No. 1 (2010), pp. 3453-3479. ISSN 1532-4435. R&D Projects: GA ČR GA201/08/0539; GA MŠk 1M0572. Institutional research plan: CEZ:AV0Z10750506. Keywords: conditional independence inference; linear programming approach. Subject RIV: BA - General Mathematics. Impact factor: 2.949 (2010). http://library.utia.cas.cz/separaty/2010/MTR/studeny-efficient algorithms for conditional independence inference.pdf

  11. An intracellular motif of GLUT4 regulates fusion of GLUT4-containing vesicles.

    Science.gov (United States)

    Heyward, Catherine A; Pettitt, Trevor R; Leney, Sophie E; Welsh, Gavin I; Tavaré, Jeremy M; Wakelam, Michael J O

    2008-05-20

    Insulin stimulates glucose uptake by adipocytes through increasing translocation of the glucose transporter GLUT4 from an intracellular compartment to the plasma membrane. Fusion of GLUT4-containing vesicles at the cell surface is thought to involve phospholipase D activity, generating the signalling lipid phosphatidic acid, although the mechanism of action is not yet clear. Here we report the identification of a putative phosphatidic acid-binding motif in a GLUT4 intracellular loop. Mutation of this motif causes a decrease in the insulin-induced exposure of GLUT4 at the cell surface of 3T3-L1 adipocytes via an effect on vesicle fusion. The potential phosphatidic acid-binding motif identified in this study is unique to GLUT4 among the sugar transporters, therefore this motif may provide a unique mechanism for regulating insulin-induced translocation by phospholipase D signalling.

  12. An intracellular motif of GLUT4 regulates fusion of GLUT4-containing vesicles

    Directory of Open Access Journals (Sweden)

    Welsh Gavin I

    2008-05-01

    Background: Insulin stimulates glucose uptake by adipocytes through increasing translocation of the glucose transporter GLUT4 from an intracellular compartment to the plasma membrane. Fusion of GLUT4-containing vesicles at the cell surface is thought to involve phospholipase D activity, generating the signalling lipid phosphatidic acid, although the mechanism of action is not yet clear. Results: Here we report the identification of a putative phosphatidic acid-binding motif in a GLUT4 intracellular loop. Mutation of this motif causes a decrease in the insulin-induced exposure of GLUT4 at the cell surface of 3T3-L1 adipocytes via an effect on vesicle fusion. Conclusion: The potential phosphatidic acid-binding motif identified in this study is unique to GLUT4 among the sugar transporters, therefore this motif may provide a unique mechanism for regulating insulin-induced translocation by phospholipase D signalling.

  13. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.; Chikalov, Igor; Moshkov, Mikhail; Jankovic, Boris R.

    2015-01-01

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches

  14. Lucky Motifs in Chinese Folk Art: Interpreting Paper-cut from Chinese Shaanxi

    Directory of Open Access Journals (Sweden)

    Xuxiao WANG

    2013-11-01

    Paper-cut is not simply a form of traditional Chinese folk art. Lucky motifs developed in paper-cut certainly acquired profound cultural connotations. As paper-cut is a time-honoured skill across the nation, interpreting those motifs requires cultural receptiveness and anthropological sensitivity. The author of this article analyzes examples of paper-cut from Northern Shaanxi, China, to identify the cohesive motifs and explore the auspiciousness of the specific concepts of Fu, Lu, Shou, Xi. The paper-cut of Northern Shaanxi is an ideal representative of the craft as a whole because of the relative stability of this region in history, in terms of both art and culture. Furthermore, its straightforward style provides a clear demonstration of motifs regarding folk understanding of expectations for life.

  15. Design of Fashion Accessories Using Akwa-Ocha Motifs and Symbols

    African Journals Online (AJOL)

    Nneka Umera-Okeke

    Nkpopu: holes. 16. Osikapa na ... accessories anchors in both social semiotics and archetypal theories. Social semiotics theory as ... the two earrings incorporate the Onwa (moon) motif in spherical shape and in black colour. They are held ...

  16. Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

    KAUST Repository

    Sayadi, Ahmed; Briganti, Leonardo; Tramontano, Anna; Via, Allegra

    2011-01-01

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length

  17. FTZ-Factor1 and Fushi tarazu interact via conserved nuclear receptor and coactivator motifs

    Science.gov (United States)

    Schwartz, Carol J.E.; Sampson, Heidi M.; Hlousek, Daniela; Percival-Smith, Anthony; Copeland, John W.R.; Simmonds, Andrew J.; Krause, Henry M.

    2001-01-01

    To activate transcription, most nuclear receptor proteins require coactivators that bind to their ligand-binding domains (LBDs). The Drosophila FTZ-Factor1 (FTZ-F1) protein is a conserved member of the nuclear receptor superfamily, but was previously thought to lack an AF2 motif, a motif that is required for ligand and coactivator binding. Here we show that FTZ-F1 does have an AF2 motif and that it is required to bind a coactivator, the homeodomain-containing protein Fushi tarazu (FTZ). We also show that FTZ contains an AF2-interacting nuclear receptor box, the first to be found in a homeodomain protein. Both interaction motifs are shown to be necessary for physical interactions in vitro and for functional interactions in developing embryos. These unexpected findings have important implications for the conserved homologs of the two proteins. PMID:11157757

  18. On the criticality of inferred models

    Science.gov (United States)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-10-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.

  19. On the criticality of inferred models

    International Nuclear Information System (INIS)

    Mastromatteo, Iacopo; Marsili, Matteo

    2011-01-01

    Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality

  20. Polynomial Chaos Surrogates for Bayesian Inference

    KAUST Repository

    Le Maitre, Olivier

    2016-01-06

    Bayesian inference is a popular probabilistic method to solve inverse problems, such as the identification of a field parameter in a PDE model. The inference relies on the Bayes rule to update the prior density of the sought field, from observations, and derive its posterior distribution. In most cases the posterior distribution has no explicit form and has to be sampled, for instance using a Markov-Chain Monte Carlo method. In practice the prior field parameter is decomposed and truncated (e.g. by means of Karhunen-Loève decomposition) to recast the inference problem into the inference of a finite number of coordinates. Although proved effective in many situations, the Bayesian inference as sketched above faces several difficulties requiring improvements. First, sampling the posterior can be an extremely costly task as it requires multiple resolutions of the PDE model for different values of the field parameter. Second, when the observations are not very informative, the inferred parameter field can depend strongly on its prior, which can be somewhat arbitrary. These issues have motivated the introduction of reduced modeling or surrogates for the (approximate) determination of the parametrized PDE solution and hyperparameters in the description of the prior field. Our contribution focuses on recent developments in these two directions: the acceleration of the posterior sampling by means of Polynomial Chaos expansions and the efficient treatment of parametrized covariance functions for the prior field. We also discuss the possibility of making such an approach adaptive to further improve its efficiency.
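
The sampling step described in this abstract can be sketched with a random-walk Metropolis sampler in which a cheap stand-in function plays the role of the surrogate for the PDE solve. Everything specific here is an assumption for illustration: the scalar parameter, the standard normal prior, the single observation `y_obs`, the identity `surrogate`, and the function names. It is not the Polynomial Chaos construction of the talk.

```python
import math
import random

def surrogate(theta):
    """Hypothetical cheap stand-in for an expensive PDE solve."""
    return theta

def log_posterior(theta, y_obs, sigma):
    """Unnormalized log posterior: N(0, 1) prior times Gaussian likelihood."""
    log_prior = -0.5 * theta ** 2
    residual = y_obs - surrogate(theta)
    log_like = -0.5 * (residual / sigma) ** 2
    return log_prior + log_like

def metropolis(log_post, n_samples, step, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_post(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        samples.append(theta)
    return samples

# Assumed data: one observation y_obs = 1.0 with noise sigma = 0.5.
chain = metropolis(lambda t: log_posterior(t, 1.0, 0.5), 20_000, step=1.0)
kept = chain[2_000:]                 # discard burn-in
post_mean = sum(kept) / len(kept)    # analytic posterior mean is 0.8
```

Each iteration calls the forward model once, which is why replacing a costly solver with a surrogate reduces the sampling cost.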

  1. A Bayesian Network Schema for Lessening Database Inference

    National Research Council Canada - National Science Library

    Chang, LiWu; Moskowitz, Ira S

    2001-01-01

    .... The authors introduce a formal schema for database inference analysis, based upon a Bayesian network structure, which identifies critical parameters involved in the inference problem and represents...

  2. Connectivity in the yeast cell cycle transcription network: inferences from neural networks.

    Directory of Open Access Journals (Sweden)

    Christopher E Hart

    2006-12-01

    A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico "mutation" to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested an analogous property might exist in the biological network. This postulated that "network-local discrimination" occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of

  3. The Verrucomicrobia LexA-binding Motif: Insights into the Evolutionary Dynamics of the SOS Response

    Directory of Open Access Journals (Sweden)

    Ivan Erill

    2016-07-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site-directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
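
The consensus quoted in this abstract, TGTTC-N4-GAACA, can be turned into a simple exact-match scan. The sketch below is only an illustration of what the 14 bp palindromic consensus means; the promoter string is invented for the example, the helper names are hypothetical, and real binding-site searches typically score degenerate matches rather than require the exact consensus.

```python
import re

# TGTTC, any four bases, GAACA. The lookahead allows overlapping hits.
LEXA_CONSENSUS = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq):
    """Return (start, site) for every exact consensus match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEXA_CONSENSUS.finditer(seq)]

# Invented promoter fragment, not a real Verrucomicrobium sequence.
promoter = "ccatTGTTCatatGAACAggt"
sites = find_sites(promoter)
# The two arms TGTTC and GAACA are reverse complements of each other, so a
# site with a self-complementary spacer (here ATAT) equals its own reverse
# complement.
is_palindrome = reverse_complement(sites[0][1]) == sites[0][1]
```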

  4. The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

    Science.gov (United States)

    Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

    2016-01-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.

  5. Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

    OpenAIRE

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the sta...

  6. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction.

    Science.gov (United States)

    Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth

    2018-04-13

    The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations. © 2018 Guo et al.; Published by Cold Spring Harbor Laboratory Press.
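
The phrase "k-mers that are overrepresented at TF binding sites" can be illustrated with a much simpler computation than KMAC itself. The sketch below is not the KSM representation or the KMAC algorithm; it only counts, under assumed toy sequences and an assumed pseudocounted ratio threshold, which k-mers occur in more bound sequences than background sequences. All names and inputs are hypothetical.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(bound, background, k, min_ratio=2.0):
    """Return (kmer, n_bound, n_background) for over-represented k-mers."""
    pos = kmer_presence(bound, k)
    neg = kmer_presence(background, k)
    hits = []
    for kmer, n_pos in pos.items():
        f_pos = n_pos / len(bound)
        # Pseudocount keeps the ratio finite when the background count is 0.
        f_neg = (neg[kmer] + 1) / (len(background) + 1)
        if f_pos / f_neg >= min_ratio:
            hits.append((kmer, n_pos, neg[kmer]))
    return sorted(hits, key=lambda h: (-h[1], h[0]))

# Toy inputs invented for the example.
bound_seqs = ["AAGATAAG", "CCGATAAC", "TGATAAGG"]
background_seqs = ["ACGTACGT", "TTTTCCCC", "GGGGAAAA"]
top_kmers = enriched_kmers(bound_seqs, background_seqs, k=5)
```

A set-based representation like the one the abstract describes would go further by aligning such k-mers to each other; that step is omitted here.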

  7. Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure

    KAUST Repository

    Ashoor, Haitham

    2011-06-01

    This thesis presents a computational methodology for ab-initio identification of transcription factor binding sites based on ChIP-seq data. This method consists of three main steps, namely ChIP-seq data processing, motif discovery and model selection. A novel method for ranking the models of motifs identified in this process is proposed. This method combines multiple factors in order to rank the provided candidate motifs. It combines the model coverage of the ChIP-seq fragments that contain motifs from which that model is built, the suitable background data made up of shuffled ChIP-seq fragments, and the p-value that resulted from evaluating the model on actual and background data. Two ChIP-seq datasets retrieved from the ENCODE project are used to evaluate and demonstrate the ability of the method to predict correct TFBSs with high precision. The first dataset relates to neuron-restrictive silencer factor, NRSF, while the second one corresponds to growth-associated binding protein, GABP. The pipeline system shows high precision prediction for both datasets, as in both cases the top ranked motif closely resembles the known motifs for the respective transcription factors.

  8. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    Science.gov (United States)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  9. Motif formation and industry specific topologies in the Japanese business firm network

    Science.gov (United States)

    Maluck, Julian; Donner, Reik V.; Takayasu, Hideki; Takayasu, Misako

    2017-05-01

    Motifs and roles are basic quantities for the characterization of interactions among 3-node subsets in complex networks. In this work, we investigate how the distribution of 3-node motifs can be influenced by modifying the rules of an evolving network model while keeping the statistics of simpler network characteristics, such as the link density and the degree distribution, invariant. We exemplify this problem for the special case of the Japanese Business Firm Network, where a well-studied and relatively simple yet realistic evolving network model is available, and compare the resulting motif distribution in the real-world and simulated networks. To better approximate the motif distribution of the real-world network in the model, we introduce both subgraph dependent and global additional rules. We find that a specific rule that allows only for the merging process between nodes with similar link directionality patterns reduces the observed excess of densely connected motifs with bidirectional links. Our study improves the mechanistic understanding of motif formation in evolving network models to better describe the characteristic features of real-world networks with a scale-free topology.
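
To make the notion of a 3-node motif concrete, the sketch below counts one such pattern, the feed-forward loop (a to b, b to c, and a to c), in a small directed graph. The edge list and firm labels are invented, the function name is hypothetical, and the count is over subgraphs that contain these three links rather than induced subgraphs, which is a simplification of full motif censuses.

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with links a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {n for edge in edges for n in edge}
    count = 0
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            count += 1
    return count

# Invented supplier -> customer links between four hypothetical firms.
firm_links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
n_ffl = count_feed_forward_loops(firm_links)

# A directed 3-cycle contains no feed-forward loop.
n_ffl_cycle = count_feed_forward_loops([("A", "B"), ("B", "C"), ("C", "A")])
```

The brute-force loop over all ordered triples is cubic in the number of nodes, so it is suitable only for small examples; large networks need neighbor-based enumeration.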

  10. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Science.gov (United States)

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  11. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    Directory of Open Access Journals (Sweden)

    Pooya Zandevakili

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  12. Short Arginine Motifs Drive Protein Stickiness in the Escherichia coli Cytoplasm.

    Science.gov (United States)

    Kyne, Ciara; Crowley, Peter B

    2017-09-19

    Although essential to numerous biotech applications, knowledge of molecular recognition by arginine-rich motifs in live cells remains limited. ¹H, ¹⁵N HSQC and ¹⁹F NMR spectroscopies were used to investigate the effects of C-terminal -GRn (n = 1-5) motifs on GB1 interactions in Escherichia coli cells and cell extracts. While the "biologically inert" GB1 yields high-quality in-cell spectra, the -GRn fusions with n = 4 or 5 were undetectable. This result suggests that a tetra-arginine motif is sufficient to drive interactions between a test protein and macromolecules in the E. coli cytoplasm. The inclusion of a 12-residue flexible linker between GB1 and the -GR5 motif did not improve detection of the "inert" domain. In contrast, all of the constructs were detectable in cell lysates and extracts, suggesting that the arginine-mediated complexes were weak. Together these data reveal the significance of weak interactions between short arginine-rich motifs and the E. coli cytoplasm and demonstrate the potential of such motifs to modify protein interactions in living cells. These interactions must be considered in the design of (in vivo) nanoscale assemblies that rely on arginine-rich sequences.

  13. Discriminative motif discovery via simulated evolution and random under-sampling.

    Directory of Open Access Journals (Sweden)

    Tao Song

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.

  14. Discriminative motif discovery via simulated evolution and random under-sampling.

    Science.gov (United States)

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
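
    Random under-sampling, as named in the abstract, can be sketched in a few lines. The abstract does not describe the exact sampling scheme, so the version below is a generic one under stated assumptions: the larger class is cut down to the size of the smaller one by sampling without replacement. The function name, the seed parameter, and the toy sequences are all hypothetical.

```python
import random


def random_under_sample(positives, negatives, seed=0):
    """Balance two classes by down-sampling the larger one.

    Both classes are reduced to the size of the smaller class by random
    sampling without replacement. A fixed seed keeps the result repeatable.
    """
    rng = random.Random(seed)
    target = min(len(positives), len(negatives))
    return rng.sample(positives, target), rng.sample(negatives, target)


# Toy data: 3 positive sequences against 10 negative ones.
toy_positives = ["MKRLLA", "MSRLLV", "MARLIA"]
toy_negatives = ["SEQ%02d" % i for i in range(10)]
balanced_pos, balanced_neg = random_under_sample(toy_positives, toy_negatives)
```

    Repeating the draw with different seeds gives several balanced training sets, which is one common way to limit the information lost by discarding negatives.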

  15. Improved i-motif thermal stability by insertion of anthraquinone monomers

    DEFF Research Database (Denmark)

    Gouda, Alaa S; Amine, Mahasen S.; Pedersen, Erik Bjerregaard

    2017-01-01

    In order to gain insight into how to improve thermal stability of i-motifs when used in the context of biomedical and nanotechnological applications, novel anthraquinone-modified i-motifs were synthesized by insertion of 1,8-, 1,4-, 1,5- and 2,6-disubstituted anthraquinone monomers into the TAA...... loops of a 22mer cytosine-rich human telomeric DNA sequence. The influence of the four anthraquinone linkers on the i-motif thermal stability was investigated at 295 nm and pH 5.5. Anthraquinone monomers modulate the i-motif stability in a position-depending manner and the modulation also depends...... unlocked nucleic acid monomers or twisted intercalating nucleic acid. The 2,6-disubstituted anthraquinone linker replacing T10 enabled a significant increase of i-motif thermal melting by 8.2 °C. A substantial increase of 5.0 °C in i-motif thermal melting was recorded when both A6 and T16 were modified...

  16. Fast social-like learning of complex behaviors based on motor motifs

    Science.gov (United States)

    Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

    2018-05-01

    Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
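
    The gap between the two quantities in the abstract, (n-1)! candidate sequences against at most (n-1) learning cycles, is easy to tabulate. The helper names below are assumptions for this arithmetic sketch; the formulas are taken directly from the abstract.

```python
import math


def motif_sequence_count(n):
    """Number of possible motif sequences for n motor motifs: (n-1)!."""
    return math.factorial(n - 1)


def max_learning_cycles(n):
    """Upper bound on learning cycles stated in the abstract: n-1."""
    return n - 1


for n in (3, 6, 10):
    print(n, motif_sequence_count(n), max_learning_cycles(n))
```

    For the six-motif behavior used in the robot experiments this gives 120 possible sequences against a bound of 5 learning cycles.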

  17. Peptide-binding motifs of two common equine class I MHC molecules in Thoroughbred horses.

    Science.gov (United States)

    Bergmann, Tobias; Lindvall, Mikaela; Moore, Erin; Moore, Eugene; Sidney, John; Miller, Donald; Tallmadge, Rebecca L; Myers, Paisley T; Malaker, Stacy A; Shabanowitz, Jeffrey; Osterrieder, Nikolaus; Peters, Bjoern; Hunt, Donald F; Antczak, Douglas F; Sette, Alessandro

    2017-05-01

    Quantitative peptide-binding motifs of MHC class I alleles provide a valuable tool to efficiently identify putative T cell epitopes. Detailed information on equine MHC class I alleles is still very limited, and to date, only a single equine MHC class I allele, Eqca-1*00101 (ELA-A3 haplotype), has been characterized. The present study extends the number of characterized ELA class I specificities in two additional haplotypes found commonly in the Thoroughbred breed. Accordingly, we here report quantitative binding motifs for the ELA-A2 allele Eqca-16*00101 and the ELA-A9 allele Eqca-1*00201. Utilizing analyses of endogenously bound and eluted ligands and the screening of positional scanning combinatorial libraries, detailed and quantitative peptide-binding motifs were derived for both alleles. Eqca-16*00101 preferentially binds peptides with aliphatic/hydrophobic residues in position 2 and at the C-terminus, and Eqca-1*00201 has a preference for peptides with arginine in position 2 and hydrophobic/aliphatic residues at the C-terminus. Interestingly, the Eqca-16*00101 motif resembles that of the human HLA A02-supertype, while the Eqca-1*00201 motif resembles that of the HLA B27-supertype and two macaque class I alleles. It is expected that the identified motifs will facilitate the selection of candidate epitopes for the study of immune responses in horses.
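
    To show how an anchor preference can be used to shortlist candidate epitopes, the sketch below applies the Eqca-1*00201 preference stated in the abstract (arginine at position 2, a hydrophobic/aliphatic residue at the C-terminus) as a simple yes/no filter. This is a deliberate simplification: the study derives quantitative motifs, not a binary rule. The residue set, the function name, and the example peptides are assumptions made for illustration.

```python
# Hypothetical residue grouping; the abstract does not list the exact residues.
HYDROPHOBIC_ALIPHATIC = set("AILMVFWY")


def matches_r2_anchor(peptide):
    """True if the peptide has Arg at position 2 and a hydrophobic/aliphatic C-terminus."""
    if len(peptide) < 3:
        return False
    return peptide[1] == "R" and peptide[-1] in HYDROPHOBIC_ALIPHATIC


# Made-up 9-mers, not peptides from the study.
candidates = ["ARAGSTPEL", "AKAGSTPEL", "SRYWAIRTR"]
selected = [p for p in candidates if matches_r2_anchor(p)]
```

    A quantitative motif would replace the yes/no test with a per-position score, so that peptides can be ranked and not just accepted or rejected.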

  18. Structural and Functional Motifs in Influenza Virus RNAs

    Directory of Open Access Journals (Sweden)

    Damien Ferhadian

    2018-03-01

    have now been validated experimentally and their role in the viral life cycle demonstrated. This review aims to compile the structural motifs found in the different RNA classes (vRNA, cRNA, and vmRNA) of influenza viruses and their function in the viral replication cycle.

  19. A formal model of interpersonal inference

    Directory of Open Access Journals (Sweden)

    Michael Moutoussis

    2014-03-01

    Introduction: We propose that active Bayesian inference – a general framework for decision-making – can equally be applied to interpersonal exchanges. Social cognition, however, entails special challenges. We address these challenges through a novel formulation of a formal model and demonstrate its psychological significance. Method: We review relevant literature, especially with regards to interpersonal representations, formulate a mathematical model and present a simulation study. The model accommodates normative models from utility theory and places them within the broader setting of Bayesian inference. Crucially, we endow people's prior beliefs, into which utilities are absorbed, with preferences of self and others. The simulation illustrates the model's dynamics and furnishes elementary predictions of the theory. Results: 1. Because beliefs about self and others inform both the desirability and plausibility of outcomes, in this framework interpersonal representations become beliefs that have to be actively inferred. This inference, akin to 'mentalising' in the psychological literature, is based upon the outcomes of interpersonal exchanges. 2. We show how some well-known social-psychological phenomena (e.g. self-serving biases) can be explained in terms of active interpersonal inference. 3. Mentalising naturally entails Bayesian updating of how people value social outcomes. Crucially this includes inference about one’s own qualities and preferences. Conclusion: We inaugurate a Bayes optimal framework for modelling intersubject variability in mentalising during interpersonal exchanges. Here, interpersonal representations are endowed with explicit functional and affective properties. We suggest the active inference framework lends itself to the study of psychiatric conditions where mentalising is distorted.

  20. Memetic algorithms for de novo motif-finding in biomedical sequences.

    Science.gov (United States)

    Bi, Chengpeng

    2012-09-01

    The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested on de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, the implementation of the memetic motif-finding algorithm (MaMotif) has a number of key features, including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++ and exhaustively tested with various simulated and real biological sequences. In the simulation, MaMotif is the most time-efficient of the algorithms compared: it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, the new algorithm compares favorably with, or is superior to, the other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro

  1. Estimating uncertainty of inference for validation

    Energy Technology Data Exchange (ETDEWEB)

    Booker, Jane M [Los Alamos National Laboratory; Langenbrunner, James R [Los Alamos National Laboratory; Hemez, Francois M [Los Alamos National Laboratory; Ross, Timothy J [UNM

    2010-09-30

    We present a validation process based upon the concept that validation is an inference-making activity. This has always been true, but the association has not been as important before as it is now. Previously, theory had been confirmed by more data, and predictions were possible based on data. The process today is to infer from theory to code and from code to prediction, making the role of prediction somewhat automatic, and a machine function. Validation is defined as determining the degree to which a model and code is an accurate representation of experimental test data. Imbedded in validation is the intention to use the computer code to predict. To predict is to accept the conclusion that an observable final state will manifest; therefore, prediction is an inference whose goodness relies on the validity of the code. Quantifying the uncertainty of a prediction amounts to quantifying the uncertainty of validation, and this involves the characterization of uncertainties inherent in theory/models/codes and the corresponding data. An introduction to inference making and its associated uncertainty is provided as a foundation for the validation problem. A mathematical construction for estimating the uncertainty in the validation inference is then presented, including a possibility distribution constructed to represent the inference uncertainty for validation under uncertainty. The estimation of inference uncertainty for validation is illustrated using data and calculations from Inertial Confinement Fusion (ICF). The ICF measurements of neutron yield and ion temperature were obtained for direct-drive inertial fusion capsules at the Omega laser facility. The glass capsules, containing the fusion gas, were systematically selected with the intent of establishing a reproducible baseline of high-yield 10^13-10^14 neutron output. The deuterium-tritium ratio in these experiments was varied to study its influence upon yield. This paper on validation inference is the

  2. Deep Learning for Population Genetic Inference.

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  3. Deep Learning for Population Genetic Inference.

    Directory of Open Access Journals (Sweden)

    Sara Sheehan

    2016-03-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

  4. Deep Learning for Population Genetic Inference

    Science.gov (United States)

    Sheehan, Sara; Song, Yun S.

    2016-01-01

    Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908

  5. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  6. Goal inferences about robot behavior : goal inferences and human response behaviors

    NARCIS (Netherlands)

    Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.

    2014-01-01

    This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.

  7. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination.

    Science.gov (United States)

    Wiese, Claudia; Hinz, John M; Tebbs, Robert S; Nham, Peter B; Urbin, Salustra S; Collins, David W; Thompson, Larry H; Schild, David

    2006-01-01

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks (ICLs). Ectopic expression of wild-type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  8. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir

    2014-05-29

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  9. Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination

    Energy Technology Data Exchange (ETDEWEB)

    Wiese, Claudia; Hinz, John M.; Tebbs, Robert S.; Nham, Peter B.; Urbin, Salustra S.; Collins, David W.; Thompson, Larry H.; Schild, David

    2006-04-21

    In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C, and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks. Ectopic expression of wild type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.

  10. APOCALYPTIC MOTIFS IN THE CYCLE OF STORIES BY M.A. BULGAKOV «NOTES OF A YOUNG DOCTOR»

    Directory of Open Access Journals (Sweden)

    Evgeniy Igorevich Erokhov

    2015-10-01

    This article presents the first motif analysis of M.A. Bulgakov's cycle of stories «Notes of a Young Doctor» from the point of view of its apocalyptic problematics. To identify apocalyptic motifs, we use the method of motif analysis developed by B.M. Gasparov, which also helps to demonstrate the interpenetration of motifs across the cycle. The result of the research is the identification of apocalyptic motifs that are manifested in the experiences of the main character and in the events taking place around him, as seen through the prism of the physician's perception of the world. The identified motifs show that the stories in the cycle are united not only thematically and through the image of the main character, but also through motifs that recur and interpenetrate across the stories. The cycle contains the following apocalyptic motifs: diseases, darkness (as part of the landscape), resurrection from the dead, and the beast. They all belong to the biblical type, which is distinguished on the basis of the associative bond of these motifs with biblical texts.

  11. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs

    KAUST Repository

    Alam, Tanvir; Alazmi, Meshari; Gao, Xin; Arold, Stefan T.

    2014-01-01

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs. © 2014 Biochemical Society.

  12. Using Alien Coins to Test Whether Simple Inference Is Bayesian

    Science.gov (United States)

    Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.

    2016-01-01

    Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…

  13. Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

    Science.gov (United States)

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

    2012-01-01

    Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
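
    The occurrence count reported above for the conserved 11-mer can be illustrated with a minimal sketch that scans a DNA string on both strands. The function names and the toy sequence are assumptions made for illustration; this is not the authors' pipeline, and a genome-scale scan would need chunked or indexed input.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_motif_both_strands(seq: str, motif: str) -> int:
    """Count start positions where the motif or its reverse complement occurs.

    Overlapping occurrences are counted; a palindromic motif is counted
    once per position because both orientations are kept in one set.
    """
    seq = seq.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


# Toy sequence (hypothetical): one forward and one reverse-strand copy.
ETS_CRE_CORE = "CGGAAGTGACG"
toy = "TT" + ETS_CRE_CORE + "AA" + reverse_complement(ETS_CRE_CORE)
hits = count_motif_both_strands(toy, ETS_CRE_CORE)
```

    Keeping both orientations in a set avoids double counting when a motif equals its own reverse complement.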

  14. Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes

    Directory of Open Access Journals (Sweden)

    Kistler Corby

    2010-03-01

    Full Text Available Abstract Background: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agriculture losses. There is a growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study. Results: Applying comparative genomics approaches, we developed a computational pipeline to systematically discover evolutionarily conserved regulatory motifs in the promoter, downstream and the intronic regions of Fg genes, based on the multiple alignments of sequenced Fusarium genomes. Using this method, we discovered 73 candidate regulatory motifs in the promoter regions. Nearly 30% of these motifs are highly enriched in promoter regions of Fg genes that are associated with a specific functional category. Through comparison to Saccharomyces cerevisiae (Sc) and Schizosaccharomyces pombe (Sp), we observed conservation of transcription factors (TFs), their binding sites and the target genes regulated by these TFs related to pathways known to respond to stress conditions or phosphate metabolism. In addition, this study revealed 69 and 39 conserved motifs in the downstream regions and the intronic regions, respectively, of Fg genes. The top intronic motif is the splice donor site. For the downstream regions, we noticed an intriguing absence of the mammalian and Sc poly-adenylation signals among the list of conserved motifs. Conclusion: This study provides the first comprehensive list of candidate regulatory motifs in Fg, and underscores the power of comparative genomics in revealing functional elements among related genomes. The conservation of regulatory pathways among the Fusarium genomes and the two yeast species reveals their functional significance, and provides new insights in their

  15. Explanatory Preferences Shape Learning and Inference.

    Science.gov (United States)

    Lombrozo, Tania

    2016-10-01

    Explanations play an important role in learning and inference. People often learn by seeking explanations, and they assess the viability of hypotheses by considering how well they explain the data. An emerging body of work reveals that both children and adults have strong and systematic intuitions about what constitutes a good explanation, and that these explanatory preferences have a systematic impact on explanation-based processes. In particular, people favor explanations that are simple and broad, with the consequence that engaging in explanation can shape learning and inference by leading people to seek patterns and favor hypotheses that support broad and simple explanations. Given the prevalence of explanation in everyday cognition, understanding explanation is therefore crucial to understanding learning and inference. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Fuzzy logic controller using different inference methods

    International Nuclear Information System (INIS)

    Liu, Z.; De Keyser, R.

    1994-01-01

    In this paper the design of fuzzy controllers by using different inference methods is introduced. Configuration of the fuzzy controllers includes a general rule-base which is a collection of fuzzy PI or PD rules, the triangular fuzzy data model and a centre of gravity defuzzification algorithm. The generalized modus ponens (GMP) is used with the minimum operator of the triangular norm. Under the sup-min inference rule, six fuzzy implication operators are employed to calculate the fuzzy look-up tables for each rule base. The performance is tested in simulated systems with MATLAB/SIMULINK. Results show the effects of using the fuzzy controllers with different inference methods and applied to different test processes
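
    The controller structure described above (triangular fuzzy sets, a PD-type rule base, the minimum operator under sup-min inference, and centre-of-gravity defuzzification) can be sketched as follows. The fuzzy set boundaries, the three-label rule table and all names are hypothetical choices for illustration, and only the minimum implication is shown, not the six implication operators compared in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets on a normalized universe [-1, 1].
SETS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical PD-like rule base: (error, change of error) -> control output.
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}


def infer(error, d_error, n_points=201):
    """Sup-min inference with min implication and centroid defuzzification."""
    ys = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    aggregated = [0.0] * n_points
    for (e_label, de_label), out_label in RULES.items():
        # Firing strength: minimum of the two antecedent memberships.
        strength = min(tri(error, *SETS[e_label]), tri(d_error, *SETS[de_label]))
        if strength == 0.0:
            continue
        for i, y in enumerate(ys):
            clipped = min(strength, tri(y, *SETS[out_label]))  # min implication
            aggregated[i] = max(aggregated[i], clipped)        # sup (max) aggregation
    total = sum(aggregated)
    if total == 0.0:
        return 0.0
    # Discrete centre of gravity of the aggregated output set.
    return sum(y * m for y, m in zip(ys, aggregated)) / total
```

    Evaluating `infer` over a grid of `(error, d_error)` pairs would give a look-up table of the kind the abstract mentions, one table per implication operator.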

  17. Uncertainty in prediction and in inference

    International Nuclear Information System (INIS)

    Hilgevoord, J.; Uffink, J.

    1991-01-01

    The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in inference can be obtained by means of the so-called statistical distance between probability distributions. When applied to quantum mechanics, this distance leads to a measure of the distinguishability of quantum states, which essentially is the absolute value of the matrix element between the states. The importance of this result to the quantum mechanical uncertainty principle is noted. The second part of the paper provides a derivation of the statistical distance on the basis of the so-called method of support

  18. A Learning Algorithm for Multimodal Grammar Inference.

    Science.gov (United States)

    D'Ulizia, A; Ferri, F; Grifoni, P

    2011-12-01

    The high costs of development and maintenance of multimodal grammars in integrating and understanding input in multimodal interfaces lead to the investigation of novel algorithmic solutions in automating grammar generation and in updating processes. Many algorithms for context-free grammar inference have been developed in the natural language processing literature. An extension of these algorithms toward the inference of multimodal grammars is necessary for multimodal input processing. In this paper, we propose a novel grammar inference mechanism that allows us to learn a multimodal grammar from its positive samples of multimodal sentences. The algorithm first generates the multimodal grammar that is able to parse the positive samples of sentences and, afterward, makes use of two learning operators and the minimum description length metrics in improving the grammar description and in avoiding the over-generalization problem. The experimental results highlight the acceptable performances of the algorithm proposed in this paper since it has a very high probability of parsing valid sentences.

  19. Examples in parametric inference with R

    CERN Document Server

    Dixit, Ulhas Jayram

    2016-01-01

    This book discusses examples in parametric inference with R. Combining basic theory with modern approaches, it presents the latest developments and trends in statistical inference for students who do not have an advanced mathematical and statistical background. The topics discussed in the book are fundamental and common to many fields of statistical inference and thus serve as a point of departure for in-depth study. The book is divided into eight chapters: Chapter 1 provides an overview of topics on sufficiency and completeness, while Chapter 2 briefly discusses unbiased estimation. Chapter 3 focuses on the study of moments and maximum likelihood estimators, and Chapter 4 presents bounds for the variance. In Chapter 5, topics on consistent estimator are discussed. Chapter 6 discusses Bayes, while Chapter 7 studies some more powerful tests. Lastly, Chapter 8 examines unbiased and other tests. Senior undergraduate and graduate students in statistics and mathematics, and those who have taken an introductory cou...

  20. Grammatical inference algorithms, routines and applications

    CERN Document Server

    Wieczorek, Wojciech

    2017-01-01

    This book focuses on grammatical inference, presenting classic and modern methods of grammatical inference from the perspective of practitioners. To do so, it employs the Python programming language to present all of the methods discussed. Grammatical inference is a field that lies at the intersection of multiple disciplines, with contributions from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory and many others. Though the book is largely practical, it also includes elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus references to real-world problems. The listings presented here can be directly copied and pasted into other programs, thus making the book a valuable source of ready recipes for students, academic researchers, and programmers alike, as well as an inspiration for their further development.

  1. Statistical inference based on divergence measures

    CERN Document Server

    Pardo, Leandro

    2005-01-01

    The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach. Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...
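
    As a concrete illustration of how divergence statistics relate to the classical tests named above, the following sketch computes the Cressie-Read power-divergence statistic, a well-known subfamily of phi-divergence statistics. It is a general textbook formula, not code taken from the book; the function name and example counts are assumptions. With `lam = 1` it reduces to the Pearson statistic, and in the limit `lam -> 0` to the likelihood ratio statistic.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic for observed vs expected counts.

    lam = 1 gives the Pearson chi-square statistic; lam = 0 is handled as the
    limiting case and gives the likelihood ratio statistic G^2.
    lam = -1 is a further limiting case that this sketch does not handle.
    """
    if lam == 0:
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0)
                       for o, e in zip(observed, expected))


# Hypothetical counts for a three-cell multinomial with equal expected cells.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
pearson = power_divergence(observed, expected, 1)
g_squared = power_divergence(observed, expected, 0)
```

    For the toy counts the Pearson value can be checked by hand: (4 + 4 + 0) / 20 = 0.4.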

  2. Improved Inference of Heteroscedastic Fixed Effects Models

    Directory of Open Access Journals (Sweden)

    Afshan Saeed

    2016-12-01

    Full Text Available Heteroscedasticity is a serious problem that distorts estimation and testing of the panel data model (PDM). Arellano (1987) proposed the White (1980) estimator for PDM with heteroscedastic errors, but it provides erroneous inference for data sets that include high leverage points. In this paper, we attempt to improve the heteroscedasticity-consistent covariance matrix estimator (HCCME) for panel data sets with high leverage points. To draw robust inference for the PDM, we focus on improving the kernel bootstrap estimators proposed by Racine and MacKinnon (2007). A Monte Carlo scheme is used to verify the results.

  3. Likelihood inference for unions of interacting discs

    DEFF Research Database (Denmark)

    Møller, Jesper; Helisova, K.

    2010-01-01

    This is probably the first paper which discusses likelihood inference for a random set using a germ-grain model, where the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point...... process, where the germs are the centres and the marks are the associated radii of the discs. We propose to use a recent parametric class of interacting disc process models, where the minimal sufficient statistic depends on various geometric properties of the random set, and the density is specified......-based maximum likelihood inference and the effect of specifying different reference Poisson models....

  4. IMAGINE: Interstellar MAGnetic field INference Engine

    Science.gov (United States)

    Steininger, Theo

    2018-03-01

    IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.

  5. Inferring causality from noisy time series data

    DEFF Research Database (Denmark)

    Mønster, Dan; Fusaroli, Riccardo; Tylén, Kristian

    2016-01-01

    Convergent Cross-Mapping (CCM) has shown high potential to perform causal inference in the absence of models. We assess the strengths and weaknesses of the method by varying coupling strength and noise levels in coupled logistic maps. We find that CCM fails to infer accurate coupling strength...... and even causality direction in synchronized time-series and in the presence of intermediate coupling. We find that the presence of noise deterministically reduces the level of cross-mapping fidelity, while the convergence rate exhibits higher levels of robustness. Finally, we propose that controlled noise...

  6. Efficient sequential and parallel algorithms for finding edit distance based motifs.

    Science.gov (United States)

    Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

    2016-08-18

    Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance of exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in an estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600% scaling performance while using 16 threads. Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in
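
    To make the (l,d) EMS problem statement concrete, here is a brute-force reference sketch. It is not the neighborhood-exploration algorithm of the paper: it simply enumerates every l-mer over the alphabet and keeps those within edit distance d of some substring of every input string, which is only practical for very small l. All names and the toy strings are assumptions.

```python
from itertools import product


def min_edit_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    Standard approximate-matching dynamic programme: the first row is all
    zeros, so a match may start anywhere in text, and the minimum of the
    last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, 1):
        cur = [i]
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j - 1] + (pc != tc),  # match or substitution
                           prev[j] + 1,               # pattern char unmatched
                           cur[j - 1] + 1))           # text char unmatched
        prev = cur
    return min(prev)


def ems_brute_force(strings, l, d, alphabet="ACGT"):
    """Return all l-mers occurring in every string with at most d edit errors."""
    motifs = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if all(min_edit_to_substring(motif, s) <= d for s in strings):
            motifs.append(motif)
    return motifs


# Toy instance (hypothetical strings): ACGT is the only exact common 4-mer.
toy_strings = ["ACGTAC", "TTACGT", "GACGTT"]
exact = ems_brute_force(toy_strings, 4, 0)
with_one_error = ems_brute_force(toy_strings, 4, 1)
```

    The enumeration costs on the order of 4^l candidates times the dynamic programme per string, which is why instances such as (16,3) call for the more careful candidate generation the abstract describes.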

  7. Through the Portal: Viking Motifs Incorporated in the Romanesque Style in Telemark, Norway

    Directory of Open Access Journals (Sweden)

    Kristine Ødeby

    2013-09-01

    Full Text Available This paper presents the results of an analysis of motifs identified on six carved wooden Romanesque portal panels from the Norwegian county of Telemark. The findings suggest that animal motifs in the Late Viking style survived long into the Late Medieval period and were reused on these medieval portals. Stylistically, late expressions of Viking animal art do not differ a great deal from those of the subsequent Romanesque style. However, their symbolical differences are considered to be significant. The motifs themselves, and the issue of whether the Romanesque style adopted motifs from pre-Christian art, have attracted less attention. The motif portraying Sigurd slaying the dragon is considered in depth. It will be suggested that Sigurd, serving as a mediator between the old and the new beliefs when he appeared in late Viking contexts, was given a new role when portrayed in Christian art. Metaphor and liminality are a central part of this paper, and the theories of Alfred Gell and Margrete Andås suggest that the portal itself affects those who pass through it, and that the iconography is meaningful from a liminal perspective.

  8. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Science.gov (United States)

    Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

    2017-03-17

    Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
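
    The information-theoretic view of a binding motif can be illustrated with a minimal sketch that computes the per-position information content (in bits) of a set of aligned sites, using the usual 2 minus Shannon entropy measure for DNA. This is a generic calculation without small-sample correction, not the recursive entropy-minimization pipeline of the paper; the function name and the toy sites are assumptions.

```python
import math
from collections import Counter


def position_information(sites, alphabet="ACGT"):
    """Information content in bits at each column of equal-length aligned sites."""
    length = len(sites[0])
    max_bits = math.log2(len(alphabet))  # 2 bits for a four-letter alphabet
    info = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        total = sum(counts[b] for b in alphabet)
        entropy = -sum((counts[b] / total) * math.log2(counts[b] / total)
                       for b in alphabet if counts[b] > 0)
        info.append(max_bits - entropy)
    return info


# Hypothetical aligned binding sites, for illustration only.
toy_sites = ["ACGT", "ACGA", "ACCT", "ACTT"]
bits = position_information(toy_sites)
total_bits = sum(bits)
```

    Fully conserved columns carry 2 bits, while a column with frequencies 0.5, 0.25 and 0.25 carries 0.5 bits; summing over columns gives the total information of the motif.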

  9. Spatiotemporal network motif reveals the biological traits of developmental gene regulatory networks in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Kim Man-Sun

    2012-05-01

    Full Text Available Abstract Background: Network motifs provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot be used to reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif,” a spatiotemporal sequence of network motifs of sub-networks which are active only at specific time points and body parts. Results: On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution pattern in time and space. As a result, we found how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns. Conclusions: Taken together, the proposed concept and the simulations can be used to unravel the design principle of developmental gene regulatory networks.

  10. Do motifs reflect evolved function?--No convergent evolution of genetic regulatory network subgraph topologies.

    Science.gov (United States)

    Knabe, Johannes F; Nehaniv, Chrystopher L; Schilstra, Maria J

    2008-01-01

    Methods that analyse the topological structure of networks have recently become quite popular. Whether motifs (subgraph patterns that occur more often than in randomized networks) have specific functions as elementary computational circuits has been cause for debate. As the question is difficult to resolve with currently available biological data, we approach the issue using networks that abstractly model natural genetic regulatory networks (GRNs) which are evolved to show dynamical behaviors. Specifically one group of networks was evolved to be capable of exhibiting two different behaviors ("differentiation") in contrast to a group with a single target behavior. In both groups we find motif distribution differences within the groups to be larger than differences between them, indicating that evolutionary niches (target functions) do not necessarily mold network structure uniquely. These results show that variability operators can have a stronger influence on network topologies than selection pressures, especially when many topologies can create similar dynamics. Moreover, analysis of motif functional relevance by lesioning did not suggest that motifs were of greater importance to the functioning of the network than arbitrary subgraph patterns. Only when drastically restricting network size, so that one motif corresponds to a whole functionally evolved network, was preference for particular connection patterns found. This suggests that in non-restricted, bigger networks, entanglement with the rest of the network hinders topological subgraph analysis.

  11. The position of the Gly-xxx-Gly motif in transmembrane segments modulates dimer affinity.

    Science.gov (United States)

    Johnson, Rachel M; Rath, Arianna; Deber, Charles M

    2006-12-01

    Although the intrinsic low solubility of membrane proteins presents challenges to their high-resolution structure determination, insight into the amino acid sequence features and forces that stabilize their folds has been provided through study of sequence-dependent helix-helix interactions between single transmembrane (TM) helices. While the stability of helix-helix partnerships mediated by the Gly-xxx-Gly (GG4) motif is known to be generally modulated by distal interfacial residues, it has not been established whether the position of this motif, with respect to the ends of a given TM segment, affects dimer affinity. Here we examine the relationship between motif position and affinity in the homodimers of 2 single-spanning membrane protein TM sequences: glycophorin A (GpA) and bacteriophage M13 coat protein (MCP). Using the TOXCAT assay for dimer affinity on a series of GpA and MCP TM segments that have been modified with either 4 Leu residues at each end or with 8 Leu residues at the N-terminal end, we show that in each protein, centrally located GG4 motifs are capable of stronger helix-helix interactions than those proximal to TM helix ends, even when surrounding interfacial residues are maintained. The relative importance of GG4 motifs in stabilizing helix-helix interactions therefore must be considered not only in its specific residue context but also in terms of the location of the interactive surface relative to the N and C termini of alpha-helical TM segments.

  12. Identification of helix capping and β-turn motifs from NMR chemical shifts

    Energy Technology Data Exchange (ETDEWEB)

    Shen Yang; Bax, Ad, E-mail: bax@nih.gov [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)

    2012-03-15

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and ¹³Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs, are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7-0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.
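
    For readers unfamiliar with the Matthews correlation coefficient quoted above, the following sketch shows the standard definition from a binary confusion matrix. It is a generic formula, not part of the MICS program; the function name and the convention of returning 0.0 for a zero denominator are assumptions.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when a marginal count is zero, a
    common convention adopted here as an assumption.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    Unlike plain accuracy, the coefficient stays informative when motif positions are rare relative to non-motif positions.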

  13. EEVD motif of heat shock cognate protein 70 contributes to bacterial uptake by trophoblast giant cells

    Directory of Open Access Journals (Sweden)

    Kim Suk

    2009-12-01

    Full Text Available Abstract Background: The uptake of abortion-inducing pathogens by trophoblast giant (TG) cells is a key event in infectious abortion. However, little is known about phagocytic functions of TG cells against the pathogens. Here we show that heat shock cognate protein 70 (Hsc70) contributes to bacterial uptake by TG cells and the EEVD motif of Hsc70 plays an important role in this. Methods: Brucella abortus and Listeria monocytogenes were used as the bacterial antigen in this study. Recombinant proteins containing tetratricopeptide repeat (TPR) domains were constructed and confirmation of the binding capacity to Hsc70 was assessed by ELISA. The recombinant TPR proteins were used for investigation of the effect of TPR proteins on bacterial uptake by TG cells and on pregnancy in mice. Results: The monoclonal antibody that inhibits bacterial uptake by TG cells reacted with the EEVD motif of Hsc70. Bacterial TPR proteins bound to the C-terminal of Hsc70 through its EEVD motif and this binding inhibited bacterial uptake by TG cells. Infectious abortion was also prevented by blocking the EEVD motif of Hsc70. Conclusions: Our results demonstrate that surface located Hsc70 on TG cells mediates the uptake of pathogenic bacteria and proteins containing the TPR domain inhibit the function of Hsc70 by binding to its EEVD motif. These molecules may be useful in the development of methods for preventing infectious abortion.

  14. Regulation of TCF ETS-domain transcription factors by helix-loop-helix motifs.

    Science.gov (United States)

    Stinson, Julie; Inoue, Toshiaki; Yates, Paula; Clancy, Anne; Norton, John D; Sharrocks, Andrew D

    2003-08-15

    DNA binding by the ternary complex factor (TCF) subfamily of ETS-domain transcription factors is tightly regulated by intramolecular and intermolecular interactions. The helix-loop-helix (HLH)-containing Id proteins are trans-acting negative regulators of DNA binding by the TCFs. In the TCF, SAP-2/Net/ERP, intramolecular inhibition of DNA binding is promoted by the cis-acting NID region that also contains an HLH-like motif. The NID also acts as a transcriptional repression domain. Here, we have studied the role of HLH motifs in regulating DNA binding and transcription by the TCF protein SAP-1 and how Cdk-mediated phosphorylation affects the inhibitory activity of the Id proteins towards the TCFs. We demonstrate that the NID region of SAP-1 is an autoinhibitory motif that acts to inhibit DNA binding and also functions as a transcription repression domain. This region can be functionally replaced by fusion of Id proteins to SAP-1, whereby the Id moiety then acts to repress DNA binding in cis. Phosphorylation of the Ids by cyclin-Cdk complexes results in reduction in protein-protein interactions between the Ids and TCFs and relief of their DNA-binding inhibitory activity. In revealing distinct mechanisms through which HLH motifs modulate the activity of TCFs, our results therefore provide further insight into the role of HLH motifs in regulating TCF function and how the inhibitory properties of the trans-acting Id HLH proteins are themselves regulated by phosphorylation.

  15. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.

  16. OSR1 regulates a subset of inward rectifier potassium channels via a binding motif variant.

    Science.gov (United States)

    Taylor, Clinton A; An, Sung-Wan; Kankanamalage, Sachith Gallolu; Stippec, Steve; Earnest, Svetlana; Trivedi, Ashesh T; Yang, Jonathan Zijiang; Mirzaei, Hamid; Huang, Chou-Long; Cobb, Melanie H

    2018-04-10

    The with-no-lysine (K) (WNK) signaling pathway to STE20/SPS1-related proline- and alanine-rich kinase (SPAK) and oxidative stress-responsive 1 (OSR1) kinase is an important mediator of cell volume and ion transport. SPAK and OSR1 associate with upstream kinases WNK 1-4, substrates, and other proteins through their C-terminal domains which interact with linear R-F-x-V/I sequence motifs. In this study we find that SPAK and OSR1 also interact with similar affinity with a motif variant, R-x-F-x-V/I. Eight of 16 human inward rectifier K+ channels have an R-x-F-x-V motif. We demonstrate that two of these channels, Kir2.1 and Kir2.3, are activated by OSR1, while Kir4.1, which does not contain the motif, is not sensitive to changes in OSR1 or WNK activity. Mutation of the motif prevents activation of Kir2.3 by OSR1. Both siRNA knockdown of OSR1 and chemical inhibition of WNK activity disrupt NaCl-induced plasma membrane localization of Kir2.3. Our results suggest a mechanism by which WNK-OSR1 enhance Kir2.1 and Kir2.3 channel activity by increasing their plasma membrane localization. Regulation of members of the inward rectifier K+ channel family adds functional and mechanistic insight into the physiological impact of the WNK pathway.
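
    The two linear motifs named above, R-F-x-V/I and the variant R-x-F-x-V/I, translate directly into regular expressions over one-letter amino acid codes. The sketch below scans a protein sequence for both; the function name and the test peptide are made up for illustration, and a sequence match alone says nothing about whether binding actually occurs.

```python
import re

# Lookahead groups so that overlapping occurrences are all reported.
# "." stands for the unconstrained position x in the motif notation.
CANONICAL = re.compile(r"(?=(RF.[VI]))")   # R-F-x-V/I
VARIANT = re.compile(r"(?=(R.F.[VI]))")    # R-x-F-x-V/I


def find_spak_osr1_motifs(protein_seq):
    """Return (motif name, 1-based start, matched residues), sorted by position."""
    seq = protein_seq.upper()
    hits = []
    for name, pattern in (("R-F-x-V/I", CANONICAL), ("R-x-F-x-V/I", VARIANT)):
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical peptide containing one instance of each motif.
toy_hits = find_spak_osr1_motifs("MKRFQVAARAFSIGG")
```

    Restricting the last position to `V` alone (`R.F.V`) would match the narrower R-x-F-x-V form counted in the inward rectifier channels.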

  17. DXD Motif-Dependent and -Independent Effects of the Chlamydia trachomatis Cytotoxin CT166

    Directory of Open Access Journals (Sweden)

    Miriam Bothe

    2015-02-01

    The Gram-negative, intracellular bacterium Chlamydia trachomatis causes acute and chronic urogenital tract infection, potentially leading to infertility and ectopic pregnancy. The only partially characterized cytotoxin CT166 of serovar D exhibits a DXD motif, which is important for the enzymatic activity of many bacterial and mammalian type A glycosyltransferases, leading to the hypothesis that CT166 possesses glycosyltransferase activity. CT166-expressing HeLa cells exhibit actin reorganization, including cell rounding, which has been attributed to the inhibition of the Rho-GTPases Rac/Cdc42. Exploiting the glycosylation-sensitive Ras(27H5) antibody, we here show that CT166 induces an epitope change in Ras, resulting in inhibited ERK and PI3K signaling and delayed cell cycle progression. Consistent with the hypothesis that these effects strictly depend on the DXD motif, CT166 with the mutated DXD motif causes neither Ras-ERK inhibition nor delayed cell cycle progression. In contrast, CT166 with the mutated DXD motif is still capable of inhibiting cell migration, suggesting that CT166 with the mutated DXD motif cannot be regarded as inactive in any case. Taken together, CT166 affects various fundamental cellular processes, strongly suggesting its importance for the intracellular survival of chlamydia.

  18. A Simple Decision Rule for Recognition of Poly(A) Tail Signal Motifs in Human Genome

    KAUST Repository

    AbouEisha, Hassan M.

    2015-05-12

    Background: Numerous attempts have been made to predict motifs in genomic sequences that correspond to poly(A) tail signals. A vast portion of this effort has been directed to a plethora of nonlinear classification methods. Even when such approaches yield good discriminant results, identifying dominant features of regulatory mechanisms nevertheless remains a challenge. In this work, we look at decision rules that may help identify such features. Findings: We present a simple decision rule for classification of candidate poly(A) tail signal motifs in human genomic sequence, obtained by evaluating features during the construction of gradient boosted trees. We found that values of a single feature, based on the frequency of adenine in the genomic sequence surrounding the candidate signal and the number of consecutive adenine molecules in a well-defined region immediately following the motif, display good discriminative potential in classification of poly(A) tail motifs for samples covered by the rule. Conclusions: The resulting simple rule can be used as an efficient filter in construction of more complex poly(A) tail motif classification algorithms.
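
    The two ingredients of the feature described above (adenine frequency around the candidate signal and the run of consecutive adenines just downstream of it) can be computed as in the following sketch. The abstract does not give window sizes, thresholds, or how the two quantities are combined, so `flank`, `downstream`, `min_a_fraction`, `min_a_run` and the simple AND rule are placeholders chosen purely for illustration, not the published rule.

```python
def adenine_fraction(seq):
    """Fraction of positions in `seq` that are adenine."""
    return seq.count("A") / len(seq) if seq else 0.0


def longest_a_run(seq):
    """Length of the longest run of consecutive adenines in `seq`."""
    best = run = 0
    for base in seq:
        run = run + 1 if base == "A" else 0
        best = max(best, run)
    return best


def candidate_features(sequence, motif_start, motif_len=6, flank=20, downstream=10):
    """Return (adenine fraction in the flanks, longest A run after the motif).

    All window sizes are hypothetical defaults.
    """
    motif_end = motif_start + motif_len
    left = sequence[max(0, motif_start - flank):motif_start]
    right = sequence[motif_end:motif_end + flank]
    after = sequence[motif_end:motif_end + downstream]
    return adenine_fraction(left + right), longest_a_run(after)


def passes_rule(sequence, motif_start, min_a_fraction=0.3, min_a_run=3, **windows):
    """Toy filter: both quantities must reach their placeholder thresholds."""
    fraction, run = candidate_features(sequence, motif_start, **windows)
    return fraction >= min_a_fraction and run >= min_a_run


if __name__ == "__main__":
    toy = "CCCC" + "AATAAA" + "AAAGT"  # made-up sequence around one candidate
    print(candidate_features(toy, 4, flank=4, downstream=5))
    print(passes_rule(toy, 4, flank=4, downstream=5))
```

    A filter of this shape is cheap to evaluate, which fits the abstract's suggestion of using the rule as a pre-filter ahead of a more complex classifier.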

  19. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants. © The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  20. Identification of helix capping and β-turn motifs from NMR chemical shifts

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2012-01-01

    We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and 13Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs, are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca. 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed those attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures.
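
    For readers unfamiliar with the Matthews correlation coefficient quoted above, the standard definition from a binary confusion matrix is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it; the function name `mcc` and the example counts are illustrative assumptions and are not taken from the MICS evaluation.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts for a motif/non-motif classifier.
    print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

    The coefficient ranges from −1 to 1, with 0 corresponding to chance-level prediction, which makes it more informative than raw accuracy when motif positions are rare relative to non-motif positions.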