WorldWideScience

Sample records for gene cluster analysis

  1. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  2. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness in the traditional methods is that they neglect the heterogeneity of genes expressions in samples which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severities of disease and provide expression profiles of gene sets for individuals. We developed an application software called IGSA that leverages a powerful analytical capacity in gene sets enrichment and samples clustering. IGSA calculates gene sets expression scores for each sample and takes an accumulating clustering strategy to let the samples gather into the set according to the progress of disease from mild to severe. We focus on gastric, pancreatic and ovarian cancer data sets for the performance of IGSA. We also compared the results of IGSA in KEGG pathways enrichment with David, GSEA, SPIA, ssGSEA and analyzed the results of IGSA clustering and different similarity measurement methods. Notably, IGSA is proved to be more sensitive and specific in finding significant pathways, and can indicate related changes in pathways with the severity of disease. In addition, IGSA provides with significant gene sets profile for each sample.

  3. QTL global meta-analysis: are trait determining genes clustered?

    Directory of Open Access Journals (Sweden)

    Adelson David L

    2009-04-01

    Full Text Available Abstract Background A key open question in biology is if genes are physically clustered with respect to their known functions or phenotypic effects. This is of particular interest for Quantitative Trait Loci (QTL where a QTL region could contain a number of genes that contribute to the trait being measured. Results We observed a significant increase in gene density within QTL regions compared to non-QTL regions and/or the entire bovine genome. By grouping QTL from the Bovine QTL Viewer database into 8 categories of non-redundant regions, we have been able to analyze gene density and gene function distribution, based on Gene Ontology (GO with relation to their location within QTL regions, outside of QTL regions and across the entire bovine genome. We identified a number of GO terms that were significantly over represented within particular QTL categories. Furthermore, select GO terms expected to be associated with the QTL category based on common biological knowledge have also proved to be significantly over represented in QTL regions. Conclusion Our analysis provides evidence of over represented GO terms in QTL regions. This increased GO term density indicates possible clustering of gene functions within QTL regions of the bovine genome. Genes with similar functions may be grouped in specific locales and could be contributing to QTL traits. Moreover, we have identified over-represented GO terminology that from a biological standpoint, makes sense with respect to QTL category type.

  4. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  5. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome as they were thought even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.

  6. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    Full Text Available To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members in the miRNA group always had various expression levels, and even some showed larger expression divergence. Despite the dynamic expression as well as individual difference, these miRNAs always indicated consistent or similar deregulation patterns. The consistent deregulation expression may contribute to dynamic and coordinated interaction between different miRNAs in regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families indicated important biological roles, and the specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  7. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series. Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  8. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematical code to compute cluster probabilities using our methods is available as supplementary material.

  9. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA,; nternational Research Training Group ``Visualization of Large and Unstructured Data Sets,' ' University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA,; Computer Science Division,University of California, Berkeley, CA, USA,; Computer Science Department, University of California, Irvine, CA, USA,; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory,; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  10. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    NARCIS (Netherlands)

    Cimermancic, P.; Medema, Marnix; Claesen, J.; Kurika, K.; Wieland Brown, L.C.; Mavrommatis, K.; Pati, A.; Godfrey, P.A.; Koehrsen, M.; Clardy, J.; Birren, B. W.; Takano, Eriko; Sali, A.; Linington, R.G.; Fischbach, M.A.

    2014-01-01

    Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the

  11. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they process.To study this analysis, there are several algorithms that can be used. Therefore, this topic focuses to discuss the algorithms, such as, similarity measures, and hierarchical clustering which includes single linkage, complete linkage and average linkage method. also, non-hierarchical clustering method, which is popular name K -mean method ' will be discussed. Finally, this paper will be described the advantages and disadvantages of every methods

  12. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons

  13. Genomic and expression analysis of the vanG-like gene cluster of Clostridium difficile.

    Science.gov (United States)

    Peltier, Johann; Courtin, Pascal; El Meouche, Imane; Catel-Ferreira, Manuella; Chapot-Chartier, Marie-Pierre; Lemée, Ludovic; Pons, Jean-Louis

    2013-07-01

    Primary antibiotic treatment of Clostridium difficile intestinal diseases requires metronidazole or vancomycin therapy. A cluster of genes homologous to enterococcal glycopeptides resistance vanG genes was found in the genome of C. difficile 630, although this strain remains sensitive to vancomycin. This vanG-like gene cluster was found to consist of five ORFs: the regulatory region consisting of vanR and vanS and the effector region consisting of vanG, vanXY and vanT. We found that 57 out of 83 C. difficile strains, representative of the main lineages of the species, harbour this vanG-like cluster. The cluster is expressed as an operon and, when present, is found at the same genomic location in all strains. The vanG, vanXY and vanT homologues in C. difficile 630 are co-transcribed and expressed to a low level throughout the growth phases in the absence of vancomycin. Conversely, the expression of these genes is strongly induced in the presence of subinhibitory concentrations of vancomycin, indicating that the vanG-like operon is functional at the transcriptional level in C. difficile. Hydrophilic interaction liquid chromatography (HILIC-HPLC) and MS analysis of cytoplasmic peptidoglycan precursors of C. difficile 630 grown without vancomycin revealed the exclusive presence of a UDP-MurNAc-pentapeptide with an alanine at the C terminus. UDP-MurNAc-pentapeptide [d-Ala] was also the only peptidoglycan precursor detected in C. difficile grown in the presence of vancomycin, corroborating the lack of vancomycin resistance. Peptidoglycan structures of a vanG-like mutant strain and of a strain lacking the vanG-like cluster did not differ from the C. difficile 630 strain, indicating that the vanG-like cluster also has no impact on cell-wall composition.

  14. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of EOperon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Gene expression data clustering and it’s application in differential analysis of leukemia

    Directory of Open Access Journals (Sweden)

    M. Vahedi

    2008-02-01

    Full Text Available Introduction: DNA microarray technique is one of the most important categories in bioinformatics,which allows the possibility of monitoring thousands of expressed genes has been resulted in creatinggiant data bases of gene expression data, recently. Statistical analysis of such databases includednormalization, clustering, classification and etc.Materials and Methods: Golub et al (1999 collected data bases of leukemia based on the method ofoligonucleotide. The data is on the internet. In this paper, we analyzed gene expression data. It wasclustered by several methods including multi-dimensional scaling, hierarchical and non-hierarchicalclustering. Data set included 20 Acute Lymphoblastic Leukemia (ALL patients and 14 Acute MyeloidLeukemia (AML patients. The results of tow methods of clustering were compared with regard to realgrouping (ALL & AML. R software was used for data analysis.Results: Specificity and sensitivity of divisive hierarchical clustering in diagnosing of ALL patientswere 75% and 92%, respectively. Specificity and sensitivity of partitioning around medoids indiagnosing of ALL patients were 90% and 93%, respectively. These results showed a wellaccomplishment of both methods of clustering. It is considerable that, due to clustering methodsresults, one of the samples was placed in ALL groups, which was in AML group in clinical test.Conclusion: With regard to concordance of the results with real grouping of data, therefore we canuse these methods in the cases where we don't have accurate information of real grouping of data.Moreover, Results of clustering might distinct subgroups of data in such a way that would be necessaryfor concordance with clinical outcomes, laboratory results and so on.

  16. A genome-wide analysis of nonribosomal peptide synthetase gene clusters and their peptides in a Planktothrix rubescens strain

    Directory of Open Access Journals (Sweden)

    Nederbragt Alexander J

    2009-08-01

    Full Text Available Abstract Background Cyanobacteria often produce several different oligopeptides, with unknown biological functions, by nonribosomal peptide synthetases (NRPS. Although some cyanobacterial NRPS gene cluster types are well described, the entire NRPS genomic content within a single cyanobacterial strain has never been investigated. Here we have combined a genome-wide analysis using massive parallel pyrosequencing ("454" and mass spectrometry screening of oligopeptides produced in the strain Planktothrix rubescens NIVA CYA 98 in order to identify all putative gene clusters for oligopeptides. Results Thirteen types of oligopeptides were uncovered by mass spectrometry (MS analyses. Microcystin, cyanopeptolin and aeruginosin synthetases, highly similar to already characterized NRPS, were present in the genome. Two novel NRPS gene clusters were associated with production of anabaenopeptins and microginins, respectively. Sequence-depth of the genome and real-time PCR data revealed three copies of the microginin gene cluster. Since NRPS gene cluster candidates for microviridin and oscillatorin synthesis could not be found, putative (gene encoded precursor peptide sequences to microviridin and oscillatorin were found in the genes mdnA and oscA, respectively. The genes flanking the microviridin and oscillatorin precursor genes encode putative modifying enzymes of the precursor oligopeptides. We therefore propose ribosomal pathways involving modifications and cyclisation for microviridin and oscillatorin. The microviridin, anabaenopeptin and cyanopeptolin gene clusters are situated in close proximity to each other, constituting an oligopeptide island. Conclusion Altogether seven nonribosomal peptide synthetase (NRPS gene clusters and two gene clusters putatively encoding ribosomal oligopeptide biosynthetic pathways were revealed. Our results demonstrate that whole genome shotgun sequencing combined with MS-directed determination of oligopeptides successfully

  17. Gene cluster analysis for the biosynthesis of elgicins, novel lantibiotics produced by paenibacillus elgii B69

    Directory of Open Access Journals (Sweden)

    Teng Yi

    2012-03-01

    Full Text Available Abstract Background The recent increase in bacterial resistance to antibiotics has promoted the exploration of novel antibacterial materials. As a result, many researchers are undertaking work to identify new lantibiotics because of their potent antimicrobial activities. The objective of this study was to provide details of a lantibiotic-like gene cluster in Paenibacillus elgii B69 and to produce the antibacterial substances coded by this gene cluster based on culture screening. Results Analysis of the P. elgii B69 genome sequence revealed the presence of a lantibiotic-like gene cluster composed of five open reading frames (elgT1, elgC, elgT2, elgB, and elgA. Screening of culture extracts for active substances possessing the predicted properties of the encoded product led to the isolation of four novel peptides (elgicins AI, AII, B, and C with a broad inhibitory spectrum. The molecular weights of these peptides were 4536, 4593, 4706, and 4820 Da, respectively. The N-terminal sequence of elgicin B was Leu-Gly-Asp-Tyr, which corresponded to the partial sequence of the peptide ElgA encoded by elgA. Edman degradation suggested that the product elgicin B is derived from ElgA. By correlating the results of electrospray ionization-mass spectrometry analyses of elgicins AI, AII, and C, these peptides are deduced to have originated from the same precursor, ElgA. Conclusions A novel lantibiotic-like gene cluster was shown to be present in P. elgii B69. Four new lantibiotics with a broad inhibitory spectrum were isolated, and these appear to be promising antibacterial agents.

  18. Sequencing and transcriptional analysis of the Streptococcus thermophilus histamine biosynthesis gene cluster: factors that affect differential hdcA expression

    DEFF Research Database (Denmark)

    Calles-Enríquez, Marina; Hjort, Benjamin Benn; Andersen, Pia Skov

    2010-01-01

    to produce histamine. The hdc clusters of S. thermophilus CHCC1524 and CHCC6483 were sequenced, and the factors that affect histamine biosynthesis and histidine-decarboxylating gene (hdcA) expression were studied. The hdc cluster began with the hdcA gene, was followed by a transporter (hdcP), and ended...... with the hdcB gene, which is of unknown function. The three genes were orientated in the same direction. The genetic organization of the hdc cluster showed a unique organization among the lactic acid bacterial group and resembled those of Staphylococcus and Clostridium species, thus indicating possible...... acquisition through a horizontal transfer mechanism. Transcriptional analysis of the hdc cluster revealed the existence of a polycistronic mRNA covering the three genes. The histidine-decarboxylating gene (hdcA) of S. thermophilus demonstrated maximum expression during the stationary growth phase, with high...

  19. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, duster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same groups are relatively homogenous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  20. Identification and functional analysis of gene cluster involvement in biosynthesis of the cyclic lipopeptide antibiotic pelgipeptin produced by Paenibacillus elgii

    Directory of Open Access Journals (Sweden)

    Qian Chao-Dong

    2012-09-01

    Full Text Available Abstract Background Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information about its biosynthetic pathway. Results A potential pelgipeptin synthetase gene cluster (plp was identified from Paenibacillus elgii B69 through genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster, three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs, with one, seven, and one module(s, respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation domains indicated that the sequence of the NRPS modules is well collinear with the order of amino acids in pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3, and PlpF A1 provided further evidence that the plp gene cluster involved in pelgipeptin biosynthesis. Conclusions In this study, a gene cluster (plp responsible for the biosynthesis of pelgipeptin was identified from the genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to develop novel lipopeptide antibiotics by genetic engineering.

  1. A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    P.Balaji

    2015-01-01

    Full Text Available Abstract It is difficult from possibilities to select a most suitable effective way of clustering algorithm and its dataset for a defined set of gene expression data because we have a huge number of ways and huge number of gene expressions. At present many researchers are preferring to use hierarchical clustering in different forms this is no more totally optimal. Cluster ensemble research can solve this type of problem by automatically merging multiple data partitions from a wide range of different clusterings of any dimensions to improve both the quality and robustness of the clustering result. But we have many existing ensemble approaches using an association matrix to condense sample-cluster and co-occurrence statistics and relations within the ensemble are encapsulated only at raw level while the existing among clusters are totally discriminated. Finding these missing associations can greatly expand the capability of those ensemble methodologies for microarray data clustering. We propose general K-means cluster ensemble approach for the clustering of general categorical data into required number of partitions.

  2. Gene expression profiling of resting and activated vascular smooth muscle cells by serial analysis of gene expression and clustering analysis

    NARCIS (Netherlands)

    Beauchamp, Nicholas J.; van Achterberg, Tanja A. E.; Engelse, Marten A.; Pannekoek, Hans; de Vries, Carlie J. M.

    2003-01-01

    Migration and proliferation of vascular smooth muscle cells (SMCs) are key events in atherosclerosis. However, little is known about alterations in gene expression upon transition of the quiescent, contractile SMC to the proliferative SMC. We performed serial analysis of gene expression (SAGE) of

  3. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noises, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchic clustering method, the -mean clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999.

  4. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species

    DEFF Research Database (Denmark)

    Nielsen, Jens Christian; Grijseels, Sietske; Prigent, Sylvain

    2017-01-01

    Filamentous fungi produce a wide range of bioactive compounds with important pharmaceutical applications, such as antibiotic penicillins and cholesterol-lowering statins. However, less attention has been paid to fungal secondary metabolites compared to those from bacteria. In this study, we...... sequenced the genomes of 9 Penicillium species and, together with 15 published genomes, we investigated the secondary metabolism of Penicillium and identified an immense, unexploited potential for producing secondary metabolites by this genus. A total of 1,317 putative biosynthetic gene clusters (BGCs) were......-referenced the predicted pathways with published data on the production of secondary metabolites and experimentally validated the production of antibiotic yanuthones in Penicillia and identified a previously undescribed compound from the yanuthone pathway. This study is the first genus-wide analysis of the genomic...

  5. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  6. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  7. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  8. Gene co-expression analysis identifies gene clusters associated with isotropic and polarized growth in Aspergillus fumigatus conidia.

    Science.gov (United States)

    Baltussen, Tim J H; Coolen, Jordy P M; Zoll, Jan; Verweij, Paul E; Melchers, Willem J G

    2018-04-26

    Aspergillus fumigatus is a saprophytic fungus that extensively produces conidia. These microscopic asexually reproductive structures are small enough to reach the lungs. Germination of conidia followed by hyphal growth inside human lungs is a key step in the establishment of infection in immunocompromised patients. RNA-Seq was used to analyze the transcriptome of dormant and germinating A. fumigatus conidia. Construction of a gene co-expression network revealed four gene clusters (modules) correlated with a growth phase (dormant, isotropic growth, polarized growth). Transcripts levels of genes encoding for secondary metabolites were high in dormant conidia. During isotropic growth, transcript levels of genes involved in cell wall modifications increased. Two modules encoding for growth and cell cycle/DNA processing were associated with polarized growth. In addition, the co-expression network was used to identify highly connected intermodular hub genes. These genes may have a pivotal role in the respective module and could therefore be compelling therapeutic targets. Generally, cell wall remodeling is an important process during isotropic and polarized growth, characterized by an increase of transcripts coding for hyphal growth and cell cycle/DNA processing when polarized growth is initiated. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Sequencing and Transcriptional Analysis of the Biosynthesis Gene Cluster of Putrescine-Producing Lactococcus lactis ▿ †

    Science.gov (United States)

    Ladero, Victor; Rattray, Fergal P.; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A.

    2011-01-01

    Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (General Regarded As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and their flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution. PMID:21803900

  10. Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application.

    Science.gov (United States)

    Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia

    2009-06-16

    microRNAs (miRNAs) are single-stranded RNA molecules of about 20-23 nucleotides length found in a wide variety of organisms. miRNAs regulate gene expression, by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase and also providing a full gene description and functional analysis for each target gene. On the other hand, TAGGO application is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can freely be downloaded BRFAA.

  11. Functional Analysis of the Chaperone-Usher Fimbrial Gene Clusters of Salmonella enterica serovar Typhi.

    Science.gov (United States)

    Dufresne, Karine; Saulnier-Bellemare, Julie; Daigle, France

    2018-01-01

    The human-specific pathogen Salmonella enterica serovar Typhi causes typhoid, a major public health issue in developing countries. Several aspects of its pathogenesis are still poorly understood. S . Typhi possesses 14 fimbrial gene clusters including 12 chaperone-usher fimbriae ( stg, sth, bcf , fim, saf , sef , sta, stb, stc, std, ste , and tcf ). These fimbriae are weakly expressed in laboratory conditions and only a few are actually characterized. In this study, expression of all S . Typhi chaperone-usher fimbriae and their potential roles in pathogenesis such as interaction with host cells, motility, or biofilm formation were assessed. All S . Typhi fimbriae were better expressed in minimal broth. Each system was overexpressed and only the fimbrial gene clusters without pseudogenes demonstrated a putative major subunits of about 17 kDa on SDS-PAGE. Six of these (Fim, Saf, Sta, Stb, Std, and Tcf) also show extracellular structure by electron microscopy. The impact of fimbrial deletion in a wild-type strain or addition of each individual fimbrial system to an S . Typhi afimbrial strain were tested for interactions with host cells, biofilm formation and motility. Several fimbriae modified bacterial interactions with human cells (THP-1 and INT-407) and biofilm formation. However, only Fim fimbriae had a deleterious effect on motility when overexpressed. Overall, chaperone-usher fimbriae seem to be an important part of the balance between the different steps (motility, adhesion, host invasion and persistence) of S . Typhi pathogenesis.

  12. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  13. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    Directory of Open Access Journals (Sweden)

    Vaiman Daniel

    2005-05-01

    Full Text Available Abstract Background Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we have used an in silico subtraction methodology, and we have focused our attention on genes that are organized in genomic clusters. Results In the present work, five clusters have been studied: a cluster of thirteen genes characterized by an F-box domain localized on chromosome 9, a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1 on chromosome 12, a cluster composed of a SPErm-associated glutamate (E-Rich (Speer protein expressed in the oocyte in the vicinity of four unknown genes specifically expressed in the testis on chromosome 14, a cluster composed of the oocyte secreted protein-1 (Oosp-1 gene and two Oosp-related genes on chromosome 19, all three being characterized by a partial N-terminal zona pellucida-like domain, and another small cluster of two genes on chromosome 19 as well, composed of a TWIK-Related spinal cord K+ channel encoding-gene, and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR and in situ hybridization for eight and five of them, respectively. Finally, we showed by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion We have studied five clusters of genes specifically expressed in female, some of them being also expressed in male germ-cells. Moreover, contrarily to non-clustered oocyte-specific genes, those that are organized in clusters tend to map near chromosome ends, suggesting that this specific near-telomere position of oocyte-clusters in rodents could constitute an evolutionary advantage. Understanding the biological

  14. Analysis of healthy cohorts for single nucleotide polymorphisms in C1q gene cluster

    Directory of Open Access Journals (Sweden)

    MARIA A. RADANOVA

    2015-12-01

    Full Text Available C1q is the first component of the classical pathway of complement activation. The coding region for C1q is localized on chromosome 1p34.1–36.3. Mutations or single nucleotide polymorphisms (SNPs in C1q gene cluster can cause developing of Systemic lupus erythematosus (SLE because of C1q deficiency or other unknown reason. We selected five SNPs located in 7.121 kbp region on chromosome 1, which were previously associated with SLE and/or low C1q level, but not causing C1q deficiency and analyzed them in terms of allele frequencies and genotype distribution in comparison with Hispanic, Asian, African and other Caucasian cohorts. These SNPs were: rs587585, rs292001, rs172378, rs294179 and rs631090. One hundred eighty five healthy Bulgarian volunteers were genotyped for the selected five C1q SNPs by quantative real-time PCR methods. International HapMap Project has been used for information about genotype distribution and allele frequencies of the five SNPs in, Hispanics, Asians, Africans and others Caucasian cohorts. Bulgarian healthy volunteers and another pooled Caucasian cohort had similar frequencies of genotypes and alleles of rs587585, rs292001, rs294179 and rs631090 SNPs. Nevertheless, genotype AA of rs172378 was significantly overrepresented in Bulgarians when compared to other healthy Caucasians from USA and UK (60% vs 31%. Genotype distribution of rs172378 in Bulgarians was similar to Greek-Cyriot Caucasians. For all Caucasians the major allele of rs172378 was A. This is the first study analyzing the allele frequencies and genotype distribution of C1q gene cluster SNPs in Bulgarian healthy population.

  15. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis.Comprised of 10 chapters, this book begins with an introduction to the subject o

  16. Identification and analysis of the paulomycin biosynthetic gene cluster and titer improvement of the paulomycins in Streptomyces paulus NRRL 8115.

    Directory of Open Access Journals (Sweden)

    Jine Li

    Full Text Available The paulomycins are a group of glycosylated compounds featuring a unique paulic acid moiety. To locate their biosynthetic gene clusters, the genomes of two paulomycin producers, Streptomyces paulus NRRL 8115 and Streptomyces sp. YN86, were sequenced. The paulomycin biosynthetic gene clusters were defined by comparative analyses of the two genomes together with the genome of the third paulomycin producer Streptomyces albus J1074. Subsequently, the identity of the paulomycin biosynthetic gene cluster was confirmed by inactivation of two genes involved in biosynthesis of the paulomycose branched chain (pau11 and the ring A moiety (pau18 in Streptomyces paulus NRRL 8115. After determining the gene cluster boundaries, a convergent biosynthetic model was proposed for paulomycin based on the deduced functions of the pau genes. Finally, a paulomycin high-producing strain was constructed by expressing an activator-encoding gene (pau13 in S. paulus, setting the stage for future investigations.

  17. Transcriptional analysis of the jamaicamide gene cluster from the marine cyanobacterium Lyngbya majuscula and identification of possible regulatory proteins

    Directory of Open Access Journals (Sweden)

    Dorrestein Pieter C

    2009-12-01

    Full Text Available Abstract Background The marine cyanobacterium Lyngbya majuscula is a prolific producer of bioactive secondary metabolites. Although biosynthetic gene clusters encoding several of these compounds have been identified, little is known about how these clusters of genes are transcribed or regulated, and techniques targeting genetic manipulation in Lyngbya strains have not yet been developed. We conducted transcriptional analyses of the jamaicamide gene cluster from a Jamaican strain of Lyngbya majuscula, and isolated proteins that could be involved in jamaicamide regulation. Results An unusually long untranslated leader region of approximately 840 bp is located between the jamaicamide transcription start site (TSS and gene cluster start codon. All of the intergenic regions between the pathway ORFs were transcribed into RNA in RT-PCR experiments; however, a promoter prediction program indicated the possible presence of promoters in multiple intergenic regions. Because the functionality of these promoters could not be verified in vivo, we used a reporter gene assay in E. coli to show that several of these intergenic regions, as well as the primary promoter preceding the TSS, are capable of driving β-galactosidase production. A protein pulldown assay was also used to isolate proteins that may regulate the jamaicamide pathway. Pulldown experiments using the intergenic region upstream of jamA as a DNA probe isolated two proteins that were identified by LC-MS/MS. By BLAST analysis, one of these had close sequence identity to a regulatory protein in another cyanobacterial species. Protein comparisons suggest a possible correlation between secondary metabolism regulation and light dependent complementary chromatic adaptation. Electromobility shift assays were used to evaluate binding of the recombinant proteins to the jamaicamide promoter region. Conclusion Insights into natural product regulation in cyanobacteria are of significant value to drug discovery

  18. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  19. Marketing research cluster analysis

    OpenAIRE

    Marić Nebojša

    2002-01-01

    One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  20. Interleukin‑1 gene cluster variants in hemodialysis patients with end stage renal disease: An association and meta‑analysis

    Directory of Open Access Journals (Sweden)

    G Tripathi

    2015-01-01

    Full Text Available We evaluated whether polymorphisms in interleukin (IL-1 gene cluster (IL-1 alpha [IL-1A], IL-1 beta [IL-1B], and IL-1 receptor antagonist [IL-1RN] are associated with end stage renal disease (ESRD. A total of 258 ESRD patients and 569 ethnicity matched controls were examined for IL-1 gene cluster. These were genotyped for five single-nucleotide gene polymorphisms in the IL-1A, IL-1B and IL-1RN genes and a variable number of tandem repeats (VNTR in the IL-1RN. The IL-1B − 3953 and IL-1RN + 8006 polymorphism frequencies were significantly different between the two groups. At IL-1B, the T allele of − 3953C/T was increased among ESRD (P = 0.0001. A logistic regression model demonstrated that two repeat (240 base pair [bp] of the IL-1Ra VNTR polymorphism was associated with ESRD (P = 0.0001. The C/C/C/C/C/1 haplotype was more prevalent in ESRD = 0.007. No linkage disequilibrium (LD was observed between six loci of IL-1 gene. We further conducted a meta-analysis of existing studies and found that there is a strong association of IL-1 RN VNTR 86 bp repeat polymorphism with susceptibility to ESRD (odds ratio = 2.04, 95% confidence interval = 1.48-2.82; P = 0.000. IL-1B − 5887, +8006 and the IL-1RN VNTR polymorphisms have been implicated as potential risk factors for ESRD. The meta-analysis showed a strong association of IL-1RN 86 bp VNTR polymorphism with susceptibility to ESRD.

  1. Association of Interleukin-1 gene clusters polymorphisms with primary open-angle glaucoma: a meta-analysis.

    Science.gov (United States)

    Li, Junhua; Feng, Yifan; Sung, Mi Sun; Lee, Tae Hee; Park, Sang Woo

    2017-11-28

    Previous studies have associated the Interleukin-1 (IL-1) gene clusters polymorphisms with the risk of primary open-angle glaucoma (POAG). However, the results were not consistent. Here, we performed a meta-analysis to evaluate the role of IL-1 gene clusters polymorphisms in POAG susceptibility. PubMed, EMBASE and Cochrane Library (up to July 15, 2017) were searched by two independent investigators. All case-control studies investigating the association between single-nucleotide polymorphisms (SNPs) of IL-1 gene clusters and POAG risk were included. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for quantifying the strength of association that has been involved in at least two studies. Five studies on IL-1β rs16944 (c. -511C > T) (1053 cases and 986 controls), 4 studies on IL-1α rs1800587 (c. -889C > T) (822 cases and 714 controls), and 4 studies on IL-1β rs1143634 (c. +3953C > T) (798 cases and 730 controls) were included. The results suggest that all three SNPs were not associated with POAG risk. Stratification analyses indicated that the rs1143634 has a suggestive associated with high tension glaucoma (HTG) under dominant (P = 0.03), heterozygote (P = 0.04) and allelic models (P = 0.02), however, the weak association was nullified after Bonferroni adjustments for multiple tests. Based on current meta-analysis, we indicated that there is lack of association between the three SNPs of IL-1 and POAG. However, this conclusion should be interpreted with caution and further well designed studies with large sample-size are required to validate the conclusion as low statistical powers.

  2. Identification and functional analysis of the gene cluster for fructan utilization in Prevotella intermedia.

    Science.gov (United States)

    Fuse, Haruka; Fukamachi, Haruka; Inoue, Mitsuko; Igarashi, Takeshi

    2013-02-25

    Fructanase enzymes hydrolyze the β-2,6 and β-2,1 linkages of levan and inulin fructans, respectively. We analyzed the influence of fructan on the growth of Prevotella intermedia. The growth of P. intermedia was enhanced by addition of inulin, implying that P. intermedia could also use inulin. Based on this finding, we identified and analyzed the genes encoding a putative fructanase (FruA), sugar transporter (FruB), and fructokinase (FruK) in the genome of strain ATCC25611. Transcript analysis by RT-PCR showed that the fruABK genes were co-transcribed as a single mRNA and semi-quantitative analysis confirmed that the fruA gene was induced in response to fructose and inulin. Recombinant FruA and FruK were purified and characterized biochemically. FruA strongly hydrolyzed inulin, with slight degradation of levan via an exo-type mechanism, revealing that FruA is an exo-β-d-fructanase. FruK converted fructose to fructose-6-phosphate in the presence of ATP, confirming that FruK is an ATP-dependent fructokinase. These results suggest that P. intermedia can utilize fructan as a carbon source for growth, and that the fructanase, sugar transporter, and fructokinase proteins we identified are involved in this fructan utilization. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Persistence drives gene clustering in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Rocha Eduardo PC

    2008-01-01

    Full Text Available Abstract Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes, maintaining their size fairly constant, that drives persistent genes clustering. A further selective stabilization process might contribute to maintaining the clustering.

  4. Diametrical clustering for identifying anti-correlated gene clusters.

    Science.gov (United States)

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.

  5. [Gene deletion and functional analysis of the heptyl glycosyltransferase (waaF) gene in Vibrio parahemolyticus O-antigen cluster].

    Science.gov (United States)

    Zhao, Feng; Meng, Songsong; Zhou, Deqing

    2016-02-04

    To construct heptyl glycosyltransferase gene II (waaF) gene deletion mutant of Vibrio parahaemolyticus, and explore the function of the waaF gene in Vibrio parahaemolyticus. The waaF gene deletion mutant was constructed by chitin-based transformation technology using clinical isolates, and then the growth rate, morphology and serotypes were identified. The different sources (O3, O5 and O10) waaF gene complementations were constructed through E. coli S17λpir strains conjugative transferring with Vibrio parahaemolyticus, and the function of the waaF gene was further verified by serotypes. The waaF gene deletion mutant strain was successfully constructed and it grew normally. The growth rate and morphology of mutant were similar with the wild type strains (WT), but the mutant could not occurred agglutination reaction with O antisera. The O3 and O5 sources waaF gene complementations occurred agglutination reaction with O antisera, but the O10 sources waaF gene complementations was not. The waaF gene was related with O-antigen synthesis and it was the key gene of O-antigen synthesis pathway in Vibrio parahaemolyticus. The function of different sources waaF gene were not the same.

  6. Fast gene ontology based clustering for microarray experiments.

    Science.gov (United States)

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  7. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

    2017-01-04

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Genome-Wide Analysis of Secondary Metabolite Gene Clusters in Ophiostoma ulmi and Ophiostoma novo-ulmi Reveals a Fujikurin-Like Gene Cluster with a Putative Role in Infection

    Directory of Open Access Journals (Sweden)

    Nicolau Sbaraini

    2017-06-01

    Full Text Available The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi from the Ophiostoma genus, has caused a significant decline in elm tree populations (Ulmus sp. in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi, along with closely related species with different lifestyles, allows for unique comparisons to be made to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8 was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs for this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the Ophiostoma genus. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on Ophiostoma genus’ lifestyle.

  9. Characterization and transcriptional analysis of two gene clusters for type IV secretion machinery in Wolbachia of Armadillidium vulgare

    DEFF Research Database (Denmark)

    Félix, Christine; Pichon, Samuel; Braquart-Varnier, Christine

    2008-01-01

    Wolbachia are maternally inherited alpha-proteobacteria that induce feminization of genetic males in most terrestrial crustacean isopods. Two clusters of vir genes for a type IV secretion machinery have been identified at two separate loci and characterized for the first time in a feminizing Wolb...

  10. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  11. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  12. In silico analysis highlights the frequency and diversity of type 1 lantibiotic gene clusters in genome sequenced bacteria

    LENUS (Irish Health Repository)

    Marsh, Alan J

    2010-11-30

    Abstract Background Lantibiotics are lanthionine-containing, post-translationally modified antimicrobial peptides. These peptides have significant, but largely untapped, potential as preservatives and chemotherapeutic agents. Type 1 lantibiotics are those in which lanthionine residues are introduced into the structural peptide (LanA) through the activity of separate lanthionine dehydratase (LanB) and lanthionine synthetase (LanC) enzymes. Here we take advantage of the conserved nature of LanC enzymes to devise an in silico approach to identify potential lantibiotic-encoding gene clusters in genome sequenced bacteria. Results In total 49 novel type 1 lantibiotic clusters were identified which unexpectedly were associated with species, genera and even phyla of bacteria which have not previously been associated with lantibiotic production. Conclusions Multiple type 1 lantibiotic gene clusters were identified at a frequency that suggests that these antimicrobials are much more widespread than previously thought. These clusters represent a rich repository which can yield a large number of valuable novel antimicrobials and biosynthetic enzymes.

  13. Conditions for the evolution of gene clusters in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Sara Ballouz

    2010-02-01

    Full Text Available Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model, genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.

  14. Conditions for the Evolution of Gene Clusters in Bacterial Genomes

    Science.gov (United States)

    Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

    2010-01-01

    Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992

  15. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    Directory of Open Access Journals (Sweden)

    Olszewski Kellen L

    2007-07-01

    Full Text Available Abstract Background The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes. Results We developed Nearest Neighbor Networks (NNN, a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the

  16. A scale invariant clustering of genes on human chromosome 7

    Directory of Open Access Journals (Sweden)

    Kendal Wayne S

    2004-01-01

    Full Text Available Abstract Background Vertebrate genes often appear to cluster within the background of nontranscribed genomic DNA. Here an analysis of the physical distribution of gene structures on human chromosome 7 was performed to confirm the presence of clustering, and to elucidate possible underlying statistical and biological mechanisms. Results Clustering of genes was confirmed by virtue of a variance of the number of genes per unit physical length that exceeded the respective mean. Further evidence for clustering came from a power function relationship between the variance and mean that possessed an exponent of 1.51. This power function implied that the spatial distribution of genes on chromosome 7 was scale invariant, and that the underlying statistical distribution had a Poisson-gamma (PG form. A PG distribution for the spatial scattering of genes was validated by stringent comparisons of both the predicted variance to mean power function and its cumulative distribution function to data derived from chromosome 7. Conclusion The PG distribution was consistent with at least two different biological models: In the microrearrangement model, the number of genes per unit length of chromosome represented the contribution of a random number of smaller chromosomal segments that had originated by random breakage and reconstruction of more primitive chromosomes. Each of these smaller segments would have necessarily contained (on average a gamma distributed number of genes. In the gene cluster model, genes would be scattered randomly to begin with. Over evolutionary timescales, tandem duplication, mutation, insertion, deletion and rearrangement could act at these gene sites through a stochastic birth death and immigration process to yield a PG distribution. On the basis of the gene position data alone it was not possible to identify the biological model which best explained the observed clustering. However, the underlying PG statistical model implicated neutral

  17. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Science.gov (United States)

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  18. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Full Text Available Biological nitrogen fixation is an essential function of acid mine drainage (AMD microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  19. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    Science.gov (United States)

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417

  20. Origin and distribution of epipolythiodioxopiperazine (ETP gene clusters in filamentous ascomycetes

    Directory of Open Access Journals (Sweden)

    Gardiner Donald M

    2007-09-01

    Full Text Available Abstract Background Genes responsible for biosynthesis of fungal secondary metabolites are usually tightly clustered in the genome and co-regulated with metabolite production. Epipolythiodioxopiperazines (ETPs are a class of secondary metabolite toxins produced by disparate ascomycete fungi and implicated in several animal and plant diseases. Gene clusters responsible for their production have previously been defined in only two fungi. Fungal genome sequence data have been surveyed for the presence of putative ETP clusters and cluster data have been generated from several fungal taxa where genome sequences are not available. Phylogenetic analysis of cluster genes has been used to investigate the assembly and heredity of these gene clusters. Results Putative ETP gene clusters are present in 14 ascomycete taxa, but absent in numerous other ascomycetes examined. These clusters are discontinuously distributed in ascomycete lineages. Gene content is not absolutely fixed, however, common genes are identified and phylogenies of six of these are separately inferred. In each phylogeny almost all cluster genes form monophyletic clades with non-cluster fungal paralogues being the nearest outgroups. This relatedness of cluster genes suggests that a progenitor ETP gene cluster assembled within an ancestral taxon. Within each of the cluster clades, the cluster genes group together in consistent subclades, however, these relationships do not always reflect the phylogeny of ascomycetes. Micro-synteny of several of the genes within the clusters provides further support for these subclades. Conclusion ETP gene clusters appear to have a single origin and have been inherited relatively intact rather than assembling independently in the different ascomycete lineages. This progenitor cluster has given rise to a small number of distinct phylogenetic classes of clusters that are represented in a discontinuous pattern throughout ascomycetes. The disjunct heredity of

  1. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.

  2. Characterization of the largest effector gene cluster of Ustilago maydis.

    Directory of Open Access Journals (Sweden)

    Thomas Brefort

    2014-07-01

    Full Text Available In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  3. Genomewide Analysis of Aryl Hydrocarbon Receptor Binding Targets Reveals an Extensive Array of Gene Clusters that Control Morphogenetic and Developmental Programs

    Science.gov (United States)

    Sartor, Maureen A.; Schnekenburger, Michael; Marlowe, Jennifer L.; Reichard, John F.; Wang, Ying; Fan, Yunxia; Ma, Ci; Karyala, Saikumar; Halbleib, Danielle; Liu, Xiangdong; Medvedovic, Mario; Puga, Alvaro

    2009-01-01

    Background The vertebrate aryl hydrocarbon receptor (AHR) is a ligand-activated transcription factor that regulates cellular responses to environmental polycyclic and halogenated compounds. The naive receptor is believed to reside in an inactive cytosolic complex that translocates to the nucleus and induces transcription of xenobiotic detoxification genes after activation by ligand. Objectives We conducted an integrative genomewide analysis of AHR gene targets in mouse hepatoma cells and determined whether AHR regulatory functions may take place in the absence of an exogenous ligand. Methods The network of AHR-binding targets in the mouse genome was mapped through a multipronged approach involving chromatin immunoprecipitation/chip and global gene expression signatures. The findings were integrated into a prior functional knowledge base from Gene Ontology, interaction networks, Kyoto Encyclopedia of Genes and Genomes pathways, sequence motif analysis, and literature molecular concepts. Results We found the naive receptor in unstimulated cells bound to an extensive array of gene clusters with functions in regulation of gene expression, differentiation, and pattern specification, connecting multiple morphogenetic and developmental programs. Activation by the ligand displaced the receptor from some of these targets toward sites in the promoters of xenobiotic metabolism genes. Conclusions The vertebrate AHR appears to possess unsuspected regulatory functions that may be potential targets of environmental injury. PMID:19654925

  4. Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis

    Science.gov (United States)

    Koh, Esther G. L.; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V.; Brenner, Sydney; Venkatesh, Byrappa

    2003-01-01

    The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes. PMID:12547909

  5. Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales

    OpenAIRE

    Makarova, Kira; Wolf, Yuri; Koonin, Eugene

    2015-01-01

    With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for...

  6. Fast Gene Ontology based clustering for microarray experiments

    Directory of Open Access Journals (Sweden)

    Ovaska Kristian

    2008-11-01

    Full Text Available Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Conclusion Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  7. Identification and Analysis of a Novel Gene Cluster Involves in Fe2+ Oxidation in Acidithiobacillus ferrooxidans ATCC 23270, a Typical Biomining Acidophile.

    Science.gov (United States)

    Ai, Chenbing; Liang, Yuting; Miao, Bo; Chen, Miao; Zeng, Weimin; Qiu, Guanzhou

    2018-07-01

    Iron-oxidizing Acidithiobacillus spp. are applied worldwide in biomining industry to extract metals from sulfide minerals. They derive energy for survival through Fe 2+ oxidation and generate Fe 3+ for the dissolution of sulfide minerals. However, molecular mechanisms of their iron oxidation still remain elusive. A novel two-cytochrome-encoding gene cluster (named tce gene cluster) encoding a high-molecular-weight cytochrome c (AFE_1428) and a c 4 -type cytochrome c 552 (AFE_1429) in A. ferrooxidans ATCC 23270 was first identified in this study. Bioinformatic analysis together with transcriptional study showed that AFE_1428 and AFE_1429 were the corresponding paralog of Cyc2 (AFE_3153) and Cyc1 (AFE_3152) which were encoded by the extensively studied rus operon and had been proven involving in ferrous iron oxidation. Both AFE_1428 and AFE_1429 contained signal peptide and the classic heme-binding motif(s) as their corresponding paralog. The modeled structure of AFE_1429 showed high resemblance to Cyc1. AFE_1428 and AFE_1429 were preferentially transcribed as their corresponding paralogs in the presence of ferrous iron as sole energy source as compared with sulfur. The tce gene cluster is highly conserved in the genomes of four phylogenetic-related A. ferrooxidans strains that were originally isolated from different sites separated with huge geographical distance, which further implies the importance of this gene cluster. Collectively, AFE_1428 and AFE_1429 involve in Fe 2+ oxidation like their corresponding paralog by integrating with the metalloproteins encoded by rus operon. This study provides novel insights into the Fe 2+ oxidation mechanism in Fe 2+ -oxidizing A. ferrooxidans ssp.

  8. Association analysis of the IL-1 gene cluster polymorphisms with aggressive and chronic periodontitis in the Algerian population.

    Science.gov (United States)

    Boukortt, Kawther Nourelhouda; Saidi-Ouahrani, Nadjia; Boukerzaza, Boubaker; Ouhaibi-Djellouli, Hadjira; Hachmaoui, Khalida; Benaissa, Fatima Zohra; Taleb, Leila; Drabla-Ouahrani, Hayet; Deba, Tahria; Ouledhamou, Sid Ahmed; Mehtar, Nadhera; Boudjema, Abdellah

    2015-10-01

    There is strong evidence that genetic as well as environmental factors affect the development of periodontitis. Various studies suggest that genetic polymorphisms of the interleukin-1 (IL-1) genes are associated with an increased risk of developing the pathogenesis. The aim of the present study was to investigate the possible relationship between two polymorphisms of IL-1 gene cluster IL-1B (C+3954T) (rs1143634) and IL-1A (C-889T) (rs1800587) SNPs and the aggressive and chronic periodontitis risk in a case control study in Algerian population. 279 subjects were recruited and received a periodontal examination: 128 healthy controls and 151 cases. From cases, 91 patients were having a chronic disease whereas 60 subjects with aggressive form. All these subjects were genotyped for IL-1A (C-889T) and IL-1B (C+3954T) polymorphisms using TaqMan real time PCR technology. Frequencies of IL-1 alleles, genotypes and the haplotypes were also examined. Significant differences were found in the carriage rate of both minor alleles of the IL-1A (C-889T) and IL-1B (C+3954T) polymorphisms of aggressive periodontitis cases compared with healthy controls (OR [95%CI]=1.61 [1.03-2.49], p=0.03), (OR [95%CI]=1.69 [1.09-2.63], p=0.01), respectively. The result did not reach significance with the chronic form. The studied polymorphisms of the IL-1 genes appear to be associated with susceptibility to aggressive periodontitis (AgP) in the Algerian population. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Profiling of secondary metabolite gene clusters regulated by LaeA in Aspergillus niger FGSC A1279 based on genome sequencing and transcriptome analysis.

    Science.gov (United States)

    Wang, Bin; Lv, Yangyong; Li, Xuejie; Lin, Yiying; Deng, Hai; Pan, Li

    The global regulator LaeA controls the production of many fungal secondary metabolites, possibly via chromatin remodeling. Here we aimed to survey the secondary metabolite profile regulated by LaeA in Aspergillus niger FGSC A1279 by genome sequencing and comparative transcriptomics between the laeA deletion (ΔlaeA) and overexpressing (OE-laeA) mutants. Genome sequencing revealed four putative polyketide synthase genes specific to FGSC A1279, suggesting that the corresponding polyketide compounds might be unique to FGSC A1279. RNA-seq data revealed 281 putative secondary metabolite genes upregulated in the OE-laeA mutants, including 22 secondary metabolite backbone genes. LC-MS chemical profiling illustrated that many secondary metabolites were produced in OE-laeA mutants compared to wild type and ΔlaeA mutants, providing potential resources for drug discovery. KEGG analysis annotated 16 secondary metabolite clusters putatively linked to metabolic pathways. Furthermore, 34 of 61 Zn 2 Cys 6 transcription factors located in secondary metabolite clusters were differentially expressed between ΔlaeA and OE-laeA mutants. Three secondary metabolite clusters (cluster 18, 30 and 33) containing Zn 2 Cys 6 transcription factors that were upregulated in OE-laeA mutants were putatively linked to KEGG pathways, suggesting that Zn 2 Cys 6 transcription factors might play an important role in synthesizing secondary metabolites regulated by LaeA. Taken together, LaeA dramatically influences the secondary metabolite profile in FGSC A1279. Copyright © 2017 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  10. Functional clustering of time series gene expression data by Granger causality

    Science.gov (United States)

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  11. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    International Nuclear Information System (INIS)

    Bodei, L.; Kidd, M.; Modlin, I.M.; Severi, S.; Nicolini, S.; Paganelli, G.; Drozdov, I.; Kwekkeboom, D.J.; Krenning, E.P.; Baum, R.P.

    2016-01-01

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M: F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with 177 Lu-based-PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 18 FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ 2 = 27.4; p = 1.2 x 10 -7 ) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0.004) for predicting

  12. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    Energy Technology Data Exchange (ETDEWEB)

    Bodei, L. [European Institute of Oncology, Division of Nuclear Medicine, Milan (Italy); LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Kidd, M. [Wren Laboratories, Branford, CT (United States); Modlin, I.M. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Yale School of Medicine, New Haven, CT (United States); Severi, S.; Nicolini, S.; Paganelli, G. [Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Nuclear Medicine and Radiometabolic Units, Meldola (Italy); Drozdov, I. [Bering Limited, London (United Kingdom); Kwekkeboom, D.J.; Krenning, E.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Erasmus Medical Center, Nuclear Medicine Department, Rotterdam (Netherlands); Baum, R.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Zentralklinik Bad Berka, Theranostics Center for Molecular Radiotherapy and Imaging, Bad Berka (Germany)

    2016-05-15

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M: F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with {sup 177}Lu-based-PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 {sup 18}FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ{sup 2} = 27.4; p = 1.2 x 10{sup -7}) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0

  13. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  14. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One of the possibilities of classifying track structures is application of conventional partition techniques of analysis of multidimensional data to the track structure. Using these cluster algorithms this paper attempts to find characteristics of radiation reflecting the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations giving the mean number of clusters produced by radiation per unit of deposited energy can serve as this characteristic. General computation techniques used as well as methods of calculations of distributions of clusters for different radiations are discussed. 8 refs.; 5 figs

  15. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Full Text Available Abstract Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involve decisions on how to; handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered, missing value imputation (2, standardization of data (2, gene selection (19 or clustering method (11. The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieves the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that

  16. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involve decisions on how to; handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieves the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is

  17. Sequencing and functional analysis of the nifENXorf1orf2 gene cluster of Herbaspirillum seropedicae.

    Science.gov (United States)

    Klassen, G; Pedrosa, F O; Souza, E M; Yates, M G; Rigo, L U

    1999-12-01

    A 5.1-kb DNA fragment from the nifHDK region of H. seropedicae was isolated and sequenced. Sequence analysis showed the presence of nifENXorf1orf2 but nifTY were not present. No nif or consensus promoter was identified. Furthermore, orf1 expression occurred only under nitrogen-fixing conditions and no promoter activity was detected between nifK and nifE, suggesting that these genes are expressed from the upstream nifH promoter and are parts of a unique nif operon. Mutagenesis studies indicate that nifN was essential for nitrogenase activity whereas nifXorf1orf2 were not. High homology between the C-terminal region of the NifX and NifB proteins from H. seropedicae was observed. Since the NifX and NifY proteins are important for FeMo cofactor (FeMoco) synthesis, we propose that alternative proteins with similar activities exist in H. seropedicae.

  18. Multi-gene phylogenetic analysis reveals that shochu-fermenting Saccharomyces cerevisiae strains form a distinct sub-clade of the Japanese sake cluster.

    Science.gov (United States)

    Futagami, Taiki; Kadooka, Chihiro; Ando, Yoshinori; Okutsu, Kayu; Yoshizaki, Yumiko; Setoguchi, Shinji; Takamine, Kazunori; Kawai, Mikihiko; Tamaki, Hisanori

    2017-10-01

    Shochu is a traditional Japanese distilled spirit. The formation of the distinguishing flavour of shochu produced in individual distilleries is attributed to putative indigenous yeast strains. In this study, we performed the first (to our knowledge) phylogenetic classification of shochu strains based on nucleotide gene sequences. We performed phylogenetic classification of 21 putative indigenous shochu yeast strains isolated from 11 distilleries. All of these strains were shown or confirmed to be Saccharomyces cerevisiae, sharing species identification with 34 known S. cerevisiae strains (including commonly used shochu, sake, ale, whisky, bakery, bioethanol and laboratory yeast strains and clinical isolate) that were tested in parallel. Our analysis used five genes that reflect genome-level phylogeny for the strain-level classification. In a first step, we demonstrated that partial regions of the ZAP1, THI7, PXL1, YRR1 and GLG1 genes were sufficient to reproduce previous sub-species classifications. In a second step, these five analysed regions from each of 25 strains (four commonly used shochu strains and the 21 putative indigenous shochu strains) were concatenated and used to generate a phylogenetic tree. Further analysis revealed that the putative indigenous shochu yeast strains form a monophyletic group that includes both the shochu yeasts and a subset of the sake group strains; this cluster is a sister group to other sake yeast strains, together comprising a sake-shochu group. Differences among shochu strains were small, suggesting that it may be possible to correlate subtle phenotypic differences among shochu flavours with specific differences in genome sequences. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Mutational analysis of the myxovirescin biosynthetic gene cluster reveals novel insights into the functional elaboration of polyketide backbones.

    Science.gov (United States)

    Simunovic, Vesna; Müller, Rolf

    2007-07-23

    It has been proposed that two acyl carrier proteins (ACPs)-TaB and TaE--and two 3-hydroxy-3-methylglutaryl synthases (HMGSs)--TaC and TaF--could constitute two functional ACP-HMGS pairs (TaB/TaC and TaE/TaF) responsible for the incorporation of acetate and propionate units into the myxovirescin A scaffold, leading to the formation of beta-methyl and beta-ethyl groups, respectively. It has been suggested that three more proteins--TaX and TaY, which are members of the superfamily of enoyl-CoA hydratases (ECHs), and a variant ketosynthase (KS) TaK--are shared between two ACP-HMGS pairs, to give the complete set of enzymes required to perform the beta-alkylations. The beta-methyl branch is presumably further hydroxylated (by TaH) and methylated to produce the methoxymethyl group observed in myxovirescin A. To substantiate this hypothesis, a series of gene-deletion mutants were created, and the effects of these mutations on myxovirescin production were examined. As predicted, DeltataB and DeltataE ACP mutants revealed similar phenotypes to their associated HMGS mutants DeltataC and DeltataF, respectively, thus providing direct evidence for the role of TaE/TaF in the formation of the beta-ethyl branch and implying a role for TaB/TaC in the formation of the beta-methyl group. Production of myxovirescin A was dramatically reduced in a DeltataK mutant and abolished in both the DeltataX and the DeltataY mutant backgrounds. Analysis of a DeltataH mutant confirmed the role of the cytochrome P450 TaH in hydroxylation of the beta-methyl group. Taken together, these experiments support a model in which the discrete ACPs TaB and TaE are compatible only with their associated HMGSs TaC and TaF, respectively, and function in a substrate-specific manner. Both TaB and TaC are essential for myxovirescin production, and the TaB/TaC pair can rescue antibiotic production in the absence of either TaE or TaF. Finally, the reduced level of myxovirescin production in the DeltataE mutant

  20. Clustering gene expression regulators: new approach to disease subtyping.

    Directory of Open Access Journals (Sweden)

    Mikhail Pyatnitskiy

    Full Text Available One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms as to develop more personalized approach to therapy. Here we propose novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on Sub-Network Enrichment Analysis algorithm (SNEA which identifies gene subnetworks with significant concordant changes in expression between two conditions. Subnetwork consists of central regulator and downstream genes connected by relations extracted from global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores which are used for final patients grouping. We show that our approach performs well compared to other related methods and at the same time provides researchers with complementary level of understanding of pathway-level biology behind a disease by identification of significant expression regulators. We have observed the reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms, that was not revealed using standard expression profile clustering. For another experiment we were able to suggest the clusters of regulators, responsible for colorectal carcinoma vs adenoma discrimination and identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. Proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. Obtained clusters of regulators make possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for individual patient.

  1. Genome based analysis of type-I polyketide synthase and nonribosomal peptide synthetase gene clusters in seven strains of five representative Nocardia species.

    Science.gov (United States)

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Takahashi-Nakaguchi, Azusa; Matsuzawa, Tetsuhiro; Suzuki, Ken-ichiro; Fujita, Nobuyuki; Gonoi, Tohru

    2014-04-30

    Actinobacteria of the genus Nocardia usually live in soil or water and play saprophytic roles, but they also opportunistically infect the respiratory system, skin, and other organs of humans and animals. Primarily because of the clinical importance of the strains, some Nocardia genomes have been sequenced, and genome sequences have accumulated. Genome sizes of Nocardia strains are similar to those of Streptomyces strains, the producers of most antibiotics. In the present work, we compared secondary metabolite biosynthesis gene clusters of type-I polyketide synthase (PKS-I) and nonribosomal peptide synthetase (NRPS) among genomes of representative Nocardia species/strains based on domain organization and amino acid sequence homology. Draft genome sequences of Nocardia asteroides NBRC 15531(T), Nocardia otitidiscaviarum IFM 11049, Nocardia brasiliensis NBRC 14402(T), and N. brasiliensis IFM 10847 were read and compared with published complete genome sequences of Nocardia farcinica IFM 10152, Nocardia cyriacigeorgica GUH-2, and N. brasiliensis HUJEG-1. Genome sizes are as follows: N. farcinica, 6.0 Mb; N. cyriacigeorgica, 6.2 Mb; N. asteroides, 7.0 Mb; N. otitidiscaviarum, 7.8 Mb; and N. brasiliensis, 8.9 - 9.4 Mb. Predicted numbers of PKS-I, NRPS, and PKS-I/NRPS hybrid clusters ranged between 4-11, 7-13, and 1-6, respectively, depending on strains, and tended to increase with increasing genome size. Domain and module structures of representative or unique clusters are discussed in the text. We conclude the following: 1) genomes of Nocardia strains carry as many PKS-I and NRPS gene clusters as those of Streptomyces strains, 2) the number of PKS-I and NRPS gene clusters in Nocardia strains varies substantially depending on species, and N. brasiliensis strains carry the largest numbers of clusters among the species studied, 3) the seven Nocardia strains studied in the present work have seven common PKS-I and/or NRPS clusters, some of whose products are yet to be studied

  2. Evolution and Diversity of Biosynthetic Gene Clusters in Fusarium

    Directory of Open Access Journals (Sweden)

    Koen Hoogendoorn

    2018-06-01

    Full Text Available Plant pathogenic fungi in the Fusarium genus cause severe damage to crops, resulting in great financial losses and health hazards. Specialized metabolites synthesized by these fungi are known to play key roles in the infection process, and to provide survival advantages inside and outside the host. However, systematic studies of the evolution of specialized metabolite-coding potential across Fusarium have been scarce. Here, we apply a combination of bioinformatic approaches to identify biosynthetic gene clusters (BGCs across publicly available genomes from Fusarium, to group them into annotated families and to study gain/loss events of BGC families throughout the history of the genus. Comparison with MIBiG reference BGCs allowed assignment of 29 gene cluster families (GCFs to pathways responsible for the production of known compounds, while for 57 GCFs, the molecular products remain unknown. Comparative analysis of BGC repertoires using ancestral state reconstruction raised several new hypotheses on how BGCs contribute to Fusarium pathogenicity or host specificity, sometimes surprisingly so: for example, a gene cluster for the biosynthesis of hexadehydro-astechrome was identified in the genome of the biocontrol strain Fusarium oxysporum Fo47, while being absent in that of the tomato pathogen F. oxysporum f.sp. lycopersici. Several BGCs were also identified on supernumerary chromosomes; heterologous expression of genes for three terpene synthases encoded on the Fusarium poae supernumerary chromosome and subsequent GC/MS analysis showed that these genes are functional and encode enzymes that each are able to synthesize koraiol; this observed functional redundancy supports the hypothesis that localization of copies of BGCs on supernumerary chromosomes provides freedom for evolutionary innovations to occur, while the original function remains conserved. Altogether, this systematic overview of biosynthetic diversity in Fusarium paves the way for

  3. Analysis of clustered point mutations in the human ribosomal RNA gene promoter by transient expression in vivo

    International Nuclear Information System (INIS)

    Jones, M.H.; Learned, R.M.; Tjian, R.

    1988-01-01

    The authors have mapped the cis regulatory elements required in vivo for initiation at the human rRNA promoter by RNA polymerase I. Transient expression in COS-7 cells was used to evaluate the transcription phenotype of clustered base substitution mutations in the human rRNA promoter. The promoter consists of two major elements: a large upstream region, composed of several domains, that lies between nucleotides -234 and -107 relative to the transcription initiation site and affects transcription up to 100-fold and a core element that lies between nucleotides -45 and +20 and affects transcription up to 1000-fold. The upstream regions is able to retain partial function when positioned within 100-160 nucleotides of the transcription initiation site, but it cannot stimulate transcription from distances of ≥ 600 nucleotides. In addition, they demonstrate, using mouse-human hybrid rRNA promoters, that the sequences responsible for human species-specific transcription in vivo appear to reside in both the core and upstream elements, and sequences from the mouse rRNA promoter cannot be substituted for them

  4. Hox gene cluster of the ascidian, Halocynthia roretzi, reveals multiple ancient steps of cluster disintegration during ascidian evolution.

    Science.gov (United States)

    Sekigami, Yuka; Kobayashi, Takuya; Omi, Ai; Nishitsuji, Koki; Ikuta, Tetsuro; Fujiyama, Asao; Satoh, Noriyuki; Saiga, Hidetoshi

    2017-01-01

    Hox gene clusters with at least 13 paralog group (PG) members are common in vertebrate genomes and in that of amphioxus. Ascidians, which belong to the subphylum Tunicata (Urochordata), are phylogenetically positioned between vertebrates and amphioxus, and traditionally divided into two groups: the Pleurogona and the Enterogona. An enterogonan ascidian, Ciona intestinalis ( Ci ), possesses nine Hox genes localized on two chromosomes; thus, the Hox gene cluster is disintegrated. We investigated the Hox gene cluster of a pleurogonan ascidian, Halocynthia roretzi ( Hr ) to investigate whether Hox gene cluster disintegration is common among ascidians, and if so, how such disintegration occurred during ascidian or tunicate evolution. Our phylogenetic analysis reveals that the Hr Hox gene complement comprises nine members, including one with a relatively divergent Hox homeodomain sequence. Eight of nine Hr Hox genes were orthologous to Ci-Hox1 , 2, 3, 4, 5, 10, 12 and 13. Following the phylogenetic classification into 13 PGs, we designated Hr Hox genes as Hox1, 2, 3, 4, 5, 10, 11/12/13.a , 11/12/13.b and HoxX . To address the chromosomal arrangement of the nine Hox genes, we performed two-color chromosomal fluorescent in situ hybridization, which revealed that the nine Hox genes are localized on a single chromosome in Hr , distinct from their arrangement in Ci . We further examined the order of the nine Hox genes on the chromosome by chromosome/scaffold walking. This analysis suggested a gene order of Hox1 , 11/12/13.b, 11/12/13.a, 10, 5, X, followed by either Hox4, 3, 2 or Hox2, 3, 4 on the chromosome. Based on the present results and those previously reported in Ci , we discuss the establishment of the Hox gene complement and disintegration of Hox gene clusters during the course of ascidian or tunicate evolution. The Hox gene cluster and the genome must have experienced extensive reorganization during the course of evolution from the ancestral tunicate to Hr and Ci

  5. Recursive Cluster Elimination (RCE for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE rather than recursive feature elimination (RFE. We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs, a supervised machine learning classification method, to identify and score (rank those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA with recursive feature elimination (SVM-RFE and PDA-RFE are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together

  6. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    Science.gov (United States)

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particular suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  7. In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Full Text Available Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1 that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.

  8. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Directory of Open Access Journals (Sweden)

    Markatou Marianthi

    2011-01-01

    Full Text Available Abstract Background The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA, a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM. Results While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K-specific demethylase 5B and HDACs (histone deacetylases, which could epigenetically coordinate gene expression after irradiation. Conclusions In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.

  9. Cluster analysis for portfolio optimization

    OpenAIRE

    Vincenzo Tola; Fabrizio Lillo; Mauro Gallegati; Rosario N. Mantegna

    2005-01-01

    We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio compositi...

  10. Statistical indicators of collective behavior and functional clusters in gene networks of yeast

    Science.gov (United States)

    Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

    2006-03-01

    We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.

  11. A robust approach based on Weibull distribution for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Gong Binsheng

    2011-05-01

    Full Text Available Abstract Background Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotation resources accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results In this paper, we proposed the WDCM (Weibull Distribution-based Clustering Method, a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as the random variables following unique Weibull distributions. Our WDCM is based on the concept that the genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from the lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM using functional annotation information given by the Gene Ontology (GO. The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides the better clustering performance compared to k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets. Conclusions The results demonstrate that our WDCM produces clusters

  12. Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

    Science.gov (United States)

    Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

    2017-08-01

    In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  13. Clustering approaches to identifying gene expression patterns from DNA microarray data.

    Science.gov (United States)

    Do, Jin Hwan; Choi, Dong-Kug

    2008-04-30

    The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weakness and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.

  14. An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information

    Directory of Open Access Journals (Sweden)

    Ao Li

    2009-04-01

    Full Text Available Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations.

  15. HOXA genes cluster: clinical implications of the smallest deletion

    OpenAIRE

    Pezzani, Lidia; Milani, Donatella; Manzoni, Francesca; Baccarin, Marco; Silipigni, Rosamaria; Guerneri, Silvana; Esposito, Susanna

    2015-01-01

    Background HOXA genes cluster plays a fundamental role in embryologic development. Deletion of the entire cluster is known to cause a clinically recognizable syndrome with mild developmental delay, characteristic facies, small feet with unusually short and big halluces, abnormal thumbs, and urogenital malformations. The clinical manifestations may vary with different ranges of deletions of HOXA cluster and flanking regions. Case presentation We report a girl with the smallest deletion reporte...

  16. Minimum Information about a Biosynthetic Gene cluster : commentary

    NARCIS (Netherlands)

    Medema, Marnix H; Kottmann, Renzo; Yilmaz, Pelin; Cummings, Matthew; Biggins, John B; Blin, Kai; de Bruijn, Irene; Chooi, Yit Heng; Claesen, Jan; Coates, R Cameron; Cruz-Morales, Pablo; Duddela, Srikanth; Dusterhus, Stephanie; Edwards, Daniel J; Fewer, David P; Garg, Neha; Geiger, Christoph; Gomez-Escribano, Juan Pablo; Greule, Anja; Hadjithomas, Michalis; Haines, Anthony S; Helfrich, Eric J N; Hillwig, Matthew L; Ishida, Keishi; Jones, Adam C; Jones, Carla S; Jungmann, Katrin; Kegler, Carsten; Kim, Hyun Uk; Kotter, Peter; Krug, Daniel; Masschelein, Joleen; Melnik, Alexey V; Mantovani, Simone M; Monroe, Emily A; Moore, Marcus; Moss, Nathan; Nutzmann, Hans-Wilhelm; Pan, Guohui; Pati, Amrita; Petras, Daniel; Reen, F Jerry; Rosconi, Federico; Rui, Zhe; Tian, Zhenhua; Tobias, Nicholas J; Tsunematsu, Yuta; Wiemann, Philipp; Wyckoff, Elizabeth; Yan, Xiaohui; Yim, Grace; Yu, Fengan; Xie, Yunchang; Aigle, Bertrand; Apel, Alexander K; Balibar, Carl J; Balskus, Emily P; Barona-Gomez, Francisco; Bechthold, Andreas; Bode, Helge B; Borriss, Rainer; Brady, Sean F; Brakhage, Axel A; Caffrey, Patrick; Cheng, Yi-Qiang; Clardy, Jon; Cox, Russell J; De Mot, Rene; Donadio, Stefano; Donia, Mohamed S; van der Donk, Wilfred A; Dorrestein, Pieter C; Doyle, Sean; Driessen, Arnold J M; Ehling-Schulz, Monika; Entian, Karl-Dieter; Fischbach, Michael A; Gerwick, Lena; Gerwick, William H; Gross, Harald; Gust, Bertolt; Hertweck, Christian; Hofte, Monica; Jensen, Susan E; Ju, Jianhua; Katz, Leonard; Kaysser, Leonard; Klassen, Jonathan L; Keller, Nancy P; Kormanec, Jan; Kuipers, Oscar P; Kuzuyama, Tomohisa; Kyrpides, Nikos C; Kwon, Hyung-Jin; Lautru, Sylvie; Lavigne, Rob; Lee, Chia Y; Linquan, Bai; Liu, Xinyu; Liu, Wen; Luzhetskyy, Andriy; Mahmud, Taifo; Mast, Yvonne; Mendez, Carmen; Metsa-Ketela, Mikko; Micklefield, Jason; Mitchell, Douglas A; Moore, Bradley S; Moreira, Leonilde M; Muller, Rolf; Neilan, Brett A; Nett, Markus; Nielsen, Jens; O'Gara, Fergal; Oikawa, Hideaki; Osbourn, Anne; Osburne, Marcia S; Ostash, Bohdan; Payne, Shelley M; Pernodet, Jean-Luc; Petricek, Miroslav; Piel, Jorn; Ploux, Olivier; Raaijmakers, Jos M; Salas, Jose A; Schmitt, Esther K; Scott, Barry; Seipke, Ryan F; Shen, Ben; Sherman, David H; Sivonen, Kaarina; Smanski, Michael J; Sosio, Margherita; Stegmann, Evi; Sussmuth, Roderich D; Tahlan, Kapil; Thomas, Christopher M; Tang, Yi; Truman, Andrew W; Viaud, Muriel; Walton, Jonathan D; Walsh, Christopher T; Weber, Tilmann; van Wezel, Gilles P; Wilkinson, Barrie; Willey, Joanne M; Wohlleben, Wolfgang; Wright, Gerard D; Ziemert, Nadine; Zhang, Changsheng; Zotchev, Sergey B; Breitling, Rainer; Takano, Eriko; Glockner, Frank Oliver

    A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploit.

  17. The gsdf gene locus harbors evolutionary conserved and clustered genes preferentially expressed in fish previtellogenic oocytes.

    Science.gov (United States)

    Gautier, Aude; Le Gac, Florence; Lareyre, Jean-Jacques

    2011-02-01

    The gonadal soma-derived factor (GSDF) belongs to the transforming growth factor-β superfamily and is conserved in teleostean fish species. Gsdf is specifically expressed in the gonads, and gene expression is restricted to the granulosa and Sertoli cells in trout and medaka. The gsdf gene expression is correlated to early testis differentiation in medaka and was shown to stimulate primordial germ cell and spermatogonia proliferation in trout. In the present study, we show that the gsdf gene localizes to a syntenic chromosomal fragment conserved among vertebrates although no gsdf-related gene is detected on the corresponding genomic region in tetrapods. We demonstrate using quantitative RT-PCR that most of the genes localized in the synteny are specifically expressed in medaka gonads. Gsdf is the only gene of the synteny with a much higher expression in the testis compared to the ovary. In contrast, gene expression pattern analysis of the gsdf surrounding genes (nup54, aff1, klhl8, sdad1, and ptpn13) indicates that these genes are preferentially expressed in the female gonads. The tissue distribution of these genes is highly similar in medaka and zebrafish, two teleostean species that have diverged more than 110 million years ago. The cellular localization of these genes was determined in medaka gonads using the whole-mount in situ hybridization technique. We confirm that gsdf gene expression is restricted to Sertoli and granulosa cells in contact with the premeiotic and meiotic cells. The nup54 gene is expressed in spermatocytes and previtellogenic oocytes. Transcripts corresponding to the ovary-specific genes (aff1, klhl8, and sdad1) are detected only in previtellogenic oocytes. No expression was detected in the gonocytes in 10 dpf embryos. In conclusion, we show that the gsdf gene localizes to a syntenic chromosomal fragment harboring evolutionary conserved genes in vertebrates. These genes are preferentially expressed in previtelloogenic oocytes, and thus, they

  18. MADIBA: A web server toolkit for biological interpretation of Plasmodium and plant gene clusters

    Directory of Open Access Journals (Sweden)

    Louw Abraham I

    2008-02-01

    Full Text Available Abstract Background Microarray technology makes it possible to identify changes in gene expression of an organism, under various conditions. Data mining is thus essential for deducing significant biological information such as the identification of new biological mechanisms or putative drug targets. While many algorithms and software have been developed for analysing gene expression, the extraction of relevant information from experimental data is still a substantial challenge, requiring significant time and skill. Description MADIBA (MicroArray Data Interface for Biological Annotation facilitates the assignment of biological meaning to gene expression clusters by automating the post-processing stage. A relational database has been designed to store the data from gene to pathway for Plasmodium, rice and Arabidopsis. Tools within the web interface allow rapid analyses for the identification of the Gene Ontology terms relevant to each cluster; visualising the metabolic pathways where the genes are implicated, their genomic localisations, putative common transcriptional regulatory elements in the upstream sequences, and an analysis specific to the organism being studied. Conclusion MADIBA is an integrated, online tool that will assist researchers in interpreting their results and understand the meaning of the co-expression of a cluster of genes. Functionality of MADIBA was validated by analysing a number of gene clusters from several published experiments – expression profiling of the Plasmodium life cycle, and salt stress treatments of Arabidopsis and rice. In most of the cases, the same conclusions found by the authors were quickly and easily obtained after analysing the gene clusters with MADIBA.

  19. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes into groups of highly correlated genes that have the same effect on the outcome (recovery. We propose a random effects model where the genes within each group (cluster equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  20. Large clusters of co-expressed genes in the Drosophila genome.

    Science.gov (United States)

    Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I

    2002-12-12

    Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.

  1. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  2. Clustering gene expression data based on predicted differential effects of GV interaction.

    Science.gov (United States)

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce bias interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  3. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The native nature of high dimension low sample size of gene expression data make the classification task more challenging. Therefore, feature (gene) selection become an apparent need. Selecting a meaningful and relevant genes for classifier not only decrease the computational time and cost, but also improve the classification performance. Among different approaches of feature selection methods, however most of them suffer from several problems such as lack of robustness, validation issues etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods We used leukemia gene expression dataset [1]. The effectiveness of the selected features were evaluated by four different classification methods; support vector machines, k-nearest neighbor, random forest, and linear discriminate analysis. The method evaluate the importance and relevance of each gene cluster by summing the expression level for each gene belongs to this cluster. The gene cluster consider important, if it satisfies conditions depend on thresholds and percentage otherwise eliminated. Results Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a), after applying our feature selection methodology we end up with specific 1117 genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent higher positive and lower negative threshold condition, number reduced to 58 genes have be tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions The feature selection method gave good results with minimum classification error. Our heat-map result shows distinct pattern of refines genes discriminating between two classes of leukemia.

  4. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  5. Calcitonin gene-related peptide antagonism and cluster headache

    DEFF Research Database (Denmark)

    Ashina, Håkan; Newman, Lawrence; Ashina, Sait

    2017-01-01

    Calcitonin gene-related peptide (CGRP) is a key signaling molecule involved in migraine pathophysiology. Efficacy of CGRP monoclonal antibodies and antagonists in migraine treatment has fueled an increasing interest in the prospect of treating cluster headache (CH) with CGRP antagonism. The exact...... role of CGRP and its mechanism of action in CH have not been fully clarified. A search for original studies and randomized controlled trials (RCTs) published in English was performed in PubMed and in ClinicalTrials.gov . The search term used was "cluster headache and calcitonin gene related peptide......" and "primary headaches and calcitonin gene related peptide." Reference lists of identified articles were also searched for additional relevant papers. Human experimental studies have reported elevated plasma CGRP levels during both spontaneous and glyceryl trinitrate-induced cluster attacks. CGRP may play...

  6. Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species

    Science.gov (United States)

    Glenn, Anthony E.; Davis, C. Britton; Gao, Minglu; Gold, Scott E.; Mitchell, Trevor R.; Proctor, Robert H.; Stewart, Jane E.; Snook, Maurice E.

    2016-01-01

    Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence. PMID:26808652

  7. Gene structure and expression characteristic of a novel odorant receptor gene cluster in the parasitoid wasp Microplitis mediator (Hymenoptera: Braconidae).

    Science.gov (United States)

    Wang, S-N; Shan, S; Zheng, Y; Peng, Y; Lu, Z-Y; Yang, Y-Q; Li, R-J; Zhang, Y-J; Guo, Y-Y

    2017-08-01

    Odorant receptors (ORs) expressed in the antennae of parasitoid wasps are responsible for detection of various lipophilic airborne molecules. In the present study, 107 novel OR genes were identified from Microplitis mediator antennal transcriptome data. Phylogenetic analysis of the set of OR genes from M. mediator and Microplitis demolitor revealed that M. mediator OR (MmedOR) genes can be classified into different subfamilies, and the majority of MmedORs in each subfamily shared high sequence identities and clear orthologous relationships to M. demolitor ORs. Within a subfamily, six MmedOR genes, MmedOR98, 124, 125, 126, 131 and 155, shared a similar gene structure and were tightly linked in the genome. To evaluate whether the clustered MmedOR genes share common regulatory features, the transcription profile and expression characteristics of the six closely related OR genes were investigated in M. mediator. Rapid amplification of cDNA ends-PCR experiments revealed that the OR genes within the cluster were transcribed as single mRNAs, and a bicistronic mRNA for two adjacent genes (MmedOR124 and MmedOR98) was also detected in female antennae by reverse transcription PCR. In situ hybridization experiments indicated that each OR gene within the cluster was expressed in a different number of cells. Moreover, there was no co-expression of the two highly related OR genes, MmedOR124 and MmedOR98, which appeared to be individually expressed in a distinct population of neurons. Overall, there were distinct expression profiles of closely related MmedOR genes from the same cluster in M. mediator. These data provide a basic understanding of the olfactory coding in parasitoid wasps. © 2017 The Royal Entomological Society.

  8. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    Full Text Available BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis. METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes and mammalian (54-59 genes clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that similar to the protocadherin clusters in other vertebrates, the evolution of anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles

  9. Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features

    Directory of Open Access Journals (Sweden)

    Bissell MJ

    2006-03-01

    Full Text Available Abstract Background Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. Results Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile. In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO terms and the Conserved Domain Database (CDD protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models and each ensuing model utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occured consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. Ensemble attribute profile clustering-based analysis indicated that both lists

  10. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  11. Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2008-04-01

    Full Text Available Abstract Background The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all compute that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA, (http://camera.calit2.net. Conclusion The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.

  12. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth

    2015-01-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we...... introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration...... of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products...

  13. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur- dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS. esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625 and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2 where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  14. The ergot alkaloid gene cluster: Functional analyses and evolutionary aspects

    Czech Academy of Sciences Publication Activity Database

    Lorenz, N.; Haarmann, T.; Pažoutová, Sylvie; Jung, M.; Tudzynski, P.

    2009-01-01

    Roč. 70, 15-16 (2009), s. 1822-1832 ISSN 0031-9422 Institutional research plan: CEZ:AV0Z50200510 Keywords : Claviceps purpurea * Ergot fungus * Ergot alkaloid gene cluster Subject RIV: EE - Microbiology, Virology Impact factor: 3.104, year: 2009

  15. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  16. Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury

    Directory of Open Access Journals (Sweden)

    Westerdahl Ann-Charlotte

    2010-06-01

    Full Text Available Abstract Background Spinal cord injury leads to neurological dysfunctions affecting the motor, sensory as well as the autonomic systems. Increased excitability of motor neurons has been implicated in injury-induced spasticity, where the reappearance of self-sustained plateau potentials in the absence of modulatory inputs from the brain correlates with the development of spasticity. Results Here we examine the dynamic transcriptional response of motor neurons to spinal cord injury as it evolves over time to unravel common gene expression patterns and their underlying regulatory mechanisms. For this we use a rat-tail-model with complete spinal cord transection causing injury-induced spasticity, where gene expression profiles are obtained from labeled motor neurons extracted with laser microdissection 0, 2, 7, 21 and 60 days post injury. Consensus clustering identifies 12 gene clusters with distinct time expression profiles. Analysis of these gene clusters identifies early immunological/inflammatory and late developmental responses as well as a regulation of genes relating to neuron excitability that support the development of motor neuron hyper-excitability and the reappearance of plateau potentials in the late phase of the injury response. Transcription factor motif analysis identifies differentially expressed transcription factors involved in the regulation of each gene cluster, shaping the expression of the identified biological processes and their associated genes underlying the changes in motor neuron excitability. Conclusions This analysis provides important clues to the underlying mechanisms of transcriptional regulation responsible for the increased excitability observed in motor neurons in the late chronic phase of spinal cord injury suggesting alternative targets for treatment of spinal cord injury. Several transcription factors were identified as potential regulators of gene clusters containing elements related to motor neuron hyper

  17. Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury.

    Science.gov (United States)

    Ryge, Jesper; Winther, Ole; Wienecke, Jacob; Sandelin, Albin; Westerdahl, Ann-Charlotte; Hultborn, Hans; Kiehn, Ole

    2010-06-09

    Spinal cord injury leads to neurological dysfunctions affecting the motor, sensory as well as the autonomic systems. Increased excitability of motor neurons has been implicated in injury-induced spasticity, where the reappearance of self-sustained plateau potentials in the absence of modulatory inputs from the brain correlates with the development of spasticity. Here we examine the dynamic transcriptional response of motor neurons to spinal cord injury as it evolves over time to unravel common gene expression patterns and their underlying regulatory mechanisms. For this we use a rat-tail-model with complete spinal cord transection causing injury-induced spasticity, where gene expression profiles are obtained from labeled motor neurons extracted with laser microdissection 0, 2, 7, 21 and 60 days post injury. Consensus clustering identifies 12 gene clusters with distinct time expression profiles. Analysis of these gene clusters identifies early immunological/inflammatory and late developmental responses as well as a regulation of genes relating to neuron excitability that support the development of motor neuron hyper-excitability and the reappearance of plateau potentials in the late phase of the injury response. Transcription factor motif analysis identifies differentially expressed transcription factors involved in the regulation of each gene cluster, shaping the expression of the identified biological processes and their associated genes underlying the changes in motor neuron excitability. This analysis provides important clues to the underlying mechanisms of transcriptional regulation responsible for the increased excitability observed in motor neurons in the late chronic phase of spinal cord injury suggesting alternative targets for treatment of spinal cord injury. Several transcription factors were identified as potential regulators of gene clusters containing elements related to motor neuron hyper-excitability, the manipulation of which potentially could be

  18. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering technique is then commonly used to analyze gene expression data. Data representation plays an important role in clustering analysis. The non-negative matrix factorization (NMF) is a widely used data representation method with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from the problem of its weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method, whose natural properties make it more extrapolating, especially for small sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm, and then apply it to represent gene expression data for further clustering task. The clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMM and original NMF, which suggests the potential application of HR-NMF for gene expression data.

  19. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    Energy Technology Data Exchange (ETDEWEB)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes permit to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either A or A clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.

  20. Conservation of gene linkage in dispersed vertebrate NK homeobox clusters.

    Science.gov (United States)

    Wotton, Karl R; Weierud, Frida K; Juárez-Morales, José L; Alvares, Lúcia E; Dietrich, Susanne; Lewis, Katharine E

    2009-10-01

    Nk homeobox genes are important regulators of many different developmental processes including muscle, heart, central nervous system and sensory organ development. They are thought to have arisen as part of the ANTP megacluster, which also gave rise to Hox and ParaHox genes, and at least some NK genes remain tightly linked in all animals examined so far. The protostome-deuterostome ancestor probably contained a cluster of nine Nk genes: (Msx)-(Nk4/tinman)-(Nk3/bagpipe)-(Lbx/ladybird)-(Tlx/c15)-(Nk7)-(Nk6/hgtx)-(Nk1/slouch)-(Nk5/Hmx). Of these genes, only NKX2.6-NKX3.1, LBX1-TLX1 and LBX2-TLX2 remain tightly linked in humans. However, it is currently unclear whether this is unique to the human genome as we do not know which of these Nk genes are clustered in other vertebrates. This makes it difficult to assess whether the remaining linkages are due to selective pressures or because chance rearrangements have "missed" certain genes. In this paper, we identify all of the paralogs of these ancestrally clustered NK genes in several distinct vertebrates. We demonstrate that tight linkages of Lbx1-Tlx1, Lbx2-Tlx2 and Nkx3.1-Nkx2.6 have been widely maintained in both the ray-finned and lobe-finned fish lineages. Moreover, the recently duplicated Hmx2-Hmx3 genes are also tightly linked. Finally, we show that Lbx1-Tlx1 and Hmx2-Hmx3 are flanked by highly conserved noncoding elements, suggesting that shared regulatory regions may have resulted in evolutionary pressure to maintain these linkages. Consistent with this, these pairs of genes have overlapping expression domains. In contrast, Lbx2-Tlx2 and Nkx3.1-Nkx2.6, which do not seem to be coexpressed, are also not associated with conserved noncoding sequences, suggesting that an alternative mechanism may be responsible for the continued clustering of these genes.

  1. Exact WKB analysis and cluster algebras

    International Nuclear Information System (INIS)

    Iwaki, Kohei; Nakanishi, Tomoki

    2014-01-01

    We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change the topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction of the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’. (paper)

  2. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D

    2010-01-01

    DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented...... for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...

  3. Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster

    Directory of Open Access Journals (Sweden)

    Jakobek Judy L

    2007-07-01

    Full Text Available Abstract Background The biosynthesis of aflatoxin (AF involves over 20 enzymatic reactions in a complex polyketide pathway that converts acetate and malonate to the intermediates sterigmatocystin (ST and O-methylsterigmatocystin (OMST, the respective penultimate and ultimate precursors of AF. Although these precursors are chemically and structurally very similar, their accumulation differs at the species level for Aspergilli. Notable examples are A. nidulans that synthesizes only ST, A. flavus that makes predominantly AF, and A. parasiticus that generally produces either AF or OMST. Whether these differences are important in the evolutionary/ecological processes of species adaptation and diversification is unknown. Equally unknown are the specific genomic mechanisms responsible for ordering and clustering of genes in the AF pathway of Aspergillus. Results To elucidate the mechanisms that have driven formation of these clusters, we performed systematic searches of aflatoxin cluster homologs across five Aspergillus genomes. We found a high level of gene duplication and identified seven modules consisting of highly correlated gene pairs (aflA/aflB, aflR/aflS, aflX/aflY, aflF/aflE, aflT/aflQ, aflC/aflW, and aflG/aflL. With the exception of A. nomius, contrasts of mean Ka/Ks values across all cluster genes showed significant differences in selective pressure between section Flavi and non-section Flavi species. A. nomius mean Ka/Ks values were more similar to partial clusters in A. fumigatus and A. terreus. Overall, mean Ka/Ks values were significantly higher for section Flavi than for non-section Flavi species. Conclusion Our results implicate several genomic mechanisms in the evolution of ST, OMST and AF cluster genes. Gene modules may arise from duplications of a single gene, whereby the function of the pre-duplication gene is retained in the copy (aflF/aflE or the copies may partition the ancestral function (aflA/aflB. In some gene modules, the

  4. Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

    Science.gov (United States)

    Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

    1998-08-01

    The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.

  5. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    Science.gov (United States)

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  6. Molecular comparison of the structural proteins encoding gene clusters of two related Lactobacillus delbrueckii bacteriophages.

    Science.gov (United States)

    Vasala, A; Dupont, L; Baumann, M; Ritzenthaler, P; Alatossava, T

    1993-01-01

    Virulent phage LL-H and temperate phage mv4 are two related bacteriophages of Lactobacillus delbrueckii. The gene clusters encoding structural proteins of these two phages have been sequenced and further analyzed. Six open reading frames (ORF-1 to ORF-6) were detected. Protein sequencing and Western immunoblotting experiments confirmed that ORF-3 (g34) encoded the main capsid protein Gp34. The presence of a putative late promoter in front of the phage LL-H g34 gene was suggested by primer extension experiments. Comparative sequence analysis between phage LL-H and phage mv4 revealed striking similarities in the structure and organization of this gene cluster, suggesting that the genes encoding phage structural proteins belong to a highly conservative module. Images PMID:8497043

  7. Co-evolution of secondary metabolite gene clusters and their host

    DEFF Research Database (Denmark)

    Kjærbølling, Inge; Vesth, Tammi Camilla; Frisvad, Jens Christian

    Secondary metabolite gene cluster evolution is mainly driven by two events: gene duplication and annexation and horizontal gene transfer. Here we use comparative genomics of Aspergillus species to investigate the evolution of secondary metabolite (SM) gene clusters across a wide spectrum of speci....... We investigate the dynamic evolutionary relationship between the cluster and the host by examining the genes within the cluster and the number of homologous genes found within the host and in closely related species.......Secondary metabolite gene cluster evolution is mainly driven by two events: gene duplication and annexation and horizontal gene transfer. Here we use comparative genomics of Aspergillus species to investigate the evolution of secondary metabolite (SM) gene clusters across a wide spectrum of species...

  8. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Full Text Available Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC. Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized an hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype.In a cohort of clinical trial participants (n = 250, minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα and induction of MAP kinase phosphatase-1 (MKP-1 expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m(2 and severity of asthma symptoms (AEQ score the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively. Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ and control (ACQ, exhaled nitric oxide concentration (F(ENO and airway hyperresponsiveness (methacholine PC(20 but were similar with regard to measures of lung function (FEV(1 (% and FEV(1/FVC, airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP. Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasoneObesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals

  9. GenClust: A genetic algorithm for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Raimondi Alessandra

    2005-12-01

    Full Text Available Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a a novel coding of the search space that is simple, compact and easy to update; (b it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

  10. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. The Serratia gene cluster encoding biosynthesis of the red antibiotic, prodigiosin, shows species- and strain-dependent genome context variation

    DEFF Research Database (Denmark)

    Harris, Abigail K P; Williamson, Neil R; Slater, Holly

    2004-01-01

    The prodigiosin biosynthesis gene cluster (pig cluster) from two strains of Serratia (S. marcescens ATCC 274 and Serratia sp. ATCC 39006) has been cloned, sequenced and expressed in heterologous hosts. Sequence analysis of the respective pig clusters revealed 14 ORFs in S. marcescens ATCC 274...... and 15 ORFs in Serratia sp. ATCC 39006. In each Serratia species, predicted gene products showed similarity to polyketide synthases (PKSs), non-ribosomal peptide synthases (NRPSs) and the Red proteins of Streptomyces coelicolor A3(2). Comparisons between the two Serratia pig clusters and the red cluster...... from Str. coelicolor A3(2) revealed some important differences. A modified scheme for the biosynthesis of prodigiosin, based on the pathway recently suggested for the synthesis of undecylprodigiosin, is proposed. The distribution of the pig cluster within several Serratia sp. isolates is demonstrated...

  12. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ(2) analysis (PD and DH, P cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarch clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Analysis of multiplex gene expression maps obtained by voxelation.

    Science.gov (United States)

    An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios

    2009-04-29

    Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. The experimental

  14. Analysis of multiplex gene expression maps obtained by voxelation

    Directory of Open Access Journals (Sweden)

    Smith Desmond J

    2009-04-01

    Full Text Available Abstract Background Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in

  15. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    Phylogenetic analysis suggests that our sequences are clustered with sequences reported from Japan. This is the first phylogenetic analysis of HCV core gene from Pakistani population. Our sequences and sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and ...

  16. Factor Analysis for Clustered Observations.

    Science.gov (United States)

    Longford, N. T.; Muthen, B. O.

    1992-01-01

    A two-level model for factor analysis is defined, and formulas for a scoring algorithm for this model are derived. A simple noniterative method based on decomposition of total sums of the squares and cross-products is discussed and illustrated with simulated data and data from the Second International Mathematics Study. (SLD)

  17. Genomic organization of the rat alpha 2u-globulin gene cluster.

    Science.gov (United States)

    McFadyen, D A; Addison, W; Locke, J

    1999-05-01

    The alpha 2u-globulin are a group of similar proteins, belonging to the lipocalin superfamily of proteins, that are synthesized in a subset of secretory tissues in rats. The many alpha 2u-globulin isoforms are encoded by a multigene family that exhibits extensive homology. Despite a high degree of sequence identity, individual family members show diverse expression patterns involving complex hormonal, tissue-specific, and developmental regulation. Analysis suggests that there are approximately 20 alpha 2u-globulin genes in the rat genome. We have used fluorescence in situ hybridization (FISH) to show that the alpha 2u-globulin genes are clustered at a single site on rat Chromosome (Chr) 5 (5q22-24). Southern blots of rat genomic DNA separated by pulsed field gel electrophoresis indicated that the alpha 2u-globulin genes are contained on two NruI fragments with a total size of 880 kbp. Analysis of three P1 clones containing alpha 2u-globulin genes indicated that the alpha 2u-globulin genes are tandemly arranged in a head-to-tail fashion. The organization of the alpha 2u-globulin genes in the rat as a tandem array of single genes differs from the homologous major urinary protein genes in the mouse, which are organized as tandem arrays of divergently oriented gene pairs. The structure of these gene clusters may have consequences for the proposed function, as a pheromone transporter, for the protein products encoded by these genes.

  18. The entire β-globin gene cluster is deleted in a form of τδβ-thalassemia.

    NARCIS (Netherlands)

    E.R. Fearon; H.H.Jr. Kazazian; P.G. Waber (Pamela); J.I. Lee (Joseph); S.E. Antonarakis; S.H. Orkin (Stuart); E.F. Vanin; P.S. Henthorn; F.G. Grosveld (Frank); A.F. Scott; G.R. Buchanan

    1983-01-01

    textabstractWe have used restriction endonuclease mapping to study a deletion involving the beta-globin gene cluster in a Mexican-American family with gamma delta beta-thalassemia. Analysis of DNA polymorphisms demonstrated deletion of the beta-globin gene from the affected chromosome. Using a DNA

  19. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Full Text Available Abstract Background DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results We compare performance of existing methodology (such as k-means with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  20. Increasing Power by Sharing Information from Genetic Background and Treatment in Clustering of Gene Expression Time Series

    Directory of Open Access Journals (Sweden)

    Sura Zaki Alrashid

    2018-02-01

    Full Text Available Clustering of gene expression time series gives insight into which genes may be co-regulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic background. This paper develops
a new clustering method that allows each cluster to be parameterised according to whether the behaviour of the genes across conditions is correlated or anti-correlated. By specifying correlation between such genes,more information is gain within the cluster about how the genes interrelate. Amyotrophic lateral sclerosis (ALS is an irreversible neurodegenerative disorder that kills the motor neurons and results in death within 2 to 3 years from the symptom onset. Speed of progression for different patients are heterogeneous with significant variability. The SOD1G93A transgenic mice from different backgrounds (129Sv and C57 showed consistent phenotypic differences for disease progression. A hierarchy of Gaussian isused processes to model condition-specific and gene-specific temporal co-variances. This study demonstrated about finding some significant gene expression profiles and clusters of associated or co-regulated gene expressions together from four groups of data (SOD1G93A and Ntg from 129Sv and C57 backgrounds. Our study shows the effectiveness of sharing information between replicates and different model conditions when modelling gene expression time series. Further gene enrichment score analysis and ontology pathway analysis of some specified clusters for a particular group may lead toward identifying features underlying the differential speed of disease progression.

  1. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets has concerned great awareness among molecular biologist, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A search made with heuristic for standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify association between the clusters. Moreover, a GL-PCPHC applies pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for GL-PCPHC model with standard benchmark gene expression datasets extracted from UCI repository and GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.

  2. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.

  3. Gravitation field algorithm and its application in gene cluster

    Directory of Open Access Journals (Sweden)

    Zheng Ming

    2010-09-01

    Full Text Available Abstract Background Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  4. Comparative Genomic Analysis of Soybean Flowering Genes

    Science.gov (United States)

    Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.

    2012-01-01

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. PMID:22679494

  5. A novel polyketide biosynthesis gene cluster is involved in fruiting body morphogenesis in the filamentous fungi Sordaria macrospora and Neurospora crassa.

    Science.gov (United States)

    Nowrousian, Minou

    2009-04-01

    During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.

  6. Structure-related clustering of gene expression fingerprints of thp-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

    Science.gov (United States)

    Wan, B; Yarbrough, J W; Schultz, T W

    2008-01-01

    This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours and gene expressions profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorine. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, a significant analysis of Microarrays (SAM) identified 598 genes being modulated by tested chemicals with a variety of biological processes, such as cell cycle, metabolism, and protein binding and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.

  7. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  8. Recurrent adenylation domain replacement in the microcystin synthetase gene cluster

    Directory of Open Access Journals (Sweden)

    Laakso Kati

    2007-10-01

    Full Text Available Abstract Background Microcystins are small cyclic heptapeptide toxins produced by a range of distantly related cyanobacteria. Microcystins are synthesized on large NRPS-PKS enzyme complexes. Many structural variants of microcystins are produced simulatenously. A recombination event between the first module of mcyB (mcyB1 and mcyC in the microcystin synthetase gene cluster is linked to the simultaneous production of microcystin variants in strains of the genus Microcystis. Results Here we undertook a phylogenetic study to investigate the order and timing of recombination between the mcyB1 and mcyC genes in a diverse selection of microcystin producing cyanobacteria. Our results provide support for complex evolutionary processes taking place at the mcyB1 and mcyC adenylation domains which recognize and activate the amino acids found at X and Z positions. We find evidence for recent recombination between mcyB1 and mcyC in strains of the genera Anabaena, Microcystis, and Hapalosiphon. We also find clear evidence for independent adenylation domain conversion of mcyB1 by unrelated peptide synthetase modules in strains of the genera Nostoc and Microcystis. The recombination events replace only the adenylation domain in each case and the condensation domains of mcyB1 and mcyC are not transferred together with the adenylation domain. Our findings demonstrate that the mcyB1 and mcyC adenylation domains are recombination hotspots in the microcystin synthetase gene cluster. Conclusion Recombination is thought to be one of the main mechanisms driving the diversification of NRPSs. However, there is very little information on how recombination takes place in nature. This study demonstrates that functional peptide synthetases are created in nature through transfer of adenylation domains without the concomitant transfer of condensation domains.

  9. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task to find the biological significance of the results and validate them has so far mostly fallen to the biological expert who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases c ntaining scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms have similar meaning) that would be encountered if terms would be extracted directly from the articles due to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptuallfunctional relationships of genes

  10. Mycobiota and identification of aflatoxin gene cluster in marketed spices in West Africa

    DEFF Research Database (Denmark)

    Gnonlonfin, G. J. B.; Adjovi, Y. C.; Tokpo, A. F.

    2013-01-01

    Fungal infection and aflatoxin contamination were evaluated on 114 samples of dried and milled spices such as ginger, garlic and black pepper from southern Benin and Togo collected in November 2008 -January 2009. These products are dried to preserve them for lean periods available throughout...... of Aspergillus were dominant on all marketed dried and milled spices irrespective of country. Gene characterization and amplification analysis showed that most of the Aspergillus flavus isolates possess the cluster genes for aflatoxin production. Aflatoxin B1 assessment by Thin Layer Chromatography showed...... further for other products such as dried and milled spices. Crown Copyright (C) 2013 Published by Elsevier Ltd. All rights reserved....

  11. Characterization of the biosynthetic gene cluster for cryptic phthoxazolin A in Streptomyces avermitilis.

    Directory of Open Access Journals (Sweden)

    Dian Anggraini Suroto

    Full Text Available Phthoxazolin A, an oxazole-containing polyketide, has a broad spectrum of anti-oomycete activity and herbicidal activity. We recently identified phthoxazolin A as a cryptic metabolite of Streptomyces avermitilis that produces the important anthelmintic agent avermectin. Even though genome data of S. avermitilis is publicly available, no plausible biosynthetic gene cluster for phthoxazolin A is apparent in the sequence data. Here, we identified and characterized the phthoxazolin A (ptx biosynthetic gene cluster through genome sequencing, comparative genomic analysis, and gene disruption. Sequence analysis uncovered that the putative ptx biosynthetic genes are laid on an extra genomic region that is not found in the public database, and 8 open reading frames in the extra genomic region could be assigned roles in the biosynthesis of the oxazole ring, triene polyketide and carbamoyl moieties. Disruption of the ptxA gene encoding a discrete acyltransferase resulted in a complete loss of phthoxazolin A production, confirming that the trans-AT type I PKS system is responsible for the phthoxazolin A biosynthesis. Based on the predicted functional domains in the ptx assembly line, we propose the biosynthetic pathway of phthoxazolin A.

  12. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs and the hypomethylation of the megabase-sized partially methylated domains (PMDs are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMR revealed tumor-specific hypermethylated clusters and differential methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI was found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma dataset. They were characterized with differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expressions of these genes were significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evident of more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.

  13. Deletion and Gene Expression Analyses Define the Paxilline Biosynthetic Gene Cluster in Penicillium paxilli

    Directory of Open Access Journals (Sweden)

    Emily J. Parker

    2013-08-01

    Full Text Available The indole-diterpene paxilline is an abundant secondary metabolite synthesized by Penicillium paxilli. In total, 21 genes have been identified at the PAX locus of which six have been previously confirmed to have a functional role in paxilline biosynthesis. A combination of bioinformatics, gene expression and targeted gene replacement analyses were used to define the boundaries of the PAX gene cluster. Targeted gene replacement identified seven genes, paxG, paxA, paxM, paxB, paxC, paxP and paxQ that were all required for paxilline production, with one additional gene, paxD, required for regular prenylation of the indole ring post paxilline synthesis. The two putative transcription factors, PP104 and PP105, were not co-regulated with the pax genes and based on targeted gene replacement, including the double knockout, did not have a role in paxilline production. The relationship of indole dimethylallyl transferases involved in prenylation of indole-diterpenes such as paxilline or lolitrem B, can be found as two disparate clades, not supported by prenylation type (e.g., regular or reverse. This paper provides insight into the P. paxilli indole-diterpene locus and reviews the recent advances identified in paxilline biosynthesis.

  14. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Full Text Available Abstract Background Copy number variations (CNVs of the gene CC chemokine ligand 3-like1 (CCL3L1 have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and, CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70-100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. By use of MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European and African American, respectively. The concordance between the assays ranged from 0.44-0.83 suggesting individual discordant calls and inconsistencies with the assays from the expected gene coverage from the known sequence. Conclusions This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogenous results. Sequence information to determine CNV of the three genes separately would allow to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.

  15. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    Science.gov (United States)

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in classification of stocks of finished products. Cluster analysis creates groups (clusters of finished products according to similarity in demand i.e. customer requirements for each product. Manner stocks sorting of finished products by clusters is described a practical example. The resultants clusters are incorporated into the draft layout of the distribution warehouse.

  17. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index

  18. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With widening edible deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame, groundnuts etc) can be grown in Kenya. But oilseed rape is preferred because it very high yielding (1.5 tons-4.0 tons/ha) with oil content of 42-46%. Other uses include fitting in various cropping systems as; relay/inter crops, rotational crops, trap crops and fodder. It is soft seeded hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and the similarities if any among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine Canadian origin and eight of European origin) grown at 4 locations namely Endebess, Njoro, Timau and Mau Narok in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields and based on this analysis; only one major group exists within the material. In 1992, varieties 2,3,8 and 9 didn't fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2,3 and 9) were however not classified with the others. In 1994, varieties 10 and 6 didn't fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than 1992 due to favorable weather. It is evident that, genotypes from different geographical

  19. Heterologous expression of pikromycin biosynthetic gene cluster using Streptomyces artificial chromosome system.

    Science.gov (United States)

    Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo

    2017-05-31

    Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially-valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously successfully applied for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme of natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally-applicable heterologous overexpression system for a large-sized polyketide biosynthetic gene cluster in Streptomyces, another model polyketide compound, the pikromycin biosynthetic gene cluster, was preciously cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem

  20. Glycosulfatase-Encoding Gene Cluster in Bifidobacterium breve UCC2003.

    Science.gov (United States)

    Egan, Muireann; Jiang, Hao; O'Connell Motherway, Mary; Oscarson, Stefan; van Sinderen, Douwe

    2016-11-15

    Bifidobacteria constitute a specific group of commensal bacteria typically found in the gastrointestinal tract (GIT) of humans and other mammals. Bifidobacterium breve strains are numerically prevalent among the gut microbiota of many healthy breastfed infants. In the present study, we investigated glycosulfatase activity in a bacterial isolate from a nursling stool sample, B. breve UCC2003. Two putative sulfatases were identified on the genome of B. breve UCC2003. The sulfated monosaccharide N-acetylglucosamine-6-sulfate (GlcNAc-6-S) was shown to support the growth of B. breve UCC2003, while N-acetylglucosamine-3-sulfate, N-acetylgalactosamine-3-sulfate, and N-acetylgalactosamine-6-sulfate did not support appreciable growth. By using a combination of transcriptomic and functional genomic approaches, a gene cluster designated ats2 was shown to be specifically required for GlcNAc-6-S metabolism. Transcription of the ats2 cluster is regulated by a repressor open reading frame kinase (ROK) family transcriptional repressor. This study represents the first description of glycosulfatase activity within the Bifidobacterium genus. Bifidobacteria are saccharolytic organisms naturally found in the digestive tract of mammals and insects. Bifidobacterium breve strains utilize a variety of plant- and host-derived carbohydrates that allow them to be present as prominent members of the infant gut microbiota as well as being present in the gastrointestinal tract of adults. In this study, we introduce a previously unexplored area of carbohydrate metabolism in bifidobacteria, namely, the metabolism of sulfated carbohydrates. B. breve UCC2003 was shown to metabolize N-acetylglucosamine-6-sulfate (GlcNAc-6-S) through one of two sulfatase-encoding gene clusters identified on its genome. GlcNAc-6-S can be found in terminal or branched positions of mucin oligosaccharides, the glycoprotein component of the mucous layer that covers the digestive tract. The results of this study provide

  1. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter users are allowed to broadcast a short message called as `tweet". In this study, we extract tweets related to MH370 for certain of time. In this paper, we present overview of our approach for tweets clustering to analyze the users' responses toward tragedy of MH370. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets that response to MH370 tragedy which is emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  2. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    OpenAIRE

    Jacobson, M R; Brigle, K E; Bennett, L T; Setterquist, R A; Wilson, M S; Cash, V L; Beynon, J; Newton, W E; Dean, D R

    1989-01-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include ...

  3. Structure and gene cluster of the O-antigen of Escherichia coli O54.

    Science.gov (United States)

    Naumenko, Olesya I; Guo, Xi; Senchenkova, Sof'ya N; Geng, Peng; Perepelov, Andrei V; Shashkov, Alexander S; Liu, Bin; Knirel, Yuriy A

    2018-06-15

    Mild acid hydrolysis of the lipopolysaccharide of Escherichia coli O54 afforded an O-polysaccharide, which was studied by sugar analysis, solvolysis with anhydrous trifluoroacetic acid, and 1 H and 13 C NMR spectroscopy. Solvolysis cleaved predominantly the linkage of β-d-Ribf and, to a lesser extent, that of β-d-GlcpNAc, whereas the other linkages, including the linkage of α-l-Rhap, were stable under selected conditions (40 °C, 5 h). The following structure of the O-polysaccharide was established: →4)-α-d-GalpA-(1 → 2)-α-l-Rhap-(1 → 2)-β-d-Ribf-(1 → 4)-β-d-Galp-(1 → 3)-β-d-GlcpNAc-(1→ The O-antigen gene cluster of E. coli O54 was analyzed and found to be consistent in general with the O-polysaccharide structure established but there were two exceptions: i) in the cluster, there were genes for phosphoserine phosphatase and serine transferase, which have no apparent role in the O-polysaccharide synthesis, and ii) no ribofuranosyltransferase gene was present in the cluster. Both uncommon features are shared by some other enteric bacteria. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Genetic clusters and sex-biased gene flow in a unicolonial Formica ant

    Directory of Open Access Journals (Sweden)

    Chapuisat Michel

    2009-03-01

    Full Text Available Abstract Background Animal societies are diverse, ranging from small family-based groups to extraordinarily large social networks in which many unrelated individuals interact. At the extreme of this continuum, some ant species form unicolonial populations in which workers and queens can move among multiple interconnected nests without eliciting aggression. Although unicoloniality has been mostly studied in invasive ants, it also occurs in some native non-invasive species. Unicoloniality is commonly associated with very high queen number, which may result in levels of relatedness among nestmates being so low as to raise the question of the maintenance of altruism by kin selection in such systems. However, the actual relatedness among cooperating individuals critically depends on effective dispersal and the ensuing pattern of genetic structuring. In order to better understand the evolution of unicoloniality in native non-invasive ants, we investigated the fine-scale population genetic structure and gene flow in three unicolonial populations of the wood ant F. paralugubris. Results The analysis of geo-referenced microsatellite genotypes and mitochondrial haplotypes revealed the presence of cryptic clusters of genetically-differentiated nests in the three populations of F. paralugubris. Because of this spatial genetic heterogeneity, members of the same clusters were moderately but significantly related. The comparison of nuclear (microsatellite and mitochondrial differentiation indicated that effective gene flow was male-biased in all populations. Conclusion The three unicolonial populations exhibited male-biased and mostly local gene flow. The high number of queens per nest, exchanges among neighbouring nests and restricted long-distance gene flow resulted in large clusters of genetically similar nests. The positive relatedness among clustermates suggests that kin selection may still contribute to the maintenance of altruism in unicolonial

  5. Multisource Images Analysis Using Collaborative Clustering

    Directory of Open Access Journals (Sweden)

    Pierre Gançarski

    2008-04-01

    Full Text Available The development of very high-resolution (VHR satellite imagery has produced a huge amount of data. The multiplication of satellites which embed different types of sensors provides a lot of heterogeneous images. Consequently, the image analyst has often many different images available, representing the same area of the Earth surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools using all these representations in an overall process constraints to a sequential analysis of these various images. In order to use all the information available simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and, thus, produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient to extract relevant information and to improve the scene understanding.

  6. A phylogenomic gene cluster resource: The phylogeneticallyinferred groups (PhlGs) database

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  7. The complete coenzyme B12 biosynthesis gene cluster of Lactobacillus reuteri CRL 1098

    NARCIS (Netherlands)

    Santos, dos F.; Vera, J.L.; Heijden, van der R.; Valdez, G.F.; Vos, de W.M.; Sesma, F.; Hugenholtz, J.

    2008-01-01

    The coenzyme B12 production pathway in Lactobacillus reuteri has been deduced using a combination of genetic, biochemical and bioinformatics approaches. The coenzyme B12 gene cluster of Lb. reuteri CRL1098 has the unique feature of clustering together the cbi, cob and hem genes. It consists of 29

  8. Variation in sequence and location of the fumonisin mycotoxin niosynthetic gene cluster in Fusarium

    NARCIS (Netherlands)

    Proctor, R.H.; Hove, van F.; Susca, A.; Stea, A.; Busman, M.; Lee, van der T.A.J.; Waalwijk, C.; Moretti, A.

    2010-01-01

    In Fusarium, the ability to produce fumonisins is governed by a 17-gene fumonisin biosynthetic gene (FUM) cluster. Here, we examined the cluster in F. oxysporum strain O-1890 and nine other species selected to represent a wide range of the genetic diversity within the GFSC.

  9. Dominant control region of the human β- like globin gene cluster

    NARCIS (Netherlands)

    Blom van Assendelft, Margaretha van

    1989-01-01

    The structure and regulation of the human β -like globin gene cluster has been studied extensively. Genetic disorders connected with this gene cluster are responsible for human diseases associated with high levels of morbidity and mortality, such as β-thalassaemia and sickle cell anaemia. The work

  10. Analysis of the Aspergillus fumigatus proteome reveals metabolic changes and the activation of the pseurotin A biosynthesis gene cluster in response to hypoxia.

    Science.gov (United States)

    Vödisch, Martin; Scherlach, Kirstin; Winkler, Robert; Hertweck, Christian; Braun, Hans-Peter; Roth, Martin; Haas, Hubertus; Werner, Ernst R; Brakhage, Axel A; Kniemeyer, Olaf

    2011-05-06

    The mold Aspergillus fumigatus is the most important airborne fungal pathogen. Adaptation to hypoxia represents an important virulence attribute for A. fumigatus. Therefore, we aimed at obtaining a comprehensive overview about this process on the proteome level. To ensure highly reproducible growth conditions, an oxygen-controlled, glucose-limited chemostat cultivation was established. Two-dimensional gel electrophoresis analysis of mycelial and mitochondrial proteins as well as two-dimensional Blue Native/SDS-gel separation of mitochondrial membrane proteins led to the identification of 117 proteins with an altered abundance under hypoxic in comparison to normoxic conditions. Hypoxia induced an increased activity of glycolysis, the TCA-cycle, respiration, and amino acid metabolism. Consistently, the cellular contents in heme, iron, copper, and zinc increased. Furthermore, hypoxia induced biosynthesis of the secondary metabolite pseurotin A as demonstrated at proteomic, transcriptional, and metabolite levels. The observed and so far not reported stimulation of the biosynthesis of a secondary metabolite by oxygen depletion may also affect the survival of A. fumigatus in hypoxic niches of the human host. Among the proteins so far not implicated in hypoxia adaptation, an NO-detoxifying flavohemoprotein was one of the most highly up-regulated proteins which indicates a link between hypoxia and the generation of nitrosative stress in A. fumigatus.

  11. Transcriptional organization of the DNA region controlling expression of the K99 gene cluster.

    Science.gov (United States)

    Roosendaal, B; Damoiseaux, J; Jordi, W; de Graaf, F K

    1989-01-01

    The transcriptional organization of the K99 gene cluster was investigated in two ways. First, the DNA region, containing the transcriptional signals was analyzed using a transcription vector system with Escherichia coli galactokinase (GalK) as assayable marker and second, an in vitro transcription system was employed. A detailed analysis of the transcription signals revealed that a strong promoter PA and a moderate promoter PB are located upstream of fanA and fanB, respectively. No promoter activity was detected in the intercistronic region between fanB and fanC. Factor-dependent terminators of transcription were detected and are probably located in the intercistronic region between fanA and fanB (T1), and between fanB and fanC (T2). A third terminator (T3) was observed between fanC and fanD and has an efficiency of 90%. Analysis of the regulatory region in an in vitro transcription system confirmed the location of the respective transcription signals. A model for the transcriptional organization of the K99 cluster is presented. Indications were obtained that the trans-acting regulatory polypeptides FanA and FanB both function as anti-terminators. A model for the regulation of expression of the K99 gene cluster is postulated.

  12. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There are growing needs for quick preview of video contents for the purpose of improving accessibility of video archives as well as reducing network traffics. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoid a repetition of computationally-intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  13. Cloning and Characterization of the Polyether Salinomycin Biosynthesis Gene Cluster of Streptomyces albus XM211

    Science.gov (United States)

    Jiang, Chunyan; Wang, Hougen; Kang, Qianjin; Liu, Jing

    2012-01-01

    Salinomycin is widely used in animal husbandry as a food additive due to its antibacterial and anticoccidial activities. However, its biosynthesis had only been studied by feeding experiments with isotope-labeled precursors. A strategy with degenerate primers based on the polyether-specific epoxidase sequences was successfully developed to clone the salinomycin gene cluster. Using this strategy, a putative epoxidase gene, slnC, was cloned from the salinomycin producer Streptomyces albus XM211. The targeted replacement of slnC and subsequent trans-complementation proved its involvement in salinomycin biosynthesis. A 127-kb DNA region containing slnC was sequenced, including genes for polyketide assembly and release, oxidative cyclization, modification, export, and regulation. In order to gain insight into the salinomycin biosynthesis mechanism, 13 gene replacements and deletions were conducted. Including slnC, 7 genes were identified as essential for salinomycin biosynthesis and putatively responsible for polyketide chain release, oxidative cyclization, modification, and regulation. Moreover, 6 genes were found to be relevant to salinomycin biosynthesis and possibly involved in precursor supply, removal of aberrant extender units, and regulation. Sequence analysis and a series of gene replacements suggest a proposed pathway for the biosynthesis of salinomycin. The information presented here expands the understanding of polyether biosynthesis mechanisms and paves the way for targeted engineering of salinomycin activity and productivity. PMID:22156425

  14. Association of paraoxonase gene cluster polymorphisms with ALS in France, Quebec, and Sweden.

    Science.gov (United States)

    Valdmanis, P N; Kabashi, E; Dyck, A; Hince, P; Lee, J; Dion, P; D'Amour, M; Souchon, F; Bouchard, J-P; Salachas, F; Meininger, V; Andersen, P M; Camu, W; Dupré, N; Rouleau, G A

    2008-08-12

    The paraoxonase gene cluster on chromosome 7 comprising the PON1-3 genes is an attractive candidate for association in amyotrophic lateral sclerosis (ALS) given the role of paraoxonase genes during the response to oxidative stress and their contribution to the enzymatic break down of nerve toxins. Oxidative stress is considered one of the mechanisms involved in ALS pathogenesis. Evidence for this includes the fact that mutations of SOD1, which normally reduce the production of toxic superoxide anion, account for 12% to 23% of familial cases in ALS. In addition, PON variants were shown to be associated with susceptibility to ALS in several North American and European populations. We extended this analysis to examine 20 single nucleotide polymorphisms (SNPs) across the PON gene cluster in a set of patients from France (480 cases, 475 controls), Quebec (159 cases, 95 controls), and Sweden (558 cases, 506 controls). Although individual SNPs were not considered associated on their own, a haplotype of SNPs at the C-terminal portion of PON2 that includes the PON2 C311S amino acid change was significant in the French (p value 0.0075) and Quebec (p value 0.026) populations as well as all three populations combined (p value 1.69 x 10(-6)). Stratification of the samples showed that this variation was pertinent to ALS susceptibility as a whole, and not to a particular subset of patients. These findings contribute to the increasing weight of evidence that genetic variants in the paraoxonase gene cluster are associated with amyotrophic lateral sclerosis.

  15. CAR gene cluster and transcript levels of carotenogenic genes in Rhodotorula mucilaginosa.

    Science.gov (United States)

    Landolfo, Sara; Ianiri, Giuseppe; Camiolo, Salvatore; Porceddu, Andrea; Mulas, Giuliana; Chessa, Rossella; Zara, Giacomo; Mannazzu, Ilaria

    2018-01-01

    A molecular approach was applied to the study of the carotenoid biosynthetic pathway of Rhodotorula mucilaginosa. At first, functional annotation of the genome of R. mucilaginosa C2.5t1 was carried out and gene ontology categories were assigned to 4033 predicted proteins. Then, a set of genes involved in different steps of carotenogenesis was identified and those coding for phytoene desaturase, phytoene synthase/lycopene cyclase and carotenoid dioxygenase (CAR genes) proved to be clustered within a region of ~10 kb. Quantitative PCR of the genes involved in carotenoid biosynthesis showed that genes coding for 3-hydroxy-3-methylglutharyl-CoA reductase and mevalonate kinase are induced during exponential phase while no clear trend of induction was observed for phytoene synthase/lycopene cyclase and phytoene dehydrogenase encoding genes. Thus, in R. mucilaginosa the induction of genes involved in the early steps of carotenoid biosynthesis is transient and accompanies the onset of carotenoid production, while that of CAR genes does not correlate with the amount of carotenoids produced. The transcript levels of genes coding for carotenoid dioxygenase, superoxide dismutase and catalase A increased during the accumulation of carotenoids, thus suggesting the activation of a mechanism aimed at the protection of cell structures from oxidative stress during carotenoid biosynthesis. The data presented herein, besides being suitable for the elucidation of the mechanisms that underlie carotenoid biosynthesis, will contribute to boosting the biotechnological potential of this yeast by improving the outcome of further research efforts aimed at also exploring other features of interest.

  16. A recently transferred cluster of bacterial genes in Trichomonas vaginalis - lateral gene transfer and the fate of acquired genes

    Science.gov (United States)

    2014-01-01

    Background Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain. A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome. Results A unique 34 kbp long fragment with 27 clustered genes (TvLF) of prokaryote origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR based approach we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative to the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species, suggests that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation. Conclusions We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes. PMID:24898731

  17. Genomic characterization of a new endophytic Streptomyces kebangsaanensis identifies biosynthetic pathway gene clusters for novel phenazine antibiotic production

    Directory of Open Access Journals (Sweden)

    Juwairiah Remali

    2017-11-01

    Full Text Available Background Streptomyces are well known for their capability to produce many bioactive secondary metabolites with medical and industrial importance. Here we report a novel bioactive phenazine compound, 6-((2-hydroxy-4-methoxyphenoxy carbonyl phenazine-1-carboxylic acid (HCPCA extracted from Streptomyces kebangsaanensis, an endophyte isolated from the ethnomedicinal Portulaca oleracea. Methods The HCPCA chemical structure was determined using nuclear magnetic resonance spectroscopy. We conducted whole genome sequencing for the identification of the gene cluster(s believed to be responsible for phenazine biosynthesis in order to map its corresponding pathway, in addition to bioinformatics analysis to assess the potential of S. kebangsaanensis in producing other useful secondary metabolites. Results The S. kebangsaanensis genome comprises an 8,328,719 bp linear chromosome with high GC content (71.35% consisting of 12 rRNA operons, 81 tRNA, and 7,558 protein coding genes. We identified 24 gene clusters involved in polyketide, nonribosomal peptide, terpene, bacteriocin, and siderophore biosynthesis, as well as a gene cluster predicted to be responsible for phenazine biosynthesis. Discussion The HCPCA phenazine structure was hypothesized to derive from the combination of two biosynthetic pathways, phenazine-1,6-dicarboxylic acid and 4-methoxybenzene-1,2-diol, originated from the shikimic acid pathway. The identification of a biosynthesis pathway gene cluster for phenazine antibiotics might facilitate future genetic engineering design of new synthetic phenazine antibiotics. Additionally, these findings confirm the potential of S. kebangsaanensis for producing various antibiotics and secondary metabolites.

  18. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress

    OpenAIRE

    Liu, He; Yang, Chun-Lan; Ge, Meng-Yu; Ibrahim, Muhammad; Li, Bin; Zhao, Wen-Jun; Chen, Gong-You; Zhu, Bo; Xie, Guan-Lin

    2014-01-01

    Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay demonstrated that TetR regulator bound directly to the promoter of ...

  19. Gene expression patterns of oxidative phosphorylation complex I subunits are organized in clusters.

    Directory of Open Access Journals (Sweden)

    Yael Garbian

    Full Text Available After the radiation of eukaryotes, the NUO operon, controlling the transcription of the NADH dehydrogenase complex of the oxidative phosphorylation system (OXPHOS complex I, was broken down and genes encoding this protein complex were dispersed across the nuclear genome. Seven genes, however, were retained in the genome of the mitochondrion, the ancient symbiote of eukaryotes. This division, in combination with the three-fold increase in subunit number from bacteria (N = approximately 14 to man (N = 45, renders the transcription regulation of OXPHOS complex I a challenge. Recently bioinformatics analysis of the promoter regions of all OXPHOS genes in mammals supported patterns of co-regulation, suggesting that natural selection favored a mechanism facilitating the transcriptional regulatory control of genes encoding subunits of these large protein complexes. Here, using real time PCR of mitochondrial (mtDNA- and nuclear DNA (nDNA-encoded transcripts in a panel of 13 different human tissues, we show that the expression pattern of OXPHOS complex I genes is regulated in several clusters. Firstly, all mtDNA-encoded complex I subunits (N = 7 share a similar expression pattern, distinct from all tested nDNA-encoded subunits (N = 10. Secondly, two sub-clusters of nDNA-encoded transcripts with significantly different expression patterns were observed. Thirdly, the expression patterns of two nDNA-encoded genes, NDUFA4 and NDUFA5, notably diverged from the rest of the nDNA-encoded subunits, suggesting a certain degree of tissue specificity. Finally, the expression pattern of the mtDNA-encoded ND4L gene diverged from the rest of the tested mtDNA-encoded transcripts that are regulated by the same promoter, consistent with post-transcriptional regulation. These findings suggest, for the first time, that the regulation of complex I subunits expression in humans is complex rather than reflecting global co-regulation.

  20. Identification of the Regulator Gene Responsible for the Acetone-Responsive Expression of the Binuclear Iron Monooxygenase Gene Cluster in Mycobacteria ▿

    Science.gov (United States)

    Furuya, Toshiki; Hirose, Satomi; Semba, Hisashi; Kino, Kuniki

    2011-01-01

    The mimABCD gene cluster encodes the binuclear iron monooxygenase that oxidizes propane and phenol in Mycobacterium smegmatis strain MC2 155 and Mycobacterium goodii strain 12523. Interestingly, expression of the mimABCD gene cluster is induced by acetone. In this study, we investigated the regulator gene responsible for this acetone-responsive expression. In the genome sequence of M. smegmatis strain MC2 155, the mimABCD gene cluster is preceded by a gene designated mimR, which is divergently transcribed. Sequence analysis revealed that MimR exhibits amino acid similarity with the NtrC family of transcriptional activators, including AcxR and AcoR, which are involved in acetone and acetoin metabolism, respectively. Unexpectedly, many homologs of the mimR gene were also found in the sequenced genomes of actinomycetes. A plasmid carrying a transcriptional fusion of the intergenic region between the mimR and mimA genes with a promoterless green fluorescent protein (GFP) gene was constructed and introduced into M. smegmatis strain MC2 155. Using a GFP reporter system, we confirmed by deletion and complementation analyses that the mimR gene product is the positive regulator of the mimABCD gene cluster expression that is responsive to acetone. M. goodii strain 12523 also utilized the same regulatory system as M. smegmatis strain MC2 155. Although transcriptional activators of the NtrC family generally control transcription using the σ54 factor, a gene encoding the σ54 factor was absent from the genome sequence of M. smegmatis strain MC2 155. These results suggest the presence of a novel regulatory system in actinomycetes, including mycobacteria. PMID:21856847

  1. Functional characterization of KanP, a methyltransferase from the kanamycin biosynthetic gene cluster of Streptomyces kanamyceticus.

    Science.gov (United States)

    Nepal, Keshav Kumar; Yoo, Jin Cheol; Sohng, Jae Kyung

    2010-09-20

    KanP, a putative methyltransferase, is located in the kanamycin biosynthetic gene cluster of Streptomyces kanamyceticus ATCC12853. Amino acid sequence analysis of KanP revealed the presence of S-adenosyl-L-methionine binding motifs, which are present in other O-methyltransferases. The kanP gene was expressed in Escherichia coli BL21 (DE3) to generate the E. coli KANP recombinant strain. The conversion of external quercetin to methylated quercetin in the culture extract of E. coli KANP proved the function of kanP as S-adenosyl-L-methionine-dependent methyltransferase. This is the first report concerning the identification of an O-methyltransferase gene from the kanamycin gene cluster. The resistant activity assay and RT-PCR analysis demonstrated the leeway for obtaining methylated kanamycin derivatives from the wild-type strain of kanamycin producer. 2009 Elsevier GmbH. All rights reserved.

  2. Cluster Analysis of Maize Inbred Lines

    Directory of Open Access Journals (Sweden)

    Jiban Shrestha

    2016-12-01

    Full Text Available The determination of diversity among inbred lines is important for heterosis breeding. Sixty maize inbred lines were evaluated for their eight agro morphological traits during winter season of 2011 to analyze their genetic diversity. Clustering was done by average linkage method. The inbred lines were grouped into six clusters. Inbred lines grouped into Clusters II had taller plants with maximum number of leaves. The cluster III was characterized with shorter plants with minimum number of leaves. The inbred lines categorized into cluster V had early flowering whereas the group into cluster VI had late flowering time. The inbred lines grouped into the cluster III were characterized by higher value of anthesis silking interval (ASI and those of cluster VI had lower value of ASI. These results showed that the inbred lines having widely divergent clusters can be utilized in hybrid breeding programme.

  3. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    Science.gov (United States)

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quatities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describes the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TDFIDF weighting scheme, but only 23 og 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.

  4. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2012-12-01

    Full Text Available Abstract Background Collections of Clusters of Orthologous Genes (COGs provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs. Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. Results The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer. Conclusions The updated collection

  5. Sequence analysis of porothramycin biosynthetic gene cluster

    Czech Academy of Sciences Publication Activity Database

    Najmanová, Lucie; Ulanová, Dana; Jelínková, Markéta; Kameník, Zdeněk; Kettnerová, Eliška; Koběrská, Markéta; Gažák, Radek; Radojevič, Bojana; Janata, Jiří

    2014-01-01

    Roč. 59, č. 6 (2014), s. 543-552 ISSN 0015-5632 R&D Projects: GA MŠk(CZ) ED1.1.00/02.0109; GA MŠk(CZ) EE2.3.20.0055; GA MŠk(CZ) EE2.3.30.0003 Institutional support: RVO:61388971 Keywords : BIOLOGICAL-ACTIVITY * ANTHRAMYCIN * SPECIFICITY Subject RIV: EE - Microbiology, Virology Impact factor: 1.000, year: 2014

  6. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress

    Directory of Open Access Journals (Sweden)

    He eLiu

    2014-10-01

    Full Text Available Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay (EMSA demonstrated that TetR regulator bound directly to the promoter of this gene cluster. Consistently, the results of quantitative real-time PCR also showed alterations in expression of associated genes. Moreover, the proteins affected by TetR under oxidative stress were revealed by comparing proteomic profiles of wild-type and mutant strains via 1D SDS-PAGE and LC-MS/MS analyses. Taken together, our results demonstrated that tetR gene in this novel gene cluster contributed to cell survival under oxidative stress, and TetR protein played an important regulatory role in growth kinetics, biofilm-forming capability, SOD and catalase activity, and oxide detoxicating ability.

  7. Regulatory role of tetR gene in a novel gene cluster of Acidovorax avenae subsp. avenae RS-1 under oxidative stress.

    Science.gov (United States)

    Liu, He; Yang, Chun-Lan; Ge, Meng-Yu; Ibrahim, Muhammad; Li, Bin; Zhao, Wen-Jun; Chen, Gong-You; Zhu, Bo; Xie, Guan-Lin

    2014-01-01

    Acidovorax avenae subsp. avenae is the causal agent of bacterial brown stripe disease in rice. In this study, we characterized a novel horizontal transfer of a gene cluster, including tetR, on the chromosome of A. avenae subsp. avenae RS-1 by genome-wide analysis. TetR acted as a repressor in this gene cluster and the oxidative stress resistance was enhanced in tetR-deletion mutant strain. Electrophoretic mobility shift assay demonstrated that TetR regulator bound directly to the promoter of this gene cluster. Consistently, the results of quantitative real-time PCR also showed alterations in expression of associated genes. Moreover, the proteins affected by TetR under oxidative stress were revealed by comparing proteomic profiles of wild-type and mutant strains via 1D SDS-PAGE and LC-MS/MS analyses. Taken together, our results demonstrated that tetR gene in this novel gene cluster contributed to cell survival under oxidative stress, and TetR protein played an important regulatory role in growth kinetics, biofilm-forming capability, superoxide dismutase and catalase activity, and oxide detoxicating ability.

  8. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    International Nuclear Information System (INIS)

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-01-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [ 18 F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principle component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-ofinterest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering using variable number of clusters (K). Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithms or number of clusters (JI mean = 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement to the full-gene set (JI =1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as number of clusters increased (JI=0.9231 and 0.5769 for K=2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI range : 0.2301–1). Conclusion: Using commonly-used clustering algorithms, we found

  9. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harmon, S; Wendelberger, B [University of Wisconsin-Madison, Madison, WI (United States); Jeraj, R [University of Wisconsin-Madison, Madison, WI (United States); University of Ljubljana (Slovenia)

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [{sup 18}F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principle component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-ofinterest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering using variable number of clusters (K). Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithms or number of clusters (JI{sub mean}= 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement to the full-gene set (JI =1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as number of clusters increased (JI=0.9231 and 0.5769 for K=2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI{sub range}: 0.2301–1). Conclusion: Using commonly-used clustering algorithms

  10. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding...... the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features...

  11. Conserved syntenic clusters of protein coding genes are missing in birds.

    Science.gov (United States)

    Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V

    2014-01-01

    Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.

  12. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiment. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. During one of previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  14. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    Maslennikova, Yu S; Bochkarev, V V; Belashova, I A

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. During one of previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations

  15. Leveraging long sequencing reads to investigate R-gene clustering and variation in sugar beet

    Science.gov (United States)

    Host-pathogen interactions are of prime importance to modern agriculture. Plants utilize various types of resistance genes to mitigate pathogen damage. Identification of the specific gene responsible for a specific resistance can be difficult due to duplication and clustering within R-gene families....

  16. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of the Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduce the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected even higher than the recently proposed SCA algorithm.

  17. ICGE: an R package for detecting relevant clusters and atypical units in gene expression

    Directory of Open Access Journals (Sweden)

    Irigoien Itziar

    2012-02-01

    Full Text Available Abstract Background Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition as well as the identification of new genes classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample... belongs to one of these previously identified clusters or to a new group. Results ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group, and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. Conclusions We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.

  18. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

    Science.gov (United States)

    Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E

    2017-04-12

    Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn ). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (x3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs) (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.

  19. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens, that cause a contagious zoonotic disease, that can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. For deciphering the survival mechanism of Brucella spp. in vivo, 42 Brucella complete genomes from NCBI were analyzed for the pan-genome and core genome by identification of their composition and function of Brucella genomes. The results showed that the total 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, which were mainly responsible for energy production and conversion (COG category C), and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were in positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conservation, and the energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of the intracellular survival in Brucella spp.

  20. Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study.

    Directory of Open Access Journals (Sweden)

    Jason C Slot

    Full Text Available High affinity nitrate assimilation genes in fungi occur in a cluster (fHANT-AC that can be coordinately regulated. The clustered genes include nrt2, which codes for a high affinity nitrate transporter; euknr, which codes for nitrate reductase; and NAD(PH-nir, which codes for nitrite reductase. Homologs of genes in the fHANT-AC occur in other eukaryotes and prokaryotes, but they have only been found clustered in the oomycete Phytophthora (heterokonts. We performed independent and concatenated phylogenetic analyses of homologs of all three genes in the fHANT-AC. Phylogenetic analyses limited to fungal sequences suggest that the fHANT-AC has been transferred horizontally from a basidiomycete (mushrooms and smuts to an ancestor of the ascomycetous mold Trichoderma reesei. Phylogenetic analyses of sequences from diverse eukaryotes and eubacteria, and cluster structure, are consistent with a hypothesis that the fHANT-AC was assembled in a lineage leading to the oomycetes and was subsequently transferred to the Dikarya (Ascomycota+Basidiomycota, which is a derived fungal clade that includes the vast majority of terrestrial fungi. We propose that the acquisition of high affinity nitrate assimilation contributed to the success of Dikarya on land by allowing exploitation of nitrate in aerobic soils, and the subsequent transfer of a complete assimilation cluster improved the fitness of T. reesei in a new niche. Horizontal transmission of this cluster of functionally integrated genes supports the "selfish operon" hypothesis for maintenance of gene clusters.

  1. Draft genome sequence of Streptomyces coelicoflavus ZG0656 reveals the putative biosynthetic gene cluster of acarviostatin family α-amylase inhibitors.

    Science.gov (United States)

    Guo, X; Geng, P; Bai, F; Bai, G; Sun, T; Li, X; Shi, L; Zhong, Q

    2012-08-01

    The aims of this study are to obtain the draft genome sequence of Streptomyces coelicoflavus ZG0656, which produces novel acarviostatin family α-amylase inhibitors, and then to reveal the putative acarviostatin-related gene cluster and the biosynthetic pathway. The draft genome sequence of S. coelicoflavus ZG0656 was generated using a shotgun approach employing a combination of 454 and Solexa sequencing technologies. Genome analysis revealed a putative gene cluster for acarviostatin biosynthesis, termed sct-cluster. The cluster contains 13 acarviostatin synthetic genes, six transporter genes, four starch degrading or transglycosylation enzyme genes and two regulator genes. On the basis of bioinformatic analysis, we proposed a putative biosynthetic pathway of acarviostatins. The intracellular steps produce a structural core, acarviostatin I00-7-P, and the extracellular assemblies lead to diverse acarviostatin end products. The draft genome sequence of S. coelicoflavus ZG0656 revealed the putative biosynthetic gene cluster of acarviostatins and a putative pathway of acarviostatin production. To our knowledge, S. coelicoflavus ZG0656 is the first strain in this species for which a genome sequence has been reported. The analysis of sct-cluster provided important insights into the biosynthesis of acarviostatins. This work will be a platform for producing novel variants and yield improvement. © 2012 The Authors. Letters in Applied Microbiology © 2012 The Society for Applied Microbiology.

  2. Activation and clustering of a Plasmodium falciparum var gene are affected by subtelomeric sequences.

    Science.gov (United States)

    Duffy, Michael F; Tang, Jingyi; Sumardy, Fransisca; Nguyen, Hanh H T; Selvarajah, Shamista A; Josling, Gabrielle A; Day, Karen P; Petter, Michaela; Brown, Graham V

    2017-01-01

    The Plasmodium falciparum var multigene family encodes the cytoadhesive, variant antigen PfEMP1. P. falciparum antigenic variation and cytoadhesion specificity are controlled by epigenetic switching between the single, or few, simultaneously expressed var genes. Most var genes are maintained in perinuclear clusters of heterochromatic telomeres. The active var gene(s) occupy a single, perinuclear var expression site. It is unresolved whether the var expression site forms in situ at a telomeric cluster or whether it is an extant compartment to which single chromosomes travel, thus controlling var switching. Here we show that transcription of a var gene did not require decreased colocalisation with clusters of telomeres, supporting var expression site formation in situ. However following recombination within adjacent subtelomeric sequences, the same var gene was persistently activated and did colocalise less with telomeric clusters. Thus, participation in stable, heterochromatic, telomere clusters and var switching are independent but are both affected by subtelomeric sequences. The var expression site colocalised with the euchromatic mark H3K27ac to a greater extent than it did with heterochromatic H3K9me3. H3K27ac was enriched within the active var gene promoter even when the var gene was transiently repressed in mature parasites and thus H3K27ac may contribute to var gene epigenetic memory. © 2016 Federation of European Biochemical Societies.

  3. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

      This thesis entitled "Function analysis of unknown genes" presents the use of proteome analysis for the characterisation of yeast (Saccharomyces cerevisiae) genes and their products (proteins especially those of unknown function). This study illustrates that proteome analysis can be used...... to describe different aspects of molecular biology of the cell, to study changes that occur in the cell due to overexpression or deletion of a gene and to identify various protein modifications. The biological questions and the results of the described studies show the diversity of the information that can...... genes and proteins. It reports the first global proteome database collecting 36 yeast single gene deletion mutants and selecting over 650 differences between analysed mutants and the wild type strain. The obtained results show that two-dimensional gel electrophoresis and mass spectrometry based proteome...

  4. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: 1 post-bronchodilator FEV1 percent predicted, 2 percent bronchodilator responsiveness, and quantitative CT measurements of 3 apical emphysema and 4 airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: 1 emphysema predominant, 2 bronchodilator responsive, with higher FEV1; 3 discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness, and 4 airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  5. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    Science.gov (United States)

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs including 12 nif genes were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, characteristic features appeared in the location of the nifV gene for each Frankia strain, depending on the type of host plant. Sequence analysis was performed to determine the transcription units and suggested that there could be an independent operon starting from the nifW gene in the EuIK strain. Considering the organization patterns and their total extensions on the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  6. Gene set analysis for GWAS

    DEFF Research Database (Denmark)

    Debrabant, Birgit; Soerensen, Mette

    2014-01-01

    Abstract We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review corresponding null and alternative hypotheses. Especially, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic, the co...

  7. An analysis of hospital brand mark clusters.

    Science.gov (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  8. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    KAUST Repository

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan

    2015-01-01

    validating this direct cloning plug-and-playa approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation

  9. Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli.

    Science.gov (United States)

    Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio

    2014-12-01

    The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.

    Science.gov (United States)

    Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P

    2017-11-23

    The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass related carbon sources were analyzed of a wild-type strain N402 that was grown on a large range of carbon sources and of the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX that were grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes, which goes beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile puts questions on the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and a therefore even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight in the complex regulatory systems that drive the conversion of plant biomass by fungi. In

  11. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Full Text Available Smart cities have been recently recognized as the most pleasing and attractive places to live in; due to this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by plenty of characteristics that can be grouped into six dimensions (Giffinger et al. 2007: smart Economy (competitiveness, smart People (social and human capital, smart Governance (participation, smart Mobility (both ICTs and transport, smart Environment (natural resources, and smart Living (quality of life. According to this analytical framework, in the present paper the relation between urban attractiveness and the “smart” characteristics has been investigated in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, a descriptive statistics has been followed by a regression analysis (OLS, where the dependent variable measuring the urban attractiveness has been proxied by housing market prices. Besides, a Cluster Analysis (CA has been developed in order to find differences and commonalities among the province capitals.The OLS results indicate that living, people and economy are the key drivers for achieving a better urban attractiveness. Environment, instead, keeps on playing a minor role. Besides, the CA groups the province capitals a

  12. Characterization of the fumonisin B2 biosynthetic gene cluster in Aspergillus niger and A. awamori.

    Science.gov (United States)

    Aspergillus niger and A. awamori strains isolated from grapes cultivated in Mediterranean basin were examined for fumonisin B2 (FB2) production and presence/absence of sequences within the fumonisin biosynthetic gene (fum) cluster. Presence of 13 regions in the fum cluster was evaluated by PCR assay...

  13. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    Science.gov (United States)

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  14. Patterns of genetic diversity and differentiation in resistance gene clusters of two hybridizing European Populus species

    OpenAIRE

    Casey, Céline; Stölting, Kai N.; Barbará, Thelma; González-Martínez, Santiago C.; Lexer, Christian

    2015-01-01

    Resistance genes (R-genes) are essential for long-lived organisms such as forest trees, which are exposed to diverse herbivores and pathogens. In short-lived model species, R-genes have been shown to be involved in species isolation. Here, we studied more than 400 trees from two natural hybrid zones of the European Populus species Populus alba and Populus tremula for microsatellite markers located in three R-gene clusters, including one cluster situated in the incipient sex chromosome region....

  15. Unusual Gene Order and Organization of the Sea Urchin HoxCluster

    Energy Technology Data Exchange (ETDEWEB)

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen,Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, KevinJ.; Hood, Leroy

    2005-05-10

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is : 5'-Hox1,2, 3, 11/13c, 11/13b, '11/13a, 9/10, 8, 7, 6, 5 - 3)'. The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  16. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    Science.gov (United States)

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson

  17. Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters.

    Science.gov (United States)

    Zhang, Huixian; Ravi, Vydianathan; Tay, Boon-Hui; Tohari, Sumanty; Pillai, Nisha E; Prasad, Aravind; Lin, Qiang; Brenner, Sydney; Venkatesh, Byrappa

    2017-08-22

    ParaHox genes ( Gsx , Pdx , and Cdx ) are an ancient family of developmental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, contains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lampreys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes ( Gsxα , Pdxα , Cdxα , Gsxβ , and Cdxβ ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homologs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was acquired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas specification and insulin production similar to the Pdx of jawed vertebrates.

  18. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. While bacterial genes responsible for

  19. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    Science.gov (United States)

    2014-01-01

    Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological

  20. Taxonomical analysis of the Cancer cluster of galaxies

    International Nuclear Information System (INIS)

    Perea, J.; Olmo, A. del; Moles, M.

    1986-01-01

    A description is presented of the Cancer cluster of galaxies, based on a taxonomical analysis in (α,delta, Vsub(r)) space. Earlier results by previous authors on the lack of dynamical entity of the cluster are confirmed. The present analysis points out the existence of a binary structure in the most populated region of the complex. (author)

  1. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  2. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  3. Microbial communication leading to the activation of silent fungal secondary metabolite gene clusters

    Directory of Open Access Journals (Sweden)

    Tina eNetzker

    2015-04-01

    Full Text Available Microorganisms form diverse multispecies communities in various ecosystems. The high abundance of fungal and bacterial species in these consortia results in specific communication between the microorganisms. A key role in this communication is played by secondary metabolites (SMs, which are also called natural products. Recently, it was shown that interspecies ‘talk’ between microorganisms represents a physiological trigger to activate silent gene clusters leading to the formation of novel SMs by the involved species. This review focuses on mixed microbial cultivation, mainly between bacteria and fungi, with a special emphasis on the induced formation of fungal SMs in co-cultures. In addition, the role of chromatin remodeling in the induction is examined, and methodical perspectives for the analysis of natural products are presented. As an example for an intermicrobial interaction elucidated at the molecular level, we discuss the specific interaction between the filamentous fungi Aspergillus nidulans and Aspergillus fumigatus with the soil bacterium Streptomyces rapamycinicus, which provides an excellent model system to enlighten molecular concepts behind regulatory mechanisms and will pave the way to a novel avenue of drug discovery through targeted activation of silent SM gene clusters through co-cultivations of microorganisms.

  4. The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

    Science.gov (United States)

    Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

    2013-01-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  5. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  6. Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

    Science.gov (United States)

    Bhattacharya, Anindya; De, Rajat K

    2010-08-01

    Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to

  7. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel......-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  8. Multiscale Embedded Gene Co-expression Network Analysis.

    Directory of Open Access Journals (Sweden)

    Won-Min Song

    2015-11-01

    Full Text Available Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|3, the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA by: i introducing quality control of co-expression similarities, ii parallelizing embedded network construction, and iii developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs. We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA. MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

  9. Multiscale Embedded Gene Co-expression Network Analysis.

    Science.gov (United States)

    Song, Won-Min; Zhang, Bin

    2015-11-01

    Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

  10. Comparison of Expression of Secondary Metabolite Biosynthesis Cluster Genes in Aspergillus flavus, A. parasiticus, and A. oryzae

    OpenAIRE

    Ehrlich, Kenneth C.; Mack, Brian M.

    2014-01-01

    Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help ...

  11. Increasing Power by Sharing Information from Genetic Background and Treatment in Clustering of Gene Expression Time Series

    OpenAIRE

    Sura Zaki Alrashid; Muhammad Arifur Rahman; Nabeel H Al-Aaraji; Neil D Lawrence; Paul R Heath

    2018-01-01

    Clustering of gene expression time series gives insight into which genes may be co-regulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic background. This paper develops
a new clustering method that allows each cluster to be parameterised according to whether the behaviour of the genes across conditions is correlated or anti-correlated. By specifying correlati...

  12. TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

    Science.gov (United States)

    Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun

    2017-12-01

    Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at

  13. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  14. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  15. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci.

    Science.gov (United States)

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.

  16. De novo deletion of HOXB gene cluster in a patient with failure to thrive, developmental delay, gastroesophageal reflux and bronchiectasis.

    Science.gov (United States)

    Pajusalu, Sander; Reimand, Tiia; Uibo, Oivi; Vasar, Maire; Talvik, Inga; Zilina, Olga; Tammur, Pille; Õunap, Katrin

    2015-01-01

    We report a female patient with a complex phenotype consisting of failure to thrive, developmental delay, congenital bronchiectasis, gastroesophageal reflux and bilateral inguinal hernias. Chromosomal microarray analysis revealed a 230 kilobase deletion in chromosomal region 17q21.32 (arr[hg19] 17q21.32(46 550 362-46 784 039)×1) encompassing only 9 genes - HOXB1 to HOXB9. The deletion was not found in her mother or father. This is the first report of a patient with a HOXB gene cluster deletion involving only HOXB1 to HOXB9 genes. By comparing our case to previously reported five patients with larger chromosomal aberrations involving the HOXB gene cluster, we can suppose that HOXB gene cluster deletions are responsible for growth retardation, developmental delay, and specific facial dysmorphic features. Also, we suppose that bilateral inguinal hernias, tracheo-esophageal abnormalities, and lung malformations represent features with incomplete penetrance. Interestingly, previously published knock-out mice with targeted heterozygous deletion comparable to our patient did not show phenotypic alterations. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  17. A functional bikaverin biosynthesis gene cluster in rare strains of Botrytis cinerea is positively controlled by VELVET.

    Directory of Open Access Journals (Sweden)

    Julia Schumacher

    Full Text Available The gene cluster responsible for the biosynthesis of the red polyketidic pigment bikaverin has only been characterized in Fusarium ssp. so far. Recently, a highly homologous but incomplete and nonfunctional bikaverin cluster has been found in the genome of the unrelated phytopathogenic fungus Botrytis cinerea. In this study, we provided evidence that rare B. cinerea strains such as 1750 have a complete and functional cluster comprising the six genes orthologous to Fusarium fujikuroi ffbik1-ffbik6 and do produce bikaverin. Phylogenetic analysis confirmed that the whole cluster was acquired from Fusarium through a horizontal gene transfer (HGT. In the bikaverin-nonproducing strain B05.10, the genes encoding bikaverin biosynthesis enzymes are nonfunctional due to deleterious mutations (bcbik2-3 or missing (bcbik1 but interestingly, the genes encoding the regulatory proteins BcBIK4 and BcBIK5 do not harbor deleterious mutations which suggests that they may still be functional. Heterologous complementation of the F. fujikuroi Δffbik4 mutant confirmed that bcbik4 of strain B05.10 is indeed fully functional. Deletion of bcvel1 in the pink strain 1750 resulted in loss of bikaverin and overproduction of melanin indicating that the VELVET protein BcVEL1 regulates the biosynthesis of the two pigments in an opposite manner. Although strain 1750 itself expresses a truncated BcVEL1 protein (100 instead of 575 aa that is nonfunctional with regard to sclerotia formation, virulence and oxalic acid formation, it is sufficient to regulate pigment biosynthesis (bikaverin and melanin and fenhexamid HydR2 type of resistance. Finally, a genetic cross between strain 1750 and a bikaverin-nonproducing strain sensitive to fenhexamid revealed that the functional bikaverin cluster is genetically linked to the HydR2 locus.

  18. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.

  19. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production

    Science.gov (United States)

    Bailey, Andy M.; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M.; de Mattos-Shipley, Kate; Hartley, Amanda J.; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M.; Cox, Russell J.; Willis, Christine L.; O'Dwyer, Karen; Spence, David W.; Foster, Gary D.

    2016-05-01

    Semi-synthetic derivatives of the tricyclic diterpene antibiotic pleuromutilin from the basidiomycete Clitopilus passeckerianus are important in combatting bacterial infections in human and veterinary medicine. These compounds belong to the only new class of antibiotics for human applications, with novel mode of action and lack of cross-resistance, representing a class with great potential. Basidiomycete fungi, being dikaryotic, are not generally amenable to strain improvement. We report identification of the seven-gene pleuromutilin gene cluster and verify that using various targeted approaches aimed at increasing antibiotic production in C. passeckerianus, no improvement in yield was achieved. The seven-gene pleuromutilin cluster was reconstructed within Aspergillus oryzae giving production of pleuromutilin in an ascomycete, with a significant increase (2106%) in production. This is the first gene cluster from a basidiomycete to be successfully expressed in an ascomycete, and paves the way for the exploitation of a metabolically rich but traditionally overlooked group of fungi.

  20. Clustering Gene Expression Time Series with Coregionalization: Speed propagation of ALS

    OpenAIRE

    Rahman, Muhammad Arifur; Heath, Paul R.; Lawrence, Neil D.

    2018-01-01

    Clustering of gene expression time series gives insight into which genes may be coregulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different model conditions or genetic background. Amyotrophic lateral sclerosis (ALS), an irreversible diverse neurodegenerative disorder showed consistent phenotypic differences and the disease progression is heterogeneous with significant variability. Thi...

  1. Fine Mapping of Two Wheat Powdery Mildew Resistance Genes Located at the Pm1 Cluster

    Directory of Open Access Journals (Sweden)

    Junchao Liang

    2016-07-01

    Full Text Available Powdery mildew caused by (DC. f. sp. ( is a globally devastating foliar disease of wheat ( L.. More than a dozen genes against this disease, identified from wheat germplasms of different ploidy levels, have been mapped to the region surrounding the locus on the long arm of chromosome 7A, which forms a resistance (-gene cluster. and from einkorn wheat ( L. were two of the genes belonging to this cluster. This study was initiated to fine map these two genes toward map-based cloning. Comparative genomics study showed that macrocolinearity exists between L. chromosome 1 (Bd1 and the – region, which allowed us to develop markers based on the wheat sequences orthologous to genes contained in the Bd1 region. With these and other newly developed and published markers, high-resolution maps were constructed for both and using large F populations. Moreover, a physical map of was constructed through chromosome walking with bacterial artificial chromosome (BAC clones and comparative mapping. Eventually, and were restricted to a 0.12- and 0.86-cM interval, respectively. Based on the closely linked common markers, , , and (another powdery mildew resistance gene in the cluster were not allelic to one another. Severe recombination suppression and disruption of synteny were noted in the region encompassing . These results provided useful information for map-based cloning of the genes in the cluster and interpretation of their evolution.

  2. Clustering of users of digital libraries through log file analysis

    Directory of Open Access Journals (Sweden)

    Juan Antonio Martínez-Comeche

    2017-09-01

    Full Text Available This study analyzes how users perform information retrieval tasks when introducing queries to the Hispanic Digital Library. Clusters of users are differentiated based on their distinct information behavior. The study used the log files collected by the server over a year and different possible clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering method for the analysis of large log files from digital libraries. In the case of the Hispanic Digital Library the results show three clusters of users and the characteristic information behavior of each group is described.

  3. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

  4. A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.

    Science.gov (United States)

    Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo

    2015-07-01

    In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.

  5. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    Science.gov (United States)

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available on public data resource. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is higher related to physiological condition rather than tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we firstly apply improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (a HK gene is specific in cancer group, while not in normal group) are expressed higher and more variable in cancer condition than in normal condition. Cancer-associated HK genes prefer to AT-rich genes, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867

  6. A CLUSTERING OF DJA STOCKS - THE APPLICATION IN FINANCE OF A METHOD FIRST USED IN GENE TRAJECTORY STUDY

    Directory of Open Access Journals (Sweden)

    Silaghi Gheorghe Cosmin

    2009-05-01

    Full Text Available Previously we employed the Gene Trajectory Clustering methodology to search for different associations of the stocks composing the DJA index, with the aim of finding different, logic clusters, supported by economic reasons, preferably different than the

  7. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi*, Rahul Patel

    2016-01-01

    In a forensic analysis, large numbers of files are examined. Much of the information comprises of in unstructured format, so it’s quite difficult task for computer forensic to perform such analysis. That’s why to do the forensic analysis of document within a limited period of time require a special approach such as document clustering. This paper review different document clustering algorithms methodologies for example K-mean, K-medoid, single link, complete link, average link in accorandance...

  8. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.

  9. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    Science.gov (United States)

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180

  10. Merging Galaxy Clusters: Analysis of Simulated Analogs

    Science.gov (United States)

    Nguyen, Jayke; Wittman, David; Cornell, Hunter

    2018-01-01

    The nature of dark matter can be better constrained by observing merging galaxy clusters. However, uncertainty in the viewing angle leads to uncertainty in dynamical quantities such as 3-d velocities, 3-d separations, and time since pericenter. The classic timing argument links these quantities via equations of motion, but neglects effects of nonzero impact parameter (i.e. it assumes velocities are parallel to the separation vector), dynamical friction, substructure, and larger-scale environment. We present a new approach using n-body cosmological simulations that naturally incorporate these effects. By uniformly sampling viewing angles about simulated cluster analogs, we see projected merger parameters in the many possible configurations of a given cluster. We select comparable simulated analogs and evaluate the likelihood of particular merger parameters as a function of viewing angle. We present viewing angle constraints for a sample of observed mergers including the Bullet cluster and El Gordo, and show that the separation vectors are closer to the plane of the sky than previously reported.

  11. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  12. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Full Text Available Innovation through clustering has become very important on the increased significance that interaction represents on innovation and learning process concept. This study aims to identify whereas a case analysis on innovation process in a cluster represents on the learning process. Therefore, this study is developed in two stages. First, we used a preliminary case study verifying a cluster innovation analysis and it Innovation Index, for further, exploring a combined body of theory and practice. Further, the second stage is developed by exploring the learning process concept. Both stages allowed us building a theory model for the learning process development in clusters. The main results of the model development come up with a mechanism of improvement implementation on clusters when case studies are applied.

  13. Genetic recombination as a major cause of mutagenesis in the human globin gene clusters.

    Science.gov (United States)

    Borg, Joseph; Georgitsi, Marianthi; Aleporou-Marinou, Vassiliki; Kollia, Panagoula; Patrinos, George P

    2009-12-01

    Homologous recombination is a frequent phenomenon in multigene families and as such it occurs several times in both the alpha- and beta-like globin gene families. In numerous occasions, genetic recombination has been previously implicated as a major mechanism that drives mutagenesis in the human globin gene clusters, either in the form of unequal crossover or gene conversion. Unequal crossover results in the increase or decrease of the human globin gene copies, accompanied in the majority of cases with minor phenotypic consequences, while gene conversion contributes either to maintaining sequence homogeneity or generating sequence diversity. The role of genetic recombination, particularly gene conversion in the evolution of the human globin gene families has been discussed elsewhere. Here, we summarize our current knowledge and review existing experimental evidence outlining the role of genetic recombination in the mutagenic process in the human globin gene families.

  14. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles

    Directory of Open Access Journals (Sweden)

    Lee Yun-Shien

    2008-03-01

    Full Text Available Abstract Background The hierarchical clustering tree (HCT with a dendrogram 1 and the singular value decomposition (SVD with a dimension-reduced representative map 2 are popular methods for two-way sorting the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, SVD leading eigenvectors usually identify better global grouping and transitional structures. Results This study proposes a flipping mechanism for a conventional agglomerative HCT using a rank-two ellipse (R2E, an improved SVD algorithm for sorting purpose seriation by Chen 3 as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. Conclusion We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties, it also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid in the search for comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.

  15. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.

  16. A remarkably stable TipE gene cluster: evolution of insect Para sodium channel auxiliary subunits

    Directory of Open Access Journals (Sweden)

    Li Jia

    2011-11-01

    Full Text Available Abstract Background First identified in fruit flies with temperature-sensitive paralysis phenotypes, the Drosophila melanogaster TipE locus encodes four voltage-gated sodium (NaV channel auxiliary subunits. This cluster of TipE-like genes on chromosome 3L, and a fifth family member on chromosome 3R, are important for the optional expression and functionality of the Para NaV channel but appear quite distinct from auxiliary subunits in vertebrates. Here, we exploited available arthropod genomic resources to trace the origin of TipE-like genes by mapping their evolutionary histories and examining their genomic architectures. Results We identified a remarkably conserved synteny block of TipE-like orthologues with well-maintained local gene arrangements from 21 insect species. Homologues in the water flea, Daphnia pulex, suggest an ancestral pancrustacean repertoire of four TipE-like genes; a subsequent gene duplication may have generated functional redundancy allowing gene losses in the silk moth and mosquitoes. Intronic nesting of the insect TipE gene cluster probably occurred following the divergence from crustaceans, but in the flour beetle and silk moth genomes the clusters apparently escaped from nesting. Across Pancrustacea, TipE gene family members have experienced intronic nesting, escape from nesting, retrotransposition, translocation, and gene loss events while generally maintaining their local gene neighbourhoods. D. melanogaster TipE-like genes exhibit coordinated spatial and temporal regulation of expression distinct from their host gene but well-correlated with their regulatory target, the Para NaV channel, suggesting that functional constraints may preserve the TipE gene cluster. We identified homology between TipE-like NaV channel regulators and vertebrate Slo-beta auxiliary subunits of big-conductance calcium-activated potassium (BKCa channels, which suggests that ion channel regulatory partners have evolved distinct lineage

  17. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K- means and the Ant clustering algorithm for real document clustering.

  18. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of pan-ethnic-group products form knowledge primarily depends on a designer's subjective experience without user participation. The majority of studies primarily focus on the detection of the perceptual demands of consumers from the target product category. A pan-ethnic-group products form gene clustering method based on emotional semantic is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer aided product form clustering technology. A case of form gene clustering about the typical pan-ethnic-group products is investigated which indicates that the method is feasible. This paper opens up a new direction for the future development of product form design which improves the agility of product design process in the era of Industry 4.0.

  19. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  20. Network Analysis Tools: from biological networks to clusters and pathways.

    Science.gov (United States)

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  1. Evaluation of gene-expression clustering via mutual information distance measure

    Directory of Open Access Journals (Sweden)

    Maimon Oded

    2007-03-01

    Full Text Available Abstract Background The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI measure versus the use of the well known Euclidean distance and Pearson correlation coefficient. Results Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. It was found that the use of the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when using different distance measures, despite the found correspondence between these measures when analysing the averaged scores of groups of solutions. Conclusion In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.

  2. Unique nucleotide polymorphism of ankyrin gene cluster in ...

    Indian Academy of Sciences (India)

    gene order is nonrandomly distributed in eukaryote genomes. (Lercher et al. 2002 ... Birth in a birth-and-death process relates to the origin of paralogues, presumably ... are small, or the rate of concerted evolution is very slow (Nei et al. 2000).

  3. Sexuality generates diversity in the aflatoxin gene cluster: evidence on a global scale.

    Directory of Open Access Journals (Sweden)

    Geromy G Moore

    Full Text Available Aflatoxins are produced by Aspergillus flavus and A. parasiticus in oil-rich seed and grain crops and are a serious problem in agriculture, with aflatoxin B₁ being the most carcinogenic natural compound known. Sexual reproduction in these species occurs between individuals belonging to different vegetative compatibility groups (VCGs. We examined natural genetic variation in 758 isolates of A. flavus, A. parasiticus and A. minisclerotigenes sampled from single peanut fields in the United States (Georgia, Africa (Benin, Argentina (Córdoba, Australia (Queensland and India (Karnataka. Analysis of DNA sequence variation across multiple intergenic regions in the aflatoxin gene clusters of A. flavus, A. parasiticus and A. minisclerotigenes revealed significant linkage disequilibrium (LD organized into distinct blocks that are conserved across different localities, suggesting that genetic recombination is nonrandom and a global occurrence. To assess the contributions of asexual and sexual reproduction to fixation and maintenance of toxin chemotype diversity in populations from each locality/species, we tested the null hypothesis of an equal number of MAT1-1 and MAT1-2 mating-type individuals, which is indicative of a sexually recombining population. All samples were clone-corrected using multi-locus sequence typing which associates closely with VCG. For both A. flavus and A. parasiticus, when the proportions of MAT1-1 and MAT1-2 were significantly different, there was more extensive LD in the aflatoxin cluster and populations were fixed for specific toxin chemotype classes, either the non-aflatoxigenic class in A. flavus or the B₁-dominant and G₁-dominant classes in A. parasiticus. A mating type ratio close to 1∶1 in A. flavus, A. parasiticus and A. minisclerotigenes was associated with higher recombination rates in the aflatoxin cluster and less pronounced chemotype differences in populations. This work shows that the reproductive nature of

  4. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Full Text Available Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done to determine the distribution of typhoid cases, whether clustered, random or dispersed. The spatial statistical analysis was done using CrimeStat III software to determine whether typhoid cases occur in clusters, and later on to determine at what distances it clustered. From 736 cases involved in the study there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describe spatial epidemiology for a specific area. (Med J Indones 2008; 17: 175-82Keywords: typhoid, clustering, spatial epidemiology, GIS

  5. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  6. Identification of new genes in a cell envelope-cell division gene cluster of Escherichia coli: cell envelope gene murG.

    Science.gov (United States)

    Salmond, G P; Lutkenhaus, J F; Donachie, W D

    1980-01-01

    We report the identification, cloning, and mapping of a new cell envelope gene, murG. This lies in a group of five genes of similar phenotype (in the order murE murF murG murC ddl) all concerned with peptidoglycan biosynthesis. This group is in a larger cluster of at least 10 genes, all of which are involved in some way with cell envelope growth. Images PMID:6998962

  7. Mapping in an apple (Malus x domestica) F1 segregating population based on physical clustering of differentially expressed genes.

    Science.gov (United States)

    Jensen, Philip J; Fazio, Gennaro; Altman, Naomi; Praul, Craig; McNellis, Timothy W

    2014-04-04

    Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. Gene expression profiling

  8. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm

    Science.gov (United States)

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E.; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops. PMID:29672525

  9. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm.

    Science.gov (United States)

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder; Murphy, Denis J

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops.

  10. Investigation of pathogenic genes in peri-implantitis from implant clustering failure patients: a whole-exome sequencing pilot study.

    Directory of Open Access Journals (Sweden)

    Soohyung Lee

    Full Text Available Peri-implantitis is a frequently occurring gum disease linked to multi-factorial traits with various environmental and genetic causalities and no known concrete pathogenesis. The varying severity of peri-implantitis among patients with relatively similar environments suggests a genetic aspect which needs to be investigated to understand and regulate the pathogenesis of the disease. Six unrelated individuals with multiple clusterization implant failure due to severe peri-implantitis were chosen for this study. These six individuals had relatively healthy lifestyles, with minimal environmental causalities affecting peri-implantitis. Research was undertaken to investigate pathogenic genes in peri-implantitis albeit with a small number of subjects and incomplete elimination of environmental causalities. Whole-exome sequencing was performed on collected saliva samples via self DNA collection kit. Common variants with minor allele frequencies (MAF > = 0.05 from all control datasets were eliminated and variants having high and moderate impact and loss of function were used for comparison. Gene set enrichment analysis was performed to reveal functional groups associated with the genetic variants. 2,022 genes were left after filtering against dbSNP, the 1000 Genomes East Asian population, and healthy Korean randomized subsample data (GSK project. 175 (p-value <0.05 out of 927 gene sets were obtained via GSEA (DAVID. The top 10 was chosen (p-value <0.05 from cluster enrichment showing significance of cytoskeleton, cell adhesion, and metal ion binding. Network analysis was applied to find relationships between functional clusters. Among the functional groups, ion metal binding was located in the center of all clusters, indicating dysfunction of regulation in metal ion concentration might affect cell morphology or cell adhesion, resulting in implant failure. This result may demonstrate the feasibility of and provide pilot data for a larger research

  11. Ancestral Variations of the PCDHG Gene Cluster Predispose to Dyslexia in a Multiplex Family

    Directory of Open Access Journals (Sweden)

    Teesta Naskar

    2018-02-01

    Full Text Available Dyslexia is a heritable neurodevelopmental disorder characterized by difficulties in reading and writing. In this study, we describe the identification of a set of 17 polymorphisms located across 1.9 Mb region on chromosome 5q31.3, encompassing genes of the PCDHG cluster, TAF7, PCDH1 and ARHGAP26, dominantly inherited with dyslexia in a multi-incident family. Strikingly, the non-risk form of seven variations of the PCDHG cluster, are preponderant in the human lineage, while risk alleles are ancestral and conserved across Neanderthals to non-human primates. Four of these seven ancestral variations (c.460A > C [p.Ile154Leu], c.541G > A [p.Ala181Thr], c.2036G > C [p.Arg679Pro] and c.2059A > G [p.Lys687Glu] result in amino acid alterations. p.Ile154Leu and p.Ala181Thr are present at EC2: EC3 interacting interface of γA3-PCDH and γA4-PCDH respectively might affect trans-homophilic interaction and hence neuronal connectivity. p.Arg679Pro and p.Lys687Glu are present within the linker region connecting trans-membrane to extracellular domain. Sequence analysis indicated the importance of p.Ile154, p.Arg679 and p.Lys687 in maintaining class specificity. Thus the observed association of PCDHG genes encoding neural adhesion proteins reinforces the hypothesis of aberrant neuronal connectivity in the pathophysiology of dyslexia. Additionally, the striking conservation of the identified variants indicates a role of PCDHG in the evolution of highly specialized cognitive skills critical to reading.

  12. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    Full Text Available BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52% corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra

  13. Sequencing, physical organization and kinetic expression of the patulin biosynthetic gene cluster from Penicillium expansum

    International Nuclear Information System (INIS)

    Tannous, J.; El Khoury, R.; El Khoury, A.; Lteif, R.; Snini, S.; Lippi, Y.; Oswald, I.; Olivier, P.; Atoui, A.

    2014-01-01

    Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown with only few characterized genes in less economic relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerative primers that were designed based on sequences from the orthologous genes available in other species. An improved genome Walking approach was used in order to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60–70% of identity with orthologous genes grouped differently, within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster genes expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of themechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products

  14. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  15. Isolation of Hox cluster genes from insects reveals an accelerated sequence evolution rate.

    Directory of Open Access Journals (Sweden)

    Heike Hadrys

    Full Text Available Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera. We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution.

  16. Biosynthesis of Akaeolide and Lorneic Acids and Annotation of Type I Polyketide Synthase Gene Clusters in the Genome of Streptomyces sp. NPS554

    Directory of Open Access Journals (Sweden)

    Tao Zhou

    2015-01-01

    Full Text Available The incorporation pattern of biosynthetic precursors into two structurally unique polyketides, akaeolide and lorneic acid A, was elucidated by feeding experiments with 13C-labeled precursors. In addition, the draft genome sequence of the producer, Streptomyces sp. NPS554, was performed and the biosynthetic gene clusters for these polyketides were identified. The putative gene clusters contain all the polyketide synthase (PKS domains necessary for assembly of the carbon skeletons. Combined with the 13C-labeling results, gene function prediction enabled us to propose biosynthetic pathways involving unusual carbon-carbon bond formation reactions. Genome analysis also indicated the presence of at least ten orphan type I PKS gene clusters that might be responsible for the production of new polyketides.

  17. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    Full Text Available In the fields of geographic information systems (GIS and remote sensing (RS, the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited due to computational complexity, noise resistant ability and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC, is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters” if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the terminate condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  18. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed with the amount of information streams generated everyday. There is a lack of comprehensive tool that can real-time analyze the information streams. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied for analyzing the static document collection because they normally require a large amount of computation resource and long time to get accurate result. It is very difficult to cluster a dynamic changed text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to the change of document contents. This character makes the algorithm suitable for cluster analyzing dynamic changed document information, such as text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for the text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.

  19. Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among ...

    African Journals Online (AJOL)

    Yazun Bashir Jarrar

    2017-11-26

    Nov 26, 2017 ... Sequence analysis of the N-acetyltransferase 2 gene (NAT2) among Jordanian volunteers, Libyan. Journal of Medicine .... For molecular modeling of NAT2 protein, visualized ..... cal clustering. .... cular dynamics simulation.

  20. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  1. The effect of alcohol on the differential expression of cluster of differentiation 14 gene, associated pathways, and genetic network.

    Directory of Open Access Journals (Sweden)

    Diana X Zhou

    Full Text Available Alcohol consumption affects human health in part by compromising the immune system. In this study, we examined the expression of the Cd14 (cluster of differentiation 14 gene, which is involved in the immune system through a proinflammatory cascade. Expression was evaluated in BXD mice treated with saline or acute 1.8 g/kg i.p. ethanol (12.5% v/v. Hippocampal gene expression data were generated to examine differential expression and to perform systems genetics analyses. The Cd14 gene expression showed significant changes among the BXD strains after ethanol treatment, and eQTL mapping revealed that Cd14 is a cis-regulated gene. We also identified eighteen ethanol-related phenotypes correlated with Cd14 expression related to either ethanol responses or ethanol consumption. Pathway analysis was performed to identify possible biological pathways involved in the response to ethanol and Cd14. We also constructed a genetic network for Cd14 using the top 20 correlated genes and present several genes possibly involved in Cd14 and ethanol responses based on differential gene expression. In conclusion, we found Cd14, along with several other genes and pathways, to be involved in ethanol responses in the hippocampus, such as increased susceptibility to lipopolysaccharides and neuroinflammation.

  2. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify...... used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom....

  3. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: VARIABLES CLUSTERED INTO THREE INDEPENDENT DIMENSIONS: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1, high symptomatology and comorbidities (Cluster 2, and high symptomatology but low comorbidities (Cluster 3, showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  4. Clustering Trajectories by Relevant Parts for Air Traffic Analysis.

    Science.gov (United States)

    Andrienko, Gennady; Andrienko, Natalia; Fuchs, Georg; Garcia, Jose Manuel Cordero

    2018-01-01

    Clustering of trajectories of moving objects by similarity is an important technique in movement analysis. Existing distance functions assess the similarity between trajectories based on properties of the trajectory points or segments. The properties may include the spatial positions, times, and thematic attributes. There may be a need to focus the analysis on certain parts of trajectories, i.e., points and segments that have particular properties. According to the analysis focus, the analyst may need to cluster trajectories by similarity of their relevant parts only. Throughout the analysis process, the focus may change, and different parts of trajectories may become relevant. We propose an analytical workflow in which interactive filtering tools are used to attach relevance flags to elements of trajectories, clustering is done using a distance function that ignores irrelevant elements, and the resulting clusters are summarized for further analysis. We demonstrate how this workflow can be useful for different analysis tasks in three case studies with real data from the domain of air traffic. We propose a suite of generic techniques and visualization guidelines to support movement data analysis by means of relevance-aware trajectory clustering.

  5. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to

  6. Genetic studies on the APOA1-C3-A5 gene cluster in Asian Indians with premature coronary artery disease

    Directory of Open Access Journals (Sweden)

    Hebbagodi Sridhara

    2008-09-01

    Full Text Available Abstract Background The APOA1-C3-A5 gene cluster plays an important role in the regulation of lipids. Asian Indians have an increased tendency for abnormal lipid levels and high risk of Coronary Artery Disease (CAD. Therefore, the present study aimed to elucidate the relationship of four single nucleotide polymorphisms (SNPs in the Apo11q cluster, namely the -75G>A, +83C>T SNPs in the APOA1 gene, the Sac1 SNP in the APOC3 gene and the S19W variant in the APOA5 gene to plasma lipids and CAD in 190 affected sibling pairs (ASPs belonging to Asian Indian families with a strong CAD history. Methods & results Genotyping and lipid assays were carried out using standard protocols. Plasma lipids showed a strong heritability (h2 48% – 70%; P P A (LOD score 2.77 SNPs by single-point analysis (P A (pi 0.56 and +83C>T (pi 0.52 (P P A SNPs along with hypertension showed maximized correlations with TC, TG and Apo B by association analysis. Conclusion The APOC3-Sac1 SNP is an important genetic variant that is associated with CAD through its interaction with plasma lipids and other standard risk factors among Asian Indians.

  7. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  8. Synteny in toxigenic Fusarium species: the fumonisin gene cluster and the mating type region as examples

    NARCIS (Netherlands)

    Waalwijk, C.; Lee, van der T.A.J.; Vries, de P.M.; Hesselink, T.; Arts, J.; Kema, G.H.J.

    2004-01-01

    A comparative genomic approach was used to study the mating type locus and the gene cluster involved in toxin production ( fumonisin) in Fusarium proliferatum, a pathogen with a wide host range and a complex toxin profile. A BAC library, generated from F. proliferatum isolate ITEM 2287, was used to

  9. Evolutionary history of the phl gene cluster in the plant-associated bacterium Pseudomonas fluorescens

    NARCIS (Netherlands)

    Moynihan, J.A.; Morrissey, J.P.; Coppoolse, E.; Stiekema, W.J.; O'Gara, F.; Boyd, E.F.

    2009-01-01

    Pseudomonas fluorescens is of agricultural and economic importance as a biological control agent largely because of its plant-association and production of secondary metabolites, in particular 2, 4-diacetylphloroglucinol (2, 4-DAPG). This polyketide, which is encoded by the eight gene phl cluster,

  10. Molecular population genetics of the β-esterase gene cluster of ...

    Indian Academy of Sciences (India)

    We suggest that the demographic history (bottleneck and admixture of genetically differentiated populations) is the major factor shaping the pattern of nucleotide polymorphism in the -esterase gene cluster. However there are some 'footprints' of directional and balancing selection shaping specific distribution of nucleotide ...

  11. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    OpenAIRE

    Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V

    2007-01-01

    Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...

  12. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.

  13. IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

    Energy Technology Data Exchange (ETDEWEB)

    Chen, I-Min; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Huang, Jinghua; Reddy, T. B.K.; Cimermancic, Peter; Fischbach, Michael; Ivanova, Natalia; Markowitz, Victor; Kyrpides, Nikos; Pati, Amrita

    2014-10-28

    In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system1. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.

  14. Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages

    Science.gov (United States)

    Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trac...

  15. Predicting healthcare outcomes in prematurely born infants using cluster analysis.

    Science.gov (United States)

    MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

    2018-05-23

    Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, but BW ≥or <882 g respectively), had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001) CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.

  16. New gene cluster from the thermophile Bacillus fordii MH602 in the conversion of DL-5-substituted hydantoins to L-amino acids.

    Science.gov (United States)

    Mei, Yan-Zhen; Wan, Yong-Min; He, Bing-Fang; Ying, Han-Jie; Ouyang, Ping-Kai

    2009-12-01

    The thermophile Bacillus fordii MH602 was screened for stereospecifically hydrolyzing DL-5-substituted hydantoins to L-alpha-amino acids. Since the reaction at higher temperature, the advantageous for enhancement of substrate solubility and for racemization of DL-5-substituted hydantoins during the conversion were achieved. The hydantoin metabolism gene cluster from thermophile was firstly reported in this paper. The genes involved in hydantoin utilization (hyu) were isolated on an 8.2 kb DNA fragment by Restriction Site-dependent PCR, and six ORFs were identified by DNA sequence analysis. The hyu gene cluster contained four genes with novel cluster organization characteristics: the hydantoinase gene hyuH, putative transport protein hyuP, hyperprotein hyuHP, and L-carbamoylase gene hyuC. The hyuH and hyuC genes were heterogeneously expressed in E. coli. The results indicated that hyuH and hyuC are involved in the conversion of DL-5-substituted hydantoins to an N-carbamyl intermediate that is subsequently converted to L-alpha-amino acids. Hydantoinase and carbamoylase from B. fordii MH602 comparing respectively with reported hydantoinase and carbamoylase showed the highest identities of 71% and 39%. The novel cluster organization characteristics and the difference of the key enzymes between thermopile B. fordii MH602 and other mesophiles were presumed to be related to the evolutionary origins of concerned metabolism.

  17. A highly divergent gene cluster in honey bees encodes a novel silk family

    OpenAIRE

    Sutherland, Tara D.; Campbell, Peter M.; Weisman, Sarah; Trueman, Holly E.; Sriskantha, Alagacone; Wanjura, Wolfgang J.; Haritos, Victoria S.

    2006-01-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1–4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-r...

  18. Strategies to regulate transcription factor-mediated gene positioning and interchromosomal clustering at the nuclear periphery.

    Science.gov (United States)

    Randise-Hinchliff, Carlo; Coukos, Robert; Sood, Varun; Sumner, Michael Chas; Zdraljevic, Stefan; Meldi Sholl, Lauren; Garvey Brickner, Donna; Ahmed, Sara; Watchmaker, Lauren; Brickner, Jason H

    2016-03-14

    In budding yeast, targeting of active genes to the nuclear pore complex (NPC) and interchromosomal clustering is mediated by transcription factor (TF) binding sites in the gene promoters. For example, the binding sites for the TFs Put3, Ste12, and Gcn4 are necessary and sufficient to promote positioning at the nuclear periphery and interchromosomal clustering. However, in all three cases, gene positioning and interchromosomal clustering are regulated. Under uninducing conditions, local recruitment of the Rpd3(L) histone deacetylase by transcriptional repressors blocks Put3 DNA binding. This is a general function of yeast repressors: 16 of 21 repressors blocked Put3-mediated subnuclear positioning; 11 of these required Rpd3. In contrast, Ste12-mediated gene positioning is regulated independently of DNA binding by mitogen-activated protein kinase phosphorylation of the Dig2 inhibitor, and Gcn4-dependent targeting is up-regulated by increasing Gcn4 protein levels. These different regulatory strategies provide either qualitative switch-like control or quantitative control of gene positioning over different time scales. © 2016 Randise-Hinchliff et al.

  19. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

    Directory of Open Access Journals (Sweden)

    Scherer Stephen W

    2011-05-01

    Full Text Available Abstract Background Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  20. Biclustering methods: biological relevance and application in gene expression analysis.

    Directory of Open Access Journals (Sweden)

    Ali Oghabian

    Full Text Available DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data-matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical approach for tackling this challenge is to use clustering (also known as one-way clustering methods where genes (or respectively samples are grouped together based on the similarity of their expression profiles across the set of all samples (or respectively genes. An alternative approach is to develop biclustering methods to identify local patterns in the data. These methods extract subgroups of genes that are co-expressed across only a subset of samples and may feature important biological or medical implications. In this study we evaluate 13 biclustering and 2 clustering (k-means and hierarchical methods. We use several approaches to compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures in our analysis: (1 we examine how well the considered (biclustering methods differentiate various sample types; (2 we evaluate how well the groups of genes discovered by the (biclustering methods are annotated with similar Gene Ontology categories; (3 we evaluate the capability of the methods to differentiate genes that are known to be specific to the particular sample types we study and (4 we compare the running time of the algorithms. In the end, we conclude that as long as the samples are well defined and annotated, the contamination of the samples is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.

  1. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of

  2. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  3. Open reading frame 176 in the photosynthesis gene cluster of Rhodobacter capsulatus encodes idi, a gene for isopentenyl diphosphate isomerase.

    OpenAIRE

    Hahn, F M; Baker, J A; Poulter, C D

    1996-01-01

    Isopentenyl diphosphate (IPP) isomerase catalyzes an essential activation step in the isoprenoid biosynthetic pathway. A database search based on probes from the highly conserved regions in three eukaryotic IPP isomerases revealed substantial similarity with ORF176 in the photosynthesis gene cluster in Rhodobacter capsulatus. The open reading frame was cloned into an Escherichia coli expression vector. The encoded 20-kDa protein, which was purified in two steps by ion exchange and hydrophobic...

  4. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    Science.gov (United States)

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  5. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Full Text Available Abstract Background The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. Methods We used the four cluster analysis methods single, average and complete linkage as well as the method of Ward for the pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R2, the cubic cluster criterion, the pseudo-F- and the pseudo-t2-statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non cyclic bleeding patterns were well separated. Conclusion Using this cluster analysis with the method of Ward medications and devices having an impact on bleeding can be easily compared and categorized.

  6. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    Full Text Available This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP clustering, which is effective while grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.

  7. Two Gene Clusters Coordinate Galactose and Lactose Metabolism in Streptococcus gordonii

    Science.gov (United States)

    Zeng, Lin; Martino, Nicole C.

    2012-01-01

    Streptococcus gordonii is an early colonizer of the human oral cavity and an abundant constituent of oral biofilms. Two tandemly arranged gene clusters, designated lac and gal, were identified in the S. gordonii DL1 genome, which encode genes of the tagatose pathway (lacABCD) and sugar phosphotransferase system (PTS) enzyme II permeases. Genes encoding a predicted phospho-β-galactosidase (LacG), a DeoR family transcriptional regulator (LacR), and a transcriptional antiterminator (LacT) were also present in the clusters. Growth and PTS assays supported that the permease designated EIILac transports lactose and galactose, whereas EIIGal transports galactose. The expression of the gene for EIIGal was markedly upregulated in cells growing on galactose. Using promoter-cat fusions, a role for LacR in the regulation of the expressions of both gene clusters was demonstrated, and the gal cluster was also shown to be sensitive to repression by CcpA. The deletion of lacT caused an inability to grow on lactose, apparently because of its role in the regulation of the expression of the genes for EIILac, but had little effect on galactose utilization. S. gordonii maintained a selective advantage over Streptococcus mutans in a mixed-species competition assay, associated with its possession of a high-affinity galactose PTS, although S. mutans could persist better at low pHs. Collectively, these results support the concept that the galactose and lactose systems of S. gordonii are subject to complex regulation and that a high-affinity galactose PTS may be advantageous when S. gordonii is competing against the caries pathogen S. mutans in oral biofilms. PMID:22660715

  8. Characterization of the Second LysR-Type Regulator in the Biphenyl-Catabolic Gene Cluster of Pseudomonas pseudoalcaligenes KF707

    OpenAIRE

    Watanabe, Takahito; Fujihara, Hidehiko; Furukawa, Kensuke

    2003-01-01

    Pseudomonas pseudoalcaligenes KF707 possesses a biphenyl-catabolic (bph) gene cluster consisting of bphR1A1A2-(orf3)-bphA3A4BCX0X1X2X3D. The bphR1 (formerly orf0) gene product, which belongs to the GntR family, is a positive regulator for itself and bphX0X1X2X3D. Further analysis in this study revealed that a second regulator belonging to the LysR family (designated bphR2) is involved in the regulation of the bph genes in KF707. The bphR2 gene was not located near the bph gene cluster, and it...

  9. CLUSTER ANALYSIS UKRAINIAN REGIONAL DISTRIBUTION BY LEVEL OF INNOVATION

    Directory of Open Access Journals (Sweden)

    Roman Shchur

    2016-07-01

    Full Text Available   SWOT-analysis of the threats and benefits of innovation development strategy of Ivano-Frankivsk region in the context of financial support was сonducted. Methodical approach to determine of public-private partnerships potential that is tool of innovative economic development financing was identified. Cluster analysis of possibilities of forming public-private partnership in a particular region was carried out. Optimal set of problem areas that require urgent solutions and financial security is defined on the basis of cluster approach. It will help to form practical recommendations for the formation of an effective financial mechanism in the regions of Ukraine. Key words: the mechanism of innovation development financial provision, innovation development, public-private partnerships, cluster analysis, innovative development strategy.

  10. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    Science.gov (United States)

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Microarray gene expression profiling and analysis in renal cell carcinoma

    Directory of Open Access Journals (Sweden)

    Sadhukhan Provash

    2004-06-01

    Full Text Available Abstract Background Renal cell carcinoma (RCC is the most common cancer in adult kidney. The accuracy of current diagnosis and prognosis of the disease and the effectiveness of the treatment for the disease are limited by the poor understanding of the disease at the molecular level. To better understand the genetics and biology of RCC, we profiled the expression of 7,129 genes in both clear cell RCC tissue and cell lines using oligonucleotide arrays. Methods Total RNAs isolated from renal cell tumors, adjacent normal tissue and metastatic RCC cell lines were hybridized to affymatrix HuFL oligonucleotide arrays. Genes were categorized into different functional groups based on the description of the Gene Ontology Consortium and analyzed based on the gene expression levels. Gene expression profiles of the tissue and cell line samples were visualized and classified by singular value decomposition. Reverse transcription polymerase chain reaction was performed to confirm the expression alterations of selected genes in RCC. Results Selected genes were annotated based on biological processes and clustered into functional groups. The expression levels of genes in each group were also analyzed. Seventy-four commonly differentially expressed genes with more than five-fold changes in RCC tissues were identified. The expression alterations of selected genes from these seventy-four genes were further verified using reverse transcription polymerase chain reaction (RT-PCR. Detailed comparison of gene expression patterns in RCC tissue and RCC cell lines shows significant differences between the two types of samples, but many important expression patterns were preserved. Conclusions This is one of the initial studies that examine the functional ontology of a large number of genes in RCC. Extensive annotation, clustering and analysis of a large number of genes based on the gene functional ontology revealed many interesting gene expression patterns in RCC. Most

  12. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) has yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ˜ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ˜ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and

  13. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis

    Directory of Open Access Journals (Sweden)

    Saurav Mallik

    2017-12-01

    Full Text Available For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures—weighted rank-based Jaccard and Cosine measures—and then propose a novel computational framework to detect condensed gene co-expression modules ( C o n G E M s through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm—RANWAR—was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.

  14. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    Science.gov (United States)

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures-weighted rank-based Jaccard and Cosine measures-and then propose a novel computational framework to detect condensed gene co-expression modules ( C o n G E M s) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm-RANWAR-was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.

  15. Bacillus sp.CDB3 isolated from cattle dip-sites possesses two ars gene clusters

    Institute of Scientific and Technical Information of China (English)

    Somanath Bhat; Xi Luo; Zhiqiang Xu; Lixia Liu; Ren Zhang

    2011-01-01

    Contamination of soil and water by arsenic is a global problem.In Australia, the dipping of cattle in arsenic-containing solution to control cattle ticks in last centenary has left many sites heavily contaminated with arsenic and other toxicants.We had previously isolated five soil bacterial strains (CDB1-5) highly resistant to arsenic.To understand the resistance mechanism, molecular studies have been carried out.Two chromosome-encoded arsenic resistance (ars) gene clusters have been cloned from CDB3 (Bacillus sp.).They both function in Escherichia coli and cluster 1 exerts a much higher resistance to the toxic metalloid.Cluster 2 is smaller possessing four open reading frames (ORFs) arsRorf2BC, similar to that identified in Bacillus subtilis Skin element.Among the eight ORFs in cluster 1 five are analogs of common ars genes found in other bacteria, however, organized in a unique order arsRBCDA instead of arsRDABC.Three other putative genes are located directly downstream and designated as arsTIP based on the homologies of their theoretical translation sequences respectively to thioredoxin reductases, iron-sulphur cluster proteins and protein phosphatases.The latter two are novel of any known ars operons.The arsD gene from Bacillus species was cloned for the first time and the predict protein differs from the well studied E.coli ArsD by lacking two pairs of C-terrninal cysteine residues.Its functional involvement in arsenic resistance has been confirmed by a deletion experiment.There exists also an inverted repeat in the intergenic region between arsC and arsD implying some unknown transcription regulation.

  16. Genetic variations and haplotype diversity of the UGT1 gene cluster in the Chinese population.

    Directory of Open Access Journals (Sweden)

    Jing Yang

    Full Text Available Vertebrates require tremendous molecular diversity to defend against numerous small hydrophobic chemicals. UDP-glucuronosyltransferases (UGTs are a large family of detoxification enzymes that glucuronidate xenobiotics and endobiotics, facilitating their excretion from the body. The UGT1 gene cluster contains a tandem array of variable first exons, each preceded by a specific promoter, and a common set of downstream constant exons, similar to the genomic organization of the protocadherin (Pcdh, immunoglobulin, and T-cell receptor gene clusters. To assist pharmacogenomics studies in Chinese, we sequenced nine first exons, promoter and intronic regions, and five common exons of the UGT1 gene cluster in a population sample of 253 unrelated Chinese individuals. We identified 101 polymorphisms and found 15 novel SNPs. We then computed allele frequencies for each polymorphism and reconstructed their linkage disequilibrium (LD map. The UGT1 cluster can be divided into five linkage blocks: Block 9 (UGT1A9, Block 9/7/6 (UGT1A9, UGT1A7, and UGT1A6, Block 5 (UGT1A5, Block 4/3 (UGT1A4 and UGT1A3, and Block 3' UTR. Furthermore, we inferred haplotypes and selected their tagSNPs. Finally, comparing our data with those of three other populations of the HapMap project revealed ethnic specificity of the UGT1 genetic diversity in Chinese. These findings have important implications for future molecular genetic studies of the UGT1 gene cluster as well as for personalized medical therapies in Chinese.

  17. Ancestral and derived attributes of the dlx gene repertoire, cluster structure and expression patterns in an African cichlid fish

    Directory of Open Access Journals (Sweden)

    Renz Adina J

    2011-01-01

    Full Text Available Abstract Background Cichlid fishes have undergone rapid, expansive evolutionary radiations that are manifested in the diversification of their trophic morphologies, tooth patterning and coloration. Understanding the molecular mechanisms that underlie the cichlids' unique patterns of evolution requires a thorough examination of genes that pattern the neural crest, from which these diverse phenotypes are derived. Among those genes, the homeobox-containing Dlx gene family is of particular interest since it is involved in the patterning of the brain, jaws and teeth. Results In this study, we characterized the dlx genes of an African cichlid fish, Astatotilapia burtoni, to provide a baseline to later allow cross-species comparison within Cichlidae. We identified seven dlx paralogs (dlx1a, -2a, -4a, -3b, -4b, -5a and -6a, whose orthologies were validated with molecular phylogenetic trees. The intergenic regions of three dlx gene clusters (dlx1a-2a, dlx3b-4b, and dlx5a-6a were amplified with long PCR. Intensive cross-species comparison revealed a number of conserved non-coding elements (CNEs that are shared with other percomorph fishes. This analysis highlighted additional lineage-specific gains/losses of CNEs in different teleost fish lineages and a novel CNE that had previously not been identified. Our gene expression analyses revealed overlapping but distinct expression of dlx orthologs in the developing brain and pharyngeal arches. Notably, four of the seven A. burtoni dlx genes, dlx2a, dlx3b, dlx4a and dlx5a, were expressed in the developing pharyngeal teeth. Conclusion This comparative study of the dlx genes of A. burtoni has deepened our knowledge of the diversity of the Dlx gene family, in terms of gene repertoire, expression patterns and non-coding elements. We have identified possible cichlid lineage-specific changes, including losses of a subset of dlx expression domains in the pharyngeal teeth, which will be the targets of future functional

  18. Hierarchical clustering of gene expression patterns in the Eomes + lineage of excitatory neurons during early neocortical development

    Directory of Open Access Journals (Sweden)

    Cameron David A

    2012-08-01

    Full Text Available Abstract Background Cortical neurons display dynamic patterns of gene expression during the coincident processes of differentiation and migration through the developing cerebrum. To identify genes selectively expressed by the Eomes + (Tbr2 lineage of excitatory cortical neurons, GFP-expressing cells from Tg(Eomes::eGFP Gsat embryos were isolated to > 99% purity and profiled. Results We report the identification, validation and spatial grouping of genes selectively expressed within the Eomes + cortical excitatory neuron lineage during early cortical development. In these neurons 475 genes were expressed ≥ 3-fold, and 534 genes ≤ 3-fold, compared to the reference population of neuronal precursors. Of the up-regulated genes, 328 were represented at the Genepaint in situ hybridization database and 317 (97% were validated as having spatial expression patterns consistent with the lineage of differentiating excitatory neurons. A novel approach for quantifying in situ hybridization patterns (QISP across the cerebral wall was developed that allowed the hierarchical clustering of genes into putative co-regulated groups. Forty four candidate genes were identified that show spatial expression with Intermediate Precursor Cells, 49 candidate genes show spatial expression with Multipolar Neurons, while the remaining 224 genes achieved peak expression in the developing cortical plate. Conclusions This analysis of differentiating excitatory neurons revealed the expression patterns of 37 transcription factors, many chemotropic signaling molecules (including the Semaphorin, Netrin and Slit signaling pathways, and unexpected evidence for non-canonical neurotransmitter signaling and changes in mechanisms of glucose metabolism. Over half of the 317 identified genes are associated with neuronal disease making these findings a valuable resource for studies of neurological development and disease.

  19. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  20. Cluster Analysis as an Analytical Tool of Population Policy

    Directory of Open Access Journals (Sweden)

    Oksana Mikhaylovna Shubat

    2017-12-01

    Full Text Available The predicted negative trends in Russian demography (falling birth rates, population decline actualize the need to strengthen measures of family and population policy. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on the Ward method and the Euclidean distance for segmentation of Russian regions. Clustering is based on four variables, which allowed assessing the family institution in the region. The authors used the data of Federal State Statistics Service from 2010 to 2015. Clustering and profiling of each segment has allowed forming a model of Russian regions depending on the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures in each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution in the regions. This suggests that a unified approach to population problems’ solving is far from being effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could develop the adaptation of multidimensional classification methods to the analysis of the population problems in Russian regions. In particular, the algorithms of nonparametric cluster analysis may be of relevance in future studies.

  1. Genetic homogeneity of Clostridium botulinum type A1 strains with unique toxin gene clusters.

    Science.gov (United States)

    Raphael, Brian H; Luquez, Carolina; McCroskey, Loretta M; Joseph, Lavin A; Jacobson, Mark J; Johnson, Eric A; Maslanka, Susan E; Andreadis, Joanne D

    2008-07-01

    A group of five clonally related Clostridium botulinum type A strains isolated from different sources over a period of nearly 40 years harbored several conserved genetic properties. These strains contained a variant bont/A1 with five nucleotide polymorphisms compared to the gene in C. botulinum strain ATCC 3502. The strains also had a common toxin gene cluster composition (ha-/orfX+) similar to that associated with bont/A in type A strains containing an unexpressed bont/B [termed A(B) strains]. However, bont/B was not identified in the strains examined. Comparative genomic hybridization demonstrated identical genomic content among the strains relative to C. botulinum strain ATCC 3502. In addition, microarray data demonstrated the absence of several genes flanking the toxin gene cluster among the ha-/orfX+ A1 strains, suggesting the presence of genomic rearrangements with respect to this region compared to the C. botulinum ATCC 3502 strain. All five strains were shown to have identical flaA variable region nucleotide sequences. The pulsed-field gel electrophoresis patterns of the strains were indistinguishable when digested with SmaI, and a shift in the size of at least one band was observed in a single strain when digested with XhoI. These results demonstrate surprising genomic homogeneity among a cluster of unique C. botulinum type A strains of diverse origin.

  2. Spatial expression of Hox cluster genes in the ontogeny of a sea urchin

    Science.gov (United States)

    Arenas-Mena, C.; Cameron, A. R.; Davidson, E. H.

    2000-01-01

    The Hox cluster of the sea urchin Strongylocentrous purpuratus contains ten genes in a 500 kb span of the genome. Only two of these genes are expressed during embryogenesis, while all of eight genes tested are expressed during development of the adult body plan in the larval stage. We report the spatial expression during larval development of the five 'posterior' genes of the cluster: SpHox7, SpHox8, SpHox9/10, SpHox11/13a and SpHox11/13b. The five genes exhibit a dynamic, largely mesodermal program of expression. Only SpHox7 displays extensive expression within the pentameral rudiment itself. A spatially sequential and colinear arrangement of expression domains is found in the somatocoels, the paired posterior mesodermal structures that will become the adult perivisceral coeloms. No such sequential expression pattern is observed in endodermal, epidermal or neural tissues of either the larva or the presumptive juvenile sea urchin. The spatial expression patterns of the Hox genes illuminate the evolutionary process by which the pentameral echinoderm body plan emerged from a bilateral ancestor.

  3. Genomic organization, tissue distribution and functional characterization of the rat Pate gene cluster.

    Directory of Open Access Journals (Sweden)

    Angireddy Rajesh

    Full Text Available The cysteine rich prostate and testis expressed (Pate proteins identified till date are thought to resemble the three fingered protein/urokinase-type plasminogen activator receptor proteins. In this study, for the first time, we report the identification, cloning and characterization of rat Pate gene cluster and also determine the expression pattern. The rat Pate genes are clustered on chromosome 8 and their predicted proteins retained the ten cysteine signature characteristic to TFP/Ly-6 protein family. PATE and PATE-F three dimensional protein structure was found to be similar to that of the toxin bucandin. Though Pate gene expression is thought to be prostate and testis specific, we observed that rat Pate genes are also expressed in seminal vesicle and epididymis and in tissues beyond the male reproductive tract. In the developing rats (20-60 day old, expression of Pate genes seem to be androgen dependent in the epididymis and testis. In the adult rat, androgen ablation resulted in down regulation of the majority of Pate genes in the epididymides. PATE and PATE-F proteins were found to be expressed abundantly in the male reproductive tract of rats and on the sperm. Recombinant PATE protein exhibited potent antibacterial activity, whereas PATE-F did not exhibit any antibacterial activity. Pate expression was induced in the epididymides when challenged with LPS. Based on our results, we conclude that rat PATE proteins may contribute to the reproductive and defense functions.

  4. Linkage of the Nit1C gene cluster to bacterial cyanide assimilation as a nitrogen source.

    Science.gov (United States)

    Jones, Lauren B; Ghosh, Pallab; Lee, Jung-Hyun; Chou, Chia-Ni; Kunz, Daniel A

    2018-05-21

    A genetic linkage between a conserved gene cluster (Nit1C) and the ability of bacteria to utilize cyanide as the sole nitrogen source was demonstrated for nine different bacterial species. These included three strains whose cyanide nutritional ability has formerly been documented (Pseudomonas fluorescens Pf11764, Pseudomonas putida BCN3 and Klebsiella pneumoniae BCN33), and six not previously known to have this ability [Burkholderia (Paraburkholderia) xenovorans LB400, Paraburkholderia phymatum STM815, Paraburkholderia phytofirmans PsJN, Cupriavidus (Ralstonia) eutropha H16, Gluconoacetobacter diazotrophicus PA1 5 and Methylobacterium extorquens AM1]. For all bacteria, growth on or exposure to cyanide led to the induction of the canonical nitrilase (NitC) linked to the gene cluster, and in the case of Pf11764 in particular, transcript levels of cluster genes (nitBCDEFGH) were raised, and a nitC knock-out mutant failed to grow. Further studies demonstrated that the highly conserved nitB gene product was also significantly elevated. Collectively, these findings provide strong evidence for a genetic linkage between Nit1C and bacterial growth on cyanide, supporting use of the term cyanotrophy in describing what may represent a new nutritional paradigm in microbiology. A broader search of Nit1C genes in presently available genomes revealed its presence in 270 different bacteria, all contained within the domain Bacteria, including Gram-positive Firmicutes and Actinobacteria, and Gram-negative Proteobacteria and Cyanobacteria. Absence of the cluster in the Archaea is congruent with events that may have led to the inception of Nit1C occurring coincidentally with the first appearance of cyanogenic species on Earth, dating back 400-500 million years.

  5. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    LENUS (Irish Health Repository)

    Fitzpatrick, David A

    2010-05-10

    Abstract Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging\\/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http:\\/\\/cgob.ucd.ie.

  6. Automated analysis of organic particles using cluster SIMS

    Energy Technology Data Exchange (ETDEWEB)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-15

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF{sub 5}{sup +} or C{sub 8}{sup -}) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  7. A highly divergent gene cluster in honey bees encodes a novel silk family.

    Science.gov (United States)

    Sutherland, Tara D; Campbell, Peter M; Weisman, Sarah; Trueman, Holly E; Sriskantha, Alagacone; Wanjura, Wolfgang J; Haritos, Victoria S

    2006-11-01

    The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1-4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-rich amid low GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures as well as the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins.

  8. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    Full Text Available This study was carried out to assess the physicochemical quality river Varuna inVaranasi,India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of relationship between physicochemical parameters. Hierarchical Cluster analysis was also performed to determine the sources of pollution in the river Varuna. The result showed quite high value of DO, Nitrate, BOD, COD and Total Alkalinity, above the BIS permissible limit. The results of correlation analysis identified key water parameters as pH, electrical conductivity, total alkalinity and nitrate, which influence the concentration of other water parameters. Cluster analysis identified three major clusters of sampling sites out of total 10 sites, according to the similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for getting better information about the river water quality.International Journal of Environment Vol. 5 (1 2016,  pp: 32-44

  9. application of single-linkage clustering method in the analysis of ...

    African Journals Online (AJOL)

    Admin

    ANALYSIS OF GROWTH RATE OF GROSS DOMESTIC PRODUCT. (GDP) AT ... The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are ..... Number of cluster sum from from observations of ...

  10. Up-regulation of HOXB cluster genes are epigenetically regulated in tamoxifen-resistant MCF7 breast cancer cells.

    Science.gov (United States)

    Yang, Seoyeon; Lee, Ji-Yeon; Hur, Ho; Oh, Ji Hoon; Kim, Myoung Hee

    2018-05-28

    Tamoxifen (TAM) is commonly used to treat estrogen receptor (ER)-positive breast cancer. Despite the remarkable benefits, resistance to TAM presents a serious therapeutic challenge. Since several HOX transcription factors have been proposed as strong candidates in the development of resistance to TAM therapy in breast cancer, we generated an in vitro model of acquired TAM resistance using ER-positive MCF7 breast cancer cells (MCF7-TAMR), and analyzed the expression pattern and epigenetic states of HOX genes. HOXB cluster genes were uniquely up-regulated in MCF7-TAMR cells. Survival analysis of in slico data showed the correlation of high expression of HOXB genes with poor response to TAM in ER-positive breast cancer patients treated with TAM. Gain- and loss-of-function experiments showed that the overexpression of multi HOXB genes in MCF7 renders cancer cells more resistant to TAM, whereas the knockdown restores TAM sensitivity. Furthermore, activation of HOXB genes in MCF7-TAMR was associated with histone modifications, particularly the gain of H3K9ac. These findings imply that the activation of HOXB genes mediate the development of TAM resistance, and represent a target for development of new strategies to prevent or reverse TAM resistance.

  11. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  12. Diverse and Abundant Secondary Metabolism Biosynthetic Gene Clusters in the Genomes of Marine Sponge Derived Streptomyces spp. Isolates

    Directory of Open Access Journals (Sweden)

    Stephen A. Jackson

    2018-02-01

    Full Text Available The genus Streptomyces produces secondary metabolic compounds that are rich in biological activity. Many of these compounds are genetically encoded by large secondary metabolism biosynthetic gene clusters (smBGCs such as polyketide synthases (PKS and non-ribosomal peptide synthetases (NRPS which are modular and can be highly repetitive. Due to the repeats, these gene clusters can be difficult to resolve using short read next generation datasets and are often quite poorly predicted using standard approaches. We have sequenced the genomes of 13 Streptomyces spp. strains isolated from shallow water and deep-sea sponges that display antimicrobial activities against a number of clinically relevant bacterial and yeast species. Draft genomes have been assembled and smBGCs have been identified using the antiSMASH (antibiotics and Secondary Metabolite Analysis Shell web platform. We have compared the smBGCs amongst strains in the search for novel sequences conferring the potential to produce novel bioactive secondary metabolites. The strains in this study recruit to four distinct clades within the genus Streptomyces. The marine strains host abundant smBGCs which encode polyketides, NRPS, siderophores, bacteriocins and lantipeptides. The deep-sea strains appear to be enriched with gene clusters encoding NRPS. Marine adaptations are evident in the sponge-derived strains which are enriched for genes involved in the biosynthesis and transport of compatible solutes and for heat-shock proteins. Streptomyces spp. from marine environments are a promising source of novel bioactive secondary metabolites as the abundance and diversity of smBGCs show high degrees of novelty. Sponge derived Streptomyces spp. isolates appear to display genomic adaptations to marine living when compared to terrestrial strains.

  13. Gene Cluster Responsible for Secretion of and Immunity to Multiple Bacteriocins, the NKR-5-3 Enterocins

    Science.gov (United States)

    Ishibashi, Naoki; Himeno, Kohei; Masuda, Yoshimitsu; Perez, Rodney Honrada; Iwatani, Shun; Wilaipun, Pongtep; Leelawatcharamas, Vichien; Nakayama, Jiro; Sonomoto, Kenji

    2014-01-01

    Enterococcus faecium NKR-5-3, isolated from Thai fermented fish, is characterized by the unique ability to produce five bacteriocins, namely, enterocins NKR-5-3A, -B, -C, -D, and -Z (Ent53A, Ent53B, Ent53C, Ent53D, and Ent53Z). Genetic analysis with a genome library revealed that the bacteriocin structural genes (enkA [ent53A], enkC [ent53C], enkD [ent53D], and enkZ [ent53Z]) that encode these peptides (except for Ent53B) are located in close proximity to each other. This NKR-5-3ACDZ (Ent53ACDZ) enterocin gene cluster (approximately 13 kb long) includes certain bacteriocin biosynthetic genes such as an ABC transporter gene (enkT), two immunity genes (enkIaz and enkIc), a response regulator (enkR), and a histidine protein kinase (enkK). Heterologous-expression studies of enkT and ΔenkT mutant strains showed that enkT is responsible for the secretion of Ent53A, Ent53C, Ent53D, and Ent53Z, suggesting that EnkT is a wide-range ABC transporter that contributes to the effective production of these bacteriocins. In addition, EnkIaz and EnkIc were found to confer self-immunity to the respective bacteriocins. Furthermore, bacteriocin induction assays performed with the ΔenkRK mutant strain showed that EnkR and EnkK are regulatory proteins responsible for bacteriocin production and that, together with Ent53D, they constitute a three-component regulatory system. Thus, the Ent53ACDZ gene cluster is essential for the biosynthesis and regulation of NKR-5-3 enterocins, and this is, to our knowledge, the first report that demonstrates the secretion of multiple bacteriocins by an ABC transporter. PMID:25149515

  14. Graph analysis of cell clusters forming vascular networks

    Science.gov (United States)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.

  15. In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters

    KAUST Repository

    Othoum, Ghofran K

    2018-05-22

    BackgroundThe increasing spectrum of multidrug-resistant bacteria is a major global public health concern, necessitating discovery of novel antimicrobial agents. Here, members of the genus Bacillus are investigated as a potentially attractive source of novel antibiotics due to their broad spectrum of antimicrobial activities. We specifically focus on a computational analysis of the distinctive biosynthetic potential of Bacillus paralicheniformis strains isolated from the Red Sea, an ecosystem exposed to adverse, highly saline and hot conditions.ResultsWe report the complete circular and annotated genomes of two Red Sea strains, B. paralicheniformis Bac48 isolated from mangrove mud and B. paralicheniformis Bac84 isolated from microbial mat collected from Rabigh Harbor Lagoon in Saudi Arabia. Comparing the genomes of B. paralicheniformis Bac48 and B. paralicheniformis Bac84 with nine publicly available complete genomes of B. licheniformis and three genomes of B. paralicheniformis, revealed that all of the B. paralicheniformis strains in this study are more enriched in nonribosomal peptides (NRPs). We further report the first computationally identified trans-acyltransferase (trans-AT) nonribosomal peptide synthetase/polyketide synthase (PKS/ NRPS) cluster in strains of this species.ConclusionsB. paralicheniformis species have more genes associated with biosynthesis of antimicrobial bioactive compounds than other previously characterized species of B. licheniformis, which suggests that these species are better potential sources for novel antibiotics. Moreover, the genome of the Red Sea strain B. paralicheniformis Bac48 is more enriched in modular PKS genes compared to B. licheniformis strains and other B. paralicheniformis strains. This may be linked to adaptations that strains surviving in the Red Sea underwent to survive in the relatively hot and saline ecosystems.

  16. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... Author for correspondence (zh4403701@126.com). MS received 15 ... lic clusters using density functional theory (DFT)-GGA of the DMOL3 package. ... In the process of geometric optimization, con- vergence thresholds ..... and Postgraduate Research & Practice Innovation Program of. Jiangsu Province ...

  17. clusters

    Indian Academy of Sciences (India)

    environmental as well as technical problems during fuel gas utilization. ... adsorption on some alloys of Pd, namely PdAu, PdAg ... ried out on small neutral and charged Au24,26,27, Cu,28 ... study of Zanti et al.29 on Pdn (n = 1–9) clusters.

  18. Output ordering and prioritisation system (OOPS): ranking biosynthetic gene clusters to enhance bioactive metabolite discovery.

    Science.gov (United States)

    Peña, Alejandro; Del Carratore, Francesco; Cummings, Matthew; Takano, Eriko; Breitling, Rainer

    2017-12-18

    The rapid increase of publicly available microbial genome sequences has highlighted the presence of hundreds of thousands of biosynthetic gene clusters (BGCs) encoding valuable secondary metabolites. The experimental characterization of new BGCs is extremely laborious and struggles to keep pace with the in silico identification of potential BGCs. Therefore, the prioritisation of promising candidates among computationally predicted BGCs represents a pressing need. Here, we propose an output ordering and prioritisation system (OOPS) which helps sorting identified BGCs by a wide variety of custom-weighted biological and biochemical criteria in a flexible and user-friendly interface. OOPS facilitates a judicious prioritisation of BGCs using G+C content, coding sequence length, gene number, cluster self-similarity and codon bias parameters, as well as enabling the user to rank BGCs based upon BGC type, novelty, and taxonomic distribution. Effective prioritisation of BGCs will help to reduce experimental attrition rates and improve the breadth of bioactive metabolites characterized.

  19. Cluster Analysis of International Information and Social Development.

    Science.gov (United States)

    Lau, Jesus

    1990-01-01

    Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…

  20. Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes

    Science.gov (United States)

    Pell, Tony; Hargreaves, Linda

    2011-01-01

    Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of the Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching yet its implications for practice are rarely implemented. It has been subject also to negative…

  1. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.

    2012-01-01

    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  2. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  3. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    textabstractThis study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  4. Robustness in cluster analysis in the presence of anomalous observations

    NARCIS (Netherlands)

    Zhuk, EE

    Cluster analysis of multivariate observations in the presence of "outliers" (anomalous observations) in a sample is studied. The expected (mean) fraction of erroneous decisions for the decision rule is computed analytically by minimizing the intraclass scatter. A robust decision rule (stable to

  5. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  6. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering specific physiology changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women at early pregnancy can be considered as crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis method is a statistical method which makes possible to group individuals based on sets of identifying variables. The method was chosen in order to determine possibility for classification of pregnant women at early pregnancy to analyze unknown correlations between different variables so that the certain outcomes could be predicted. 222 pregnant women from two general obstetric offices' were recruited. The main orient was set on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branch- es or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results are showing that pregnant women both of older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering baby of higher birth weight but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have significantly higher probability for complications during pregnancy (gestosis) and higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women at early pregnancy to predict certain outcomes.

  7. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    Science.gov (United States)

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering.The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

  8. Using SNP genetic markers to elucidate the linkage of the Co-34/Phg-3 anthracnose and angular leaf spot resistance gene cluster with the Ur-14 resistance gene

    Science.gov (United States)

    The Ouro Negro common bean cultivar contains the Co-34/Phg-3 gene cluster that confers resistance to the anthracnose (ANT) and angular leaf spot (ALS) pathogens. These genes are tightly linked on chromosome 4. Ouro Negro also has the Ur-14 rust resistance gene, reportedly in the vicinity of Co- 34; ...

  9. Comparison of expression of secondary metabolite biosynthesis cluster genes in Aspergillus flavus, A. parasiticus, and A. oryzae.

    Science.gov (United States)

    Ehrlich, Kenneth C; Mack, Brian M

    2014-06-23

    Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites which, besides aflatoxin, could be an underappreciated contributor to its toxicity.

  10. Gene Clusters for Insecticidal Loline Alkaloids in the Grass-Endophytic Fungus Neotyphodium uncinatum

    OpenAIRE

    Spiering, Martin J.; Moon, Christina D.; Wilkinson, Heather H.; Schardl, Christopher L.

    2005-01-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same ...

  11. Functional dissection of HOXD cluster genes in regulation of neuroblastoma cell proliferation and differentiation.

    Directory of Open Access Journals (Sweden)

    Yunhong Zha

    Full Text Available Retinoic acid (RA can induce growth arrest and neuronal differentiation of neuroblastoma cells and has been used in clinic for treatment of neuroblastoma. It has been reported that RA induces the expression of several HOXD genes in human neuroblastoma cell lines, but their roles in RA action are largely unknown. The HOXD cluster contains nine genes (HOXD1, HOXD3, HOXD4, and HOXD8-13 that are positioned sequentially from 3' to 5', with HOXD1 at the 3' end and HOXD13 the 5' end. Here we show that all HOXD genes are induced by RA in the human neuroblastoma BE(2-C cells, with the genes located at the 3' end being activated generally earlier than those positioned more 5' within the cluster. Individual induction of HOXD8, HOXD9, HOXD10 or HOXD12 is sufficient to induce both growth arrest and neuronal differentiation, which is associated with downregulation of cell cycle-promoting genes and upregulation of neuronal differentiation genes. However, induction of other HOXD genes either has no effect (HOXD1 or has partial effects (HOXD3, HOXD4, HOXD11 and HOXD13 on BE(2-C cell proliferation or differentiation. We further show that knockdown of HOXD8 expression, but not that of HOXD9 expression, significantly inhibits the differentiation-inducing activity of RA. HOXD8 directly activates the transcription of HOXC9, a key effector of RA action in neuroblastoma cells. These findings highlight the distinct functions of HOXD genes in RA induction of neuroblastoma cell differentiation.

  12. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is the most challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis & treatment of disease. This paper focuses on methods to detect & extract brain tumour from brain MR images. MATLAB is used to design, software tool for locating brain tumor, based on unsupervised clustering methods. K-Means clustering algorithm is implemented & tested on data base of 30 images. Performance evolution of unsupervised clusteringmethods is presented.

  13. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important...... showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there are alternative ways of managing SMS data and many different methods...

  14. Burkholderia thailandensis harbors two identical rhl gene clusters responsible for the biosynthesis of rhamnolipids

    Directory of Open Access Journals (Sweden)

    Woods Donald E

    2009-12-01

    Full Text Available Abstract Background Rhamnolipids are surface active molecules composed of rhamnose and β-hydroxydecanoic acid. These biosurfactants are produced mainly by Pseudomonas aeruginosa and have been thoroughly investigated since their early discovery. Recently, they have attracted renewed attention because of their involvement in various multicellular behaviors. Despite this high interest, only very few studies have focused on the production of rhamnolipids by Burkholderia species. Results Orthologs of rhlA, rhlB and rhlC, which are responsible for the biosynthesis of rhamnolipids in P. aeruginosa, have been found in the non-infectious Burkholderia thailandensis, as well as in the genetically similar important pathogen B. pseudomallei. In contrast to P. aeruginosa, both Burkholderia species contain these three genes necessary for rhamnolipid production within a single gene cluster. Furthermore, two identical, paralogous copies of this gene cluster are found on the second chromosome of these bacteria. Both Burkholderia spp. produce rhamnolipids containing 3-hydroxy fatty acid moieties with longer side chains than those described for P. aeruginosa. Additionally, the rhamnolipids produced by B. thailandensis contain a much larger proportion of dirhamnolipids versus monorhamnolipids when compared to P. aeruginosa. The rhamnolipids produced by B. thailandensis reduce the surface tension of water to 42 mN/m while displaying a critical micelle concentration value of 225 mg/L. Separate mutations in both rhlA alleles, which are responsible for the synthesis of the rhamnolipid precursor 3-(3-hydroxyalkanoyloxyalkanoic acid, prove that both copies of the rhl gene cluster are functional, but one contributes more to the total production than the other. Finally, a double ΔrhlA mutant that is completely devoid of rhamnolipid production is incapable of swarming motility, showing that both gene clusters contribute to this phenotype. Conclusions Collectively, these

  15. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

    Directory of Open Access Journals (Sweden)

    Wolf Yuri I

    2007-11-01

    Full Text Available Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs. Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results New Archaeal Clusters of Orthologous Genes (arCOGs were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genome consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile

  16. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  17. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Science.gov (United States)

    McDowell, Ian C; Manandhar, Dinesh; Vockley, Christopher M; Schmid, Amy K; Reddy, Timothy E; Engelhardt, Barbara E

    2018-01-01

    Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  18. Clustering gene expression time series data using an infinite Gaussian process mixture model.

    Directory of Open Access Journals (Sweden)

    Ian C McDowell

    2018-01-01

    Full Text Available Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP, which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

  19. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  20. A cluster analysis investigation of workaholism as a syndrome.

    Science.gov (United States)

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  1. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows to test cosmological scenarios of structure formation with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys as XMM-XXL and eRosita. (author) [fr

  2. Cluster analysis by optimal decomposition of induced fuzzy sets

    Energy Technology Data Exchange (ETDEWEB)

    Backer, E

    1978-01-01

    Nonsupervised pattern recognition is addressed and the concept of fuzzy sets is explored in order to provide the investigator (data analyst) additional information supplied by the pattern class membership values apart from the classical pattern class assignments. The basic ideas behind the pattern recognition problem, the clustering problem, and the concept of fuzzy sets in cluster analysis are discussed, and a brief review of the literature of the fuzzy cluster analysis is given. Some mathematical aspects of fuzzy set theory are briefly discussed; in particular, a measure of fuzziness is suggested. The optimization-clustering problem is characterized. Then the fundamental idea behind affinity decomposition is considered. Next, further analysis takes place with respect to the partitioning-characterization functions. The iterative optimization procedure is then addressed. The reclassification function is investigated and convergence properties are examined. Finally, several experiments in support of the method suggested are described. Four object data sets serve as appropriate test cases. 120 references, 70 figures, 11 tables. (RWR)

  3. In planta functions of cytochrome P450 monooxygenase genes in the phytocassane biosynthetic gene cluster on rice chromosome 2.

    Science.gov (United States)

    Ye, Zhongfeng; Yamazaki, Kohei; Minoda, Hiromi; Miyamoto, Koji; Miyazaki, Sho; Kawaide, Hiroshi; Yajima, Arata; Nojiri, Hideaki; Yamane, Hisakazu; Okada, Kazunori

    2018-06-01

    In response to environmental stressors such as blast fungal infections, rice produces phytoalexins, an antimicrobial diterpenoid compound. Together with momilactones, phytocassanes are among the major diterpenoid phytoalexins. The biosynthetic genes of diterpenoid phytoalexin are organized on the chromosome in functional gene clusters, comprising diterpene cyclase, dehydrogenase, and cytochrome P450 monooxygenase genes. Their functions have been studied extensively using in vitro enzyme assay systems. Specifically, P450 genes (CYP71Z6, Z7; CYP76M5, M6, M7, M8) on rice chromosome 2 have multifunctional activities associated with ent-copalyl diphosphate-related diterpene hydrocarbons, but the in planta contribution of these genes to diterpenoid phytoalexin production remains unknown. Here, we characterized cyp71z7 T-DNA mutant and CYP76M7/M8 RNAi lines to find that potential phytoalexin intermediates accumulated in these P450-suppressed rice plants. The results suggested that in planta, CYP71Z7 is responsible for C2-hydroxylation of phytocassanes and that CYP76M7/M8 is involved in C11α-hydroxylation of 3-hydroxy-cassadiene. Based on these results, we proposed potential routes of phytocassane biosynthesis in planta.

  4. Gene Circuit Analysis of the Terminal Gap Gene huckebein

    Science.gov (United States)

    Ashyraliyev, Maksat; Siggens, Ken; Janssens, Hilde; Blom, Joke; Akam, Michael; Jaeger, Johannes

    2009-01-01

    The early embryo of Drosophila melanogaster provides a powerful model system to study the role of genes in pattern formation. The gap gene network constitutes the first zygotic regulatory tier in the hierarchy of the segmentation genes involved in specifying the position of body segments. Here, we use an integrative, systems-level approach to investigate the regulatory effect of the terminal gap gene huckebein (hkb) on gap gene expression. We present quantitative expression data for the Hkb protein, which enable us to include hkb in gap gene circuit models. Gap gene circuits are mathematical models of gene networks used as computational tools to extract regulatory information from spatial expression data. This is achieved by fitting the model to gap gene expression patterns, in order to obtain estimates for regulatory parameters which predict a specific network topology. We show how considering variability in the data combined with analysis of parameter determinability significantly improves the biological relevance and consistency of the approach. Our models are in agreement with earlier results, which they extend in two important respects: First, we show that Hkb is involved in the regulation of the posterior hunchback (hb) domain, but does not have any other essential function. Specifically, Hkb is required for the anterior shift in the posterior border of this domain, which is now reproduced correctly in our models. Second, gap gene circuits presented here are able to reproduce mutants of terminal gap genes, while previously published models were unable to reproduce any null mutants correctly. As a consequence, our models now capture the expression dynamics of all posterior gap genes and some variational properties of the system correctly. This is an important step towards a better, quantitative understanding of the developmental and evolutionary dynamics of the gap gene network. PMID:19876378

  5. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria.

    Science.gov (United States)

    Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu

    2017-01-10

    VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as extends these trait transfer-related genetic contexts, in newly sequenced pathogenic bacterial genomes. The used backend database MobilomeDB was firstly built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or a variety set of bacterial genomes. VRprofile might contribute to meet the increasing demands of re-annotations of bacterial variable regions, and aid in the real-time definitions of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  6. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGA is a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  7. Analysis of RXTE data on Clusters of Galaxies

    Science.gov (United States)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJO658--5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the imiverse and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons which are known to be present in the clusters and be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, will have HXR signal comparable to the other nearby ones. The planned observation of a relatively The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthen the case for the Compton scattering model. We intend the data obtained via this observation to be a part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits) on the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

  8. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    Full Text Available During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  9. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  10. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  11. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    Science.gov (United States)

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that Sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than the observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  12. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  13. Gene clusters for insecticidal loline alkaloids in the grass-endophytic fungus Neotyphodium uncinatum.

    Science.gov (United States)

    Spiering, Martin J; Moon, Christina D; Wilkinson, Heather H; Schardl, Christopher L

    2005-03-01

    Loline alkaloids are produced by mutualistic fungi symbiotic with grasses, and they protect the host plants from insects. Here we identify in the fungal symbiont, Neotyphodium uncinatum, two homologous gene clusters (LOL-1 and LOL-2) associated with loline-alkaloid production. Nine genes were identified in a 25-kb region of LOL-1 and designated (in order) lolF-1, lolC-1, lolD-1, lolO-1, lolA-1, lolU-1, lolP-1, lolT-1, and lolE-1. LOL-2 contained the homologs lolC-2 through lolE-2 in the same order and orientation. Also identified was lolF-2, but its possible linkage with either cluster was undetermined. Most lol genes were regulated in N. uncinatum and N. coenophialum, and all were expressed concomitantly with loline-alkaloid biosynthesis. A lolC-2 RNA-interference (RNAi) construct was introduced into N. uncinatum, and in two independent transformants, RNAi significantly decreased lolC expression (P lol-gene products indicate that the pathway has evolved from various different primary and secondary biosynthesis pathways.

  14. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Full Text Available Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes.Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method.The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001.The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and generating more homogeneous disease groups for genetic, proteomic and risk factor research.

  15. Coxiella burnetii transcriptional analysis reveals serendipity clusters of regulation in intracellular bacteria.

    Directory of Open Access Journals (Sweden)

    Quentin Leroy

    Full Text Available Coxiella burnetii, the causative agent of the zoonotic disease Q fever, is mainly transmitted to humans through an aerosol route. A spore-like form allows C. burnetii to resist different environmental conditions. Because of this, analysis of the survival strategies used by this bacterium to adapt to new environmental conditions is critical for our understanding of C. burnetii pathogenicity. Here, we report the early transcriptional response of C. burnetii under temperature stresses. Our data show that C. burnetii exhibited minor changes in gene regulation under short exposure to heat or cold shock. While small differences were observed, C. burnetii seemed to respond similarly to cold and heat shock. The expression profiles obtained using microarrays produced in-house were confirmed by quantitative RT-PCR. Under temperature stresses, 190 genes were differentially expressed in at least one condition, with a fold change of up to 4. Globally, the differentially expressed genes in C. burnetii were associated with bacterial division, (pppGpp synthesis, wall and membrane biogenesis and, especially, lipopolysaccharide and peptidoglycan synthesis. These findings could be associated with growth arrest and witnessed transformation of the bacteria to a spore-like form. Unexpectedly, clusters of neighboring genes were differentially expressed. These clusters do not belong to operons or genetic networks; they have no evident associated functions and are not under the control of the same promoters. We also found undescribed but comparable clusters of regulation in previously reported transcriptomic analyses of intracellular bacteria, including Rickettsia sp. and Listeria monocytogenes. The transcriptomic patterns of C. burnetii observed under temperature stresses permits the recognition of unpredicted clusters of regulation for which the trigger mechanism remains unidentified but which may be the result of a new mechanism of epigenetic regulation.

  16. ATNT: an enhanced system for expression of polycistronic secondary metabolite gene clusters in Aspergillus niger.

    Science.gov (United States)

    Geib, Elena; Brock, Matthias

    2017-01-01

    Fungi are treasure chests for yet unexplored natural products. However, exploitation of their real potential remains difficult as a significant proportion of biosynthetic gene clusters appears silent under standard laboratory conditions. Therefore, elucidation of novel products requires gene activation or heterologous expression. For heterologous gene expression, we previously developed an expression platform in Aspergillus niger that is based on the transcriptional regulator TerR and its target promoter P terA . In this study, we extended this system by regulating expression of terR  by the doxycycline inducible Tet-on system. Reporter genes cloned under the control of the target promoter P terA remained silent in the absence of doxycycline, but were strongly expressed when doxycycline was added. Reporter quantification revealed that the coupled system results in about five times higher expression rates compared to gene expression under direct control of the Tet-on system. As production of secondary metabolites generally requires the expression of several biosynthetic genes, the suitability of the self-cleaving viral peptide sequence P2A was tested in this optimised expression system. P2A allowed polycistronic expression of genes required for Asp-melanin formation in combination with the gene coding for the red fluorescent protein tdTomato. Gene expression and Asp-melanin formation was prevented in the absence of doxycycline and strongly induced by addition of doxycycline. Fluorescence studies confirmed the correct subcellular localisation of the respective enzymes. This tightly regulated but strongly inducible expression system enables high level production of secondary metabolites most likely even those with toxic potential. Furthermore, this system is compatible with polycistronic gene expression and, thus, suitable for the discovery of novel natural products.

  17. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  18. Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in xanthomonas.

    Directory of Open Access Journals (Sweden)

    Hong Lu

    Full Text Available Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown.To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage.Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale exchange of clusters, but subtle changes in a small

  19. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  20. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  1. Inactivation of human α-globin gene expression by a de novo deletion located upstream of the α-globin gene cluster

    International Nuclear Information System (INIS)

    Liebhaber, S.A.; Weiss, I.; Cash, F.E.; Griese, E.U.; Horst, J.; Ayyub, H.; Higgs, D.R.

    1990-01-01

    Synthesis of normal human hemoglobin A, α 2 β 2 , is based upon balanced expression of genes in the α-globin gene cluster on chromosome 15 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5' to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located ∼30 kilobases 5' from the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes

  2. Heterologous Reconstitution of the Intact Geodin Gene Cluster in Aspergillus nidulans through a Simple and Versatile PCR Based Approach

    DEFF Research Database (Denmark)

    Nielsen, Morten Thrane; Nielsen, Jakob Blæsbjerg; Anyaogu, Dianna Chinyere

    2013-01-01

    was transferred in a two step procedure to an expression platform in A. nidulans. The individual cluster fragments were generated by PCR and assembled via efficient USER fusion prior to ransformation and integration via re-iterative gene targeting. A total of 13 open reading frames contained in 25 kb of DNA were...... of solid methodology for genetic manipulation of most species severely hampers pathway haracterization. Here we present a simple PCR based approach for heterologous reconstitution of intact gene clusters. Specifically, the putative gene cluster responsible for geodin production from Aspergillus terreus...... successfully transferred between the two species enabling geodin synthesis in A. nidulans. Subsequently, functions of three genes in the cluster were validated by genetic and chemical analyses. Specifically, ATEG_08451 (gedC) encodes a polyketide synthase, ATEG_08453 (gedR) encodes a transcription factor...

  3. Function analysis of 5'-UTR of the cellulosomal xyl-doc cluster in Clostridium papyrosolvens.

    Science.gov (United States)

    Zou, Xia; Ren, Zhenxing; Wang, Na; Cheng, Yin; Jiang, Yuanyuan; Wang, Yan; Xu, Chenggang

    2018-01-01

    Anaerobic, mesophilic, and cellulolytic Clostridium papyrosolvens produces an efficient cellulolytic extracellular complex named cellulosome that hydrolyzes plant cell wall polysaccharides into simple sugars. Its genome harbors two long cellulosomal clusters: cip - cel operon encoding major cellulosome components (including scaffolding) and xyl - doc gene cluster encoding hemicellulases. Compared with works on cip - cel operon, there are much fewer studies on xyl - doc mainly due to its rare location in cellulolytic clostridia. Sequence analysis of xyl - doc revealed that it harbors a 5' untranslated region (5'-UTR) which potentially plays a role in the regulation of downstream gene expression. Here, we analyzed the function of 5'-UTR of xyl - doc cluster in C. papyrosolvens in vivo via transformation technology developed in this study. In this study, we firstly developed an electrotransformation method for C. papyrosolvens DSM 2782 before the analysis of 5'-UTR of xyl - doc cluster. In the optimized condition, a field with an intensity of 7.5-9.0 kV/cm was applied to a cuvette (0.2 cm gap) containing a mixture of plasmid and late cell suspended in exponential phase to form a 5 ms pulse in a sucrose-containing buffer. Afterwards, the putative promoter and the 5'-UTR of xyl - doc cluster were determined by sequence alignment. It is indicated that xyl - doc possesses a long conservative 5'-UTR with a complex secondary structure encompassing at least two perfect stem-loops which are potential candidates for controlling the transcriptional termination. In the last step, we employed an oxygen-independent flavin-based fluorescent protein (FbFP) as a quantitative reporter to analyze promoter activity and 5'-UTR function in vivo. It revealed that 5'-UTR significantly blocked transcription of downstream genes, but corn stover can relieve its suppression. In the present study, our results demonstrated that 5'-UTR of the cellulosomal xyl - doc cluster blocks the

  4. The Cremeomycin Biosynthetic Gene Cluster Encodes a Pathway for Diazo Formation.

    Science.gov (United States)

    Waldman, Abraham J; Pechersky, Yakov; Wang, Peng; Wang, Jennifer X; Balskus, Emily P

    2015-10-12

    Diazo groups are found in a range of natural products that possess potent biological activities. Despite longstanding interest in these metabolites, diazo group biosynthesis is not well understood, in part because of difficulties in identifying specific genes linked to diazo formation. Here we describe the discovery of the gene cluster that produces the o-diazoquinone natural product cremeomycin and its heterologous expression in Streptomyces lividans. We used stable isotope feeding experiments and in vitro characterization of biosynthetic enzymes to decipher the order of events in this pathway and establish that diazo construction involves late-stage N-N bond formation. This work represents the first successful production of a diazo-containing metabolite in a heterologous host, experimentally linking a set of genes with diazo formation. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  6. Targeted insertion of the neomycin phosphotransferase gene into the tubulin gene cluster of Trypanosoma brucei

    NARCIS (Netherlands)

    ten Asbroek, A. L.; Ouellette, M.; Borst, P.

    1990-01-01

    Kinetoplastids are unicellular eukaryotes that include important parasites of man, such as trypanosomes and leishmanias. The study of these organisms received a recent boost from the development of transient transformation allowing the short-term expression of genes reintroduced into parasites like

  7. Loss of Major DNase I Hypersensitive Sites in Duplicatedglobin Gene Cluster Incompletely Silences HBB Gene Expression

    Czech Academy of Sciences Publication Activity Database

    Reading, N. S.; Shooter, C.; Song, J.; Miller, R.; Agarwal, A.; Láníková, Lucie; Clark, B.; Thein, S.L.; Divoký, V.; Prchal, J.T.

    2016-01-01

    Roč. 37, č. 11 (2016), s. 1153-1156 ISSN 1059-7794 R&D Projects: GA MŠk(CZ) LH15223 Institutional support: RVO:68378050 Keywords : globin genes * regulation * sickle cell disease * HBB duplication Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.601, year: 2016

  8. Statistical analysis of the spatial distribution of galaxies and clusters

    International Nuclear Information System (INIS)

    Cappi, Alberto

    1993-01-01

    This thesis deals with the analysis of the distribution of galaxies and clusters, describing some observational problems and statistical results. First chapter gives a theoretical introduction, aiming to describe the framework of the formation of structures, tracing the history of the Universe from the Planck time, t_p = 10"-"4"3 sec and temperature corresponding to 10"1"9 GeV, to the present epoch. The most usual statistical tools and models of the galaxy distribution, with their advantages and limitations, are described in chapter two. A study of the main observed properties of galaxy clustering, together with a detailed statistical analysis of the effects of selecting galaxies according to apparent magnitude or diameter, is reported in chapter three. Chapter four delineates some properties of groups of galaxies, explaining the reasons of discrepant results on group distributions. Chapter five is a study of the distribution of galaxy clusters, with different statistical tools, like correlations, percolation, void probability function and counts in cells; it is found the same scaling-invariant behaviour of galaxies. Chapter six describes our finding that rich galaxy clusters too belong to the fundamental plane of elliptical galaxies, and gives a discussion of its possible implications. Finally chapter seven reviews the possibilities offered by multi-slit and multi-fibre spectrographs, and I present some observational work on nearby and distant galaxy clusters. In particular, I show the opportunities offered by ongoing surveys of galaxies coupled with multi-object fibre spectrographs, focusing on the ESO Key Programme A galaxy redshift survey in the south galactic pole region to which I collaborate and on MEFOS, a multi-fibre instrument with automatic positioning. Published papers related to the work described in this thesis are reported in the last appendix. (author) [fr

  9. Isoeugenol monooxygenase and its putative regulatory gene are located in the eugenol metabolic gene cluster in Pseudomonas nitroreducens Jin1.

    Science.gov (United States)

    Ryu, Ji-Young; Seo, Jiyoung; Unno, Tatsuya; Ahn, Joong-Hoon; Yan, Tao; Sadowsky, Michael J; Hur, Hor-Gil

    2010-03-01

    The plant-derived phenylpropanoids eugenol and isoeugenol have been proposed as useful precursors for the production of natural vanillin. Genes involved in the metabolism of eugenol and isoeugenol were clustered in region of about a 30 kb of Pseudomonas nitroreducens Jin1. Two of the 23 ORFs in this region, ORFs 26 (iemR) and 27 (iem), were predicted to be involved in the conversion of isoeugenol to vanillin. The deduced amino acid sequence of isoeugenol monooxygenase (Iem) of strain Jin1 had 81.4% identity to isoeugenol monooxygenase from Pseudomonas putida IE27, which also transforms isoeugenol to vanillin. Iem was expressed in E. coli BL21(DE3) and was found to lead to isoeugenol to vanillin transformation. Deletion and cloning analyses indicated that the gene iemR, located upstream of iem, is required for expression of iem in the presence of isoeugenol, suggesting it to be the iem regulatory gene. Reverse transcription, real-time PCR analyses indicated that the genes involved in the metabolism of eugenol and isoeugenol were differently induced by isoeugenol, eugenol, and vanillin.

  10. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    Science.gov (United States)

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore molecular mechanisms of bladder cancer (BC), network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    Science.gov (United States)

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsesseive Compulsive Disorder (OCD) and described an OCD subtype which has a unique profile, and that Sensory Phenomena (SP) is a significant component of this subtype. SP has some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although there are some studies that have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), literature lacks sufficient data on this interplay. First to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). Second, to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately significantly correlated (n=274), significant correlations between all SOR modalities and OCS subscales were found with no specific higher correlation between one modality to one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscales scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR, the sample SPQ scores was normally distributed (n=350). SOR is a

  12. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations

    Directory of Open Access Journals (Sweden)

    F. Darrouzet

    2006-07-01

    Full Text Available Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles have been derived from the electron plasma frequency identified by the WHISPER sounder supplemented, in-between soundings, by relative variations of the spacecraft potential measured by the electric field instrument EFW; ion velocity is also measured onboard these satellites. The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 min; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations of three plume events and compare CLUSTER in-situ data with global images of the plasmasphere obtained by IMAGE. In particular, we study the geometry and the orientation of plasmaspheric plumes by using four-point analysis methods. We compare several aspects of plume motion as determined by different methods: (i inner and outer plume boundary velocity calculated from time delays of this boundary as observed by the wave experiment WHISPER on the four spacecraft, (ii drift velocity measured by the electron drift instrument EDI onboard CLUSTER and (iii global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  13. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  14. Gene set analysis using variance component tests.

    Science.gov (United States)

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.

  15. HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS

    International Nuclear Information System (INIS)

    Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi

    2009-01-01

    The proper interpretation of horizontal branch (HB) morphology is crucial to the understanding of the formation history of stellar populations. In the present study a multivariate analysis is used (principal component analysis) for the selection of appropriate HB morphology parameter, which, in our case, is the logarithm of effective temperature extent of the HB (log T effHB ). Then this parameter is expressed in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs) separately for coherent groups, obtained in a previous work, through a stepwise multiple regression technique. It is found that, metallicity ([Fe/H]), central surface brightness (μ v ), and core radius (r c ) are the significant parameters to explain most of the variations in HB morphology (multiple R 2 ∼ 0.86) for GGC elonging to the bulge/disk while metallicity ([Fe/H]) and absolute magnitude (M v ) are responsible for GGC belonging to the inner halo (multiple R 2 ∼ 0.52). The robustness is tested by taking 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGC belonging to Galactic inner halo (Cluster 2). A multi-episodic star formation is preferred for RGB stars of GGC belonging to this group. It supports the asymptotic giant branch (AGB) model in three episodes instead of two as suggested by Carretta et al. for halo GGC while AGB model is suggested to be revisited for bulge/disk GGC.

  16. Comprehensive identification and clustering of CLV3/ESR-related (CLE) genes in plants finds groups with potentially shared function.

    Science.gov (United States)

    Goad, David M; Zhu, Chuanmei; Kellogg, Elizabeth A

    2017-10-01

    CLV3/ESR (CLE) proteins are important signaling peptides in plants. The short CLE peptide (12-13 amino acids) is cleaved from a larger pre-propeptide and functions as an extracellular ligand. The CLE family is large and has resisted attempts at classification because the CLE domain is too short for reliable phylogenetic analysis and the pre-propeptide is too variable. We used a model-based search for CLE domains from 57 plant genomes and used the entire pre-propeptide for comprehensive clustering analysis. In total, 1628 CLE genes were identified in land plants, with none recognizable from green algae. These CLEs form 12 groups within which CLE domains are largely conserved and pre-propeptides can be aligned. Most clusters contain sequences from monocots, eudicots and Amborella trichopoda, with sequences from Picea abies, Selaginella moellendorffii and Physcomitrella patens scattered in some clusters. We easily identified previously known clusters involved in vascular differentiation and nodulation. In addition, we found a number of discrete groups whose function remains poorly characterized. Available data indicate that CLE proteins within a cluster are likely to share function, whereas those from different clusters play at least partially different roles. Our analysis provides a foundation for future evolutionary and functional studies. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  17. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio.

    Science.gov (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh

    2012-01-01

    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to

  18. Grouping and clustering of maize Lancaster germplasm inbreds according to the results of SNP-analysis

    Directory of Open Access Journals (Sweden)

    K. V. Derkach

    2017-08-01

    Full Text Available The objective of this article is the grouping and clustering of maize inbred lines based on the results of SNP-genotyping for the verification of a separate cluster of Lancaster germplasm inbred lines. As material for the study, we used 91 maize (Zea mays L. inbred lines, including 31 Lancaster germplasm lines and 60 inbred lines of other germplasms (23 Iodent inbreds, 15 Reid inbreds, 7 Lacon inbreds, 12 Mix inbreds and 3 exotic inbreds. The majority of the given inbred lines are included in the Dnipro breeding programme. The SNP-genotyping of these inbred lines was conducted using BDI-III panel of 384 SNP-markers developed by BioDiagnostics, Inc. (USA on the base of Illumina VeraCode Bead Plate. The SNP-markers of this panel are biallelic and are located on all 10 maize chromosomes. Their range of conductivity was >0.6. The SNP-analysis was made in completely automated regime on Illumina BeadStation equipment at BioDiagnostics, Inc. (USA. A principal component analysis was applied to group a general set of 91 inbreds according to allelic states of SNP-markers and to identify a cluster of Lancaster inbreds. The clustering and determining hierarchy in 31 Lancaster germplasm inbreds used quantitative cluster analysis. The share of monomorphic markers in the studied set of 91 inbred lines equaled 0.7%, and the share of dimorphic markers equaled 99.3%. Minor allele frequency (MAF > 0.2 was observed for 80.6% of dimorphic markers, the average index of shift of gene diversity equaled 0.2984, PIC on average reached 0.3144. The index of gene diversity of markers varied from 0.1701 to 0.1901, pairwise genetic distances between inbred lines ranged from 0.0316–0.8000, the frequencies of major alleles of SNP-markers were within 0.5085–0.9821, and the frequencies of minor alleles were within 0.0179–0.4915. The average homozygosity of inbred lines was 98.8%. The principal component analysis of SNP-distances confirmed the isolation of the Lancaster

  19. Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters – Towards Identification of Novel Secondary Metabolisms from Filamentous Fungi -

    Directory of Open Access Journals (Sweden)

    Myco eUmemura

    2015-05-01

    Full Text Available Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS and non-ribosomal peptide synthetase (NRPS. Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH.More recently, accompanied by the development of sequencing techniques allowing to obtain large-scale genomic and transcriptomic data, motif-independent prediction methods of SMB gene clusters, including MIDDAS-M, have been developed. Most these methods detect the clusters in which the genes are cooperatively regulated at transcriptional levels, thus allowing the identification of novel SMB gene clusters regardless of the presence of the core genes. Another type of the method, MIPS-CG, uses the characteristics of SMB genes, which are highly enriched in non-syntenic blocks (NSBs, enabling the prediction even without transcriptome data although the results have not been evaluated in detail. Considering that large portion of SMB gene clusters might be sufficiently expressed only in limited uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics and successive experimental validation is an only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  20. Deletion of the MBII-85 snoRNA gene cluster in mice results in postnatal growth retardation.

    Directory of Open Access Journals (Sweden)

    Boris V Skryabin

    2007-12-01

    Full Text Available Prader-Willi syndrome (PWS [MIM 176270] is a neurogenetic disorder characterized by decreased fetal activity, muscular hypotonia, failure to thrive, short stature, obesity, mental retardation, and hypogonadotropic hypogonadism. It is caused by the loss of function of one or more imprinted, paternally expressed genes on the proximal long arm of chromosome 15. Several potential PWS mouse models involving the orthologous region on chromosome 7C exist. Based on the analysis of deletions in the mouse and gene expression in PWS patients with chromosomal translocations, a critical region (PWScr for neonatal lethality, failure to thrive, and growth retardation was narrowed to the locus containing a cluster of neuronally expressed MBII-85 small nucleolar RNA (snoRNA genes. Here, we report the deletion of PWScr. Mice carrying the maternally inherited allele (PWScr(m-/p+ are indistinguishable from wild-type littermates. All those with the paternally inherited allele (PWScr(m+/p- consistently display postnatal growth retardation, with about 15% postnatal lethality in C57BL/6, but not FVB/N crosses. This is the first example in a multicellular organism of genetic deletion of a C/D box snoRNA gene resulting in a pronounced phenotype.

  1. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    Full Text Available The operational performance of container ports has received more and more attentions in both academic and practitioner circles, the performance evaluation and process improvement of container ports have also been the focus of several studies. In this paper, Data Envelopment Analysis (DEA, an effective tool for relative efficiency assessment, is utilized for measuring the performances and benchmarking of the 77 world container ports in 2007. The used approaches in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity and a single output (Container Throughput. The results for the efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided, also cluster analysis technique is used to select the more appropriate targets for poorly performing ports to use as benchmarks.

  2. Bioinformatic analysis of the nucleotide binding site-encoding disease-resistance genes in foxtail millet (Setaria italica (L.) Beauv.).

    Science.gov (United States)

    Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G

    2014-08-28

    The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.

  3. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  4. Full structure and insight into the gene cluster of the O-specific polysaccharide of Yersinia intermedia H9-36/83 (O:17).

    Science.gov (United States)

    Sizova, Olga V; Shashkov, Alexander S; Kondakova, Anna N; Knirel, Yuriy A; Shaikhutdinova, Rima Z; Ivanov, Sergei A; Kislichkina, Angelina A; Kadnikova, Lidia A; Bogun, Aleksandr G; Dentovskaya, Svetlana V

    2018-05-02

    Lipopolysaccharide was isolated from bacteria Yersinia intermedia H9-36/83 (O:17) and degraded with mild acid to give an O-specific polysaccharide, which was isolated by GPC on Sephadex G-50 and studied by sugar analysis and 1D and 2D NMR spectroscopy. The polysaccharide was found to contain 3-deoxy-3-[(R)-3-hydroxybutanoylamino]-d-fucose (d-Fuc3NR3Hb) and the following structure of the heptasaccharide repeating unit was established: The structure established is consistent with the gene content of the O-antigen gene cluster. The O-polysaccharide structure and gene cluster of Y. intermedia are related to those of Hafnia alvei 1211 and Escherichia coli O:103. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. A Proteomic Approach to Investigating Gene Cluster Expression and Secondary Metabolite Functionality in Aspergillus fumigatus

    Science.gov (United States)

    Owens, Rebecca A.; Hammel, Stephen; Sheridan, Kevin J.; Jones, Gary W.; Doyle, Sean

    2014-01-01

    A combined proteomics and metabolomics approach was utilised to advance the identification and characterisation of secondary metabolites in Aspergillus fumigatus. Here, implementation of a shotgun proteomic strategy led to the identification of non-redundant mycelial proteins (n = 414) from A. fumigatus including proteins typically under-represented in 2-D proteome maps: proteins with multiple transmembrane regions, hydrophobic proteins and proteins with extremes of molecular mass and pI. Indirect identification of secondary metabolite cluster expression was also achieved, with proteins (n = 18) from LaeA-regulated clusters detected, including GliT encoded within the gliotoxin biosynthetic cluster. Biochemical analysis then revealed that gliotoxin significantly attenuates H2O2-induced oxidative stress in A. fumigatus (p>0.0001), confirming observations from proteomics data. A complementary 2-D/LC-MS/MS approach further elucidated significantly increased abundance (pproteome and experimental strategies, plus mechanistic data pertaining to gliotoxin functionality in the organism. PMID:25198175

  6. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  7. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an application example of statistical methods for data analysis on diagnosis of the adaptive capacity of subtropical plants varieties. We depicted selection indicators and basic physiological parameters that were defined as diagnostic. We used evaluation on a set of parameters of water regime, there are: determination of water deficit of the leaves, determining the fractional composition of water and detection parameters of the concentration of cell sap (CCS (for tea culture flushes. These settings are characterized by high liability and high responsiveness to the effects of many abiotic factors that determined the particular care in the selection of plant material for analysis and consideration of the impact on sustainability. On the basis of the experimental data calculated the coefficients of pair correlation between climatic factors and used physiological indicators. The result was a selection of physiological and biochemical indicators proposed to assess the adaptability and included in the basis of methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult, especially does not allow to quickly identify the similarity of new varieties for their adaptive responses to adverse factors, and, therefore, to set general requirements to conditions of cultivation. Use of cluster analysis suggests that in the analysis of only quantitative data; define a set of variables used to assess varieties (and the more sampling, the more accurate the clustering will happen, be sure to ascertain the measure of similarity (or difference between objects. It is shown that the identification of diagnostic features, which are subjected to statistical processing, impact the accuracy of the varieties classification. Selection in result of the mono-clusters analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  8. Cluster Analysis of the International Stellarator Confinement Database

    International Nuclear Information System (INIS)

    Kus, A.; Dinklage, A.; Preuss, R.; Ascasibar, E.; Harris, J. H.; Okamura, S.; Yamada, H.; Sano, F.; Stroth, U.; Talmadge, J.

    2008-01-01

    Heterogeneous structure of collected data is one of the problems that occur during derivation of scalings for energy confinement time, and whose analysis tourns out to be wide and complicated matter. The International Stellarator Confinement Database [1], shortly ISCDB, comprises in its latest version 21 a total of 3647 observations from 8 experimental devices, 2067 therefrom beeing so far completed for upcoming analyses. For confinement scaling studies 1933 observation were chosen as the standard dataset. Here we describe a statistical method of cluster analysis for identification of possible cohesive substructures in ISDCB and present some preliminary results

  9. Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana.

    Science.gov (United States)

    Reimegård, Johan; Kundu, Snehangshu; Pendle, Ali; Irish, Vivian F; Shaw, Peter; Nakayama, Naomi; Sundström, Jens F; Emanuelsson, Olof

    2017-04-07

    Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genetic interrelations in the actinomycin biosynthetic gene clusters of Streptomyces antibioticus IMRU 3720 and Streptomyces chrysomallus ATCC11523, producers of actinomycin X and actinomycin C

    Science.gov (United States)

    Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich

    2017-01-01

    Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues including the the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G +C-contents different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S

  11. Accommodating error analysis in comparison and clustering of molecular fingerprints.

    Science.gov (United States)

    Salamon, H; Segal, M R; Ponce de Leon, A; Small, P M

    1998-01-01

    Molecular epidemiologic studies of infectious diseases rely on pathogen genotype comparisons, which usually yield patterns comprising sets of DNA fragments (DNA fingerprints). We use a highly developed genotyping system, IS6110-based restriction fragment length polymorphism analysis of Mycobacterium tuberculosis, to develop a computational method that automates comparison of large numbers of fingerprints. Because error in fragment length measurements is proportional to fragment length and is positively correlated for fragments within a lane, an align-and-count method that compensates for relative scaling of lanes reliably counts matching fragments between lanes. Results of a two-step method we developed to cluster identical fingerprints agree closely with 5 years of computer-assisted visual matching among 1,335 M. tuberculosis fingerprints. Fully documented and validated methods of automated comparison and clustering will greatly expand the scope of molecular epidemiology.

  12. Accident patterns for construction-related workers: a cluster analysis

    Science.gov (United States)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2012-01-01

    The construction industry has been identified as one of the most hazardous industries. The risk of constructionrelated workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  13. Cluster analysis in systems of magnetic spheres and cubes

    Energy Technology Data Exchange (ETDEWEB)

    Pyanzina, E.S., E-mail: elena.pyanzina@urfu.ru [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Gudkova, A.V. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Donaldson, J.G. [University of Vienna, Sensengasse 8, Vienna (Austria); Kantorovich, S.S. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); University of Vienna, Sensengasse 8, Vienna (Austria)

    2017-06-01

    In the present work we use molecular dynamics simulations and graph-theory based cluster analysis to compare self-assembly in systems of magnetic spheres, and cubes where the dipole moment is oriented along the side of the cube in the [001] crystallographic direction. We show that under the same conditions cubes aggregate far less than their spherical counterparts. This difference can be explained in terms of the volume of phase space in which the formation of the bond is thermodynamically advantageous. It follows that this volume is much larger for a dipolar sphere than for a dipolar cube. - Highlights: • A comparison of the degree of self-assembly in systems of magnetic spheres and cubes. • Spheres are more likely to form larger clusters than cubes. • Differences in microstructure will manifest in the magnetic response of each system.

  14. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.

  15. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  16. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    Science.gov (United States)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.

  17. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    Science.gov (United States)

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing a real data from an experiment aimed to link the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  18. Identification of a trichothecene gene cluster and description of the harzianum A biosynthesis pathway in the fungus Trichoderma arundinaceum

    Science.gov (United States)

    Trichothecenes are sesquiterpenes that act like mycotoxins. Their biosynthesis has been mainly studied in the fungal genera Fusarium, where most of the biosynthetic genes (tri) are grouped in a cluster regulated by ambient conditions and regulatory genes. Unexpectedly, few studies are available abou...

  19. Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering

    Directory of Open Access Journals (Sweden)

    Sharma Animesh

    2007-01-01

    Full Text Available Abstract Background The four heterogeneous childhood cancers, neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma present a similar histology of small round blue cell tumor (SRBCT and thus often leads to misdiagnosis. Identification of biomarkers for distinguishing these cancers is a well studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes and the tools that are used to design the diagnostic prediction system. Consequently, more genes are usually identified as necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well studied data set. Results Our method discerned just seven biomarkers that precisely categorized the four subgroups of cancer both in training and blind samples. For the same problem, others suggested 19–94 genes. These seven biomarkers include three novel genes (NAB2, LSP1 and EHD1 – not identified by others with distinct class-specific signatures and important role in cancer biology, including cellular proliferation, transendothelial migration and trafficking of MHC class antigens. Interestingly, NAB2 is downregulated in other tumors including Non-Hodgkin lymphoma and Neuroblastoma but we observed moderate to high upregulation in a few cases of Ewing sarcoma and Rabhdomyosarcoma, suggesting that NAB2 might be mutated in these tumors. These genes can discover the subgroups correctly with unsupervised learning, can differentiate non-SRBCT samples and they perform equally well with other machine learning tools including support vector machines. These biomarkers lead to four simple human interpretable

  20. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  1. Transcriptional interference networks coordinate the expression of functionally-related genes clustered in the same genomic loci

    Directory of Open Access Journals (Sweden)

    Zsolt eBoldogkoi

    2012-07-01

    Full Text Available The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organisation, transcription, various post-transcriptional processes and translation. In this study, the Transcriptional Interference Network (TIN hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighbouring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally-linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly-arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely-oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronised cascade of gene expression in functionally-linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular

  2. Heterologous reconstitution of the intact geodin gene cluster in Aspergillus nidulans through a simple and versatile PCR based approach.

    Directory of Open Access Journals (Sweden)

    Morten Thrane Nielsen

    Full Text Available Fungal natural products are a rich resource for bioactive molecules. To fully exploit this potential it is necessary to link genes to metabolites. Genetic information for numerous putative biosynthetic pathways has become available in recent years through genome sequencing. However, the lack of solid methodology for genetic manipulation of most species severely hampers pathway characterization. Here we present a simple PCR based approach for heterologous reconstitution of intact gene clusters. Specifically, the putative gene cluster responsible for geodin production from Aspergillus terreus was transferred in a two step procedure to an expression platform in A. nidulans. The individual cluster fragments were generated by PCR and assembled via efficient USER fusion prior to transformation and integration via re-iterative gene targeting. A total of 13 open reading frames contained in 25 kb of DNA were successfully transferred between the two species enabling geodin synthesis in A. nidulans. Subsequently, functions of three genes in the cluster were validated by genetic and chemical analyses. Specifically, ATEG_08451 (gedC encodes a polyketide synthase, ATEG_08453 (gedR encodes a transcription factor responsible for activation of the geodin gene cluster and ATEG_08460 (gedL encodes a halogenase that catalyzes conversion of sulochrin to dihydrogeodin. We expect that our approach for transferring intact biosynthetic pathways to a fungus with a well developed genetic toolbox will be instrumental in characterizing the many exciting pathways for secondary metabolite production that are currently being uncovered by the fungal genome sequencing projects.

  3. Human major histocompatibility complex contains a minimum of 19 genes between the complement cluster and HLA-B

    International Nuclear Information System (INIS)

    Spies, T.; Bresnahan, M.; Strominger, J.L.

    1989-01-01

    A 600-kilobase (kb) DNA segment from the human major histocompatibility complex (MHC) class III region was isolated by extension of a previous 435-kb chromosome walk. The contiguous series of cloned overlapping cosmids contains the entire 555-kb interval between C2 in the complement gene cluster and HLA-B. This region is known to encode the tumor necrosis factors (TNFs) α and β, B144, and the major heat shock protein HSP70. Moreover, a cluster of genes, BAT1-BAT5 (HLA-B-associated transcripts) have been localized in the vicinity of the genes for TNFα and TNFβ. An additional four genes were identified by isolation of corresponding cDNA clones with cosmid DNA probes. These genes for BAT6-BAT9 were mapped near the gene for C2 within a 120-kb region that includes a HSP70 gene pair. These results, together with complementary data from a similar recent study, indicated the presence of a minimum of 19 genes within the C2-HLA-B interval of the MHC class III region. Although the functional properties of most of these genes are yet unknown, they may be involved in some aspects of immunity. This idea is supported by the genetic mapping of the hematopoietic histocompatibility locus-1 (Hh-1) in recombinant mice between TNFα and H-2S, which is homologous to the complement gene cluster in humans

  4. Mutation of the RDR1 gene caused genome-wide changes in gene expression, regional variation in small RNA clusters and localized alteration in DNA methylation in rice.

    Science.gov (United States)

    Wang, Ningning; Zhang, Di; Wang, Zhenhui; Xun, Hongwei; Ma, Jian; Wang, Hui; Huang, Wei; Liu, Ying; Lin, Xiuyun; Li, Ning; Ou, Xiufang; Zhang, Chunyu; Wang, Ming-Bo; Liu, Bao

    2014-06-30

    Endogenous small (sm) RNAs (primarily si- and miRNAs) are important trans/cis-acting regulators involved in diverse cellular functions. In plants, the RNA-dependent RNA polymerases (RDRs) are essential for smRNA biogenesis. It has been established that RDR2 is involved in the 24 nt siRNA-dependent RNA-directed DNA methylation (RdDM) pathway. Recent studies have suggested that RDR1 is involved in a second RdDM pathway that relies mostly on 21 nt smRNAs and functions to silence a subset of genomic loci that are usually refractory to the normal RdDM pathway in Arabidopsis. Whether and to what extent the homologs of RDR1 may have similar functions in other plants remained unknown. We characterized a loss-of-function mutant (Osrdr1) of the OsRDR1 gene in rice (Oryza sativa L.) derived from a retrotransposon Tos17 insertion. Microarray analysis identified 1,175 differentially expressed genes (5.2% of all expressed genes in the shoot-tip tissue of rice) between Osrdr1 and WT, of which 896 and 279 genes were up- and down-regulated, respectively, in Osrdr1. smRNA sequencing revealed regional alterations in smRNA clusters across the rice genome. Some of the regions with altered smRNA clusters were associated with changes in DNA methylation. In addition, altered expression of several miRNAs was detected in Osrdr1, and at least some of which were associated with altered expression of predicted miRNA target genes. Despite these changes, no phenotypic difference was identified in Osrdr1 relative to WT under normal condition; however, ephemeral phenotypic fluctuations occurred under some abiotic stress conditions. Our results showed that OsRDR1 plays a role in regulating a substantial number of endogenous genes with diverse functions in rice through smRNA-mediated pathways involving DNA methylation, and which participates in abiotic stress response.

  5. Accumulation of transposable elements in Hox gene clusters during adaptive radiation of Anolis lizards.

    Science.gov (United States)

    Feiner, Nathalie

    2016-10-12

    Transposable elements (TEs) are DNA sequences that can insert elsewhere in the genome and modify genome structure and gene regulation. The role of TEs in evolution is contentious. One hypothesis posits that TE activity generates genomic incompatibilities that can cause reproductive isolation between incipient species. This predicts that TEs will accumulate during speciation events. Here, I tested the prediction that extant lineages with a relatively high rate of speciation have a high number of TEs in their genomes. I sequenced and analysed the TE content of a marker genomic region (Hox clusters) in Anolis lizards, a classic case of an adaptive radiation. Unlike other vertebrates, including closely related lizards, Anolis lizards have high numbers of TEs in their Hox clusters, genomic regions that regulate development of the morphological adaptations that characterize habitat specialists in these lizards. Following a burst of TE activity in the lineage leading to extant Anolis, TEs have continued to accumulate during or after speciation events, resulting in a positive relationship between TE density and lineage speciation rate. These results are consistent with the prediction that TE activity contributes to adaptive radiation by promoting speciation. Although there was no evidence that TE density per se is associated with ecological morphology, the activity of TEs in Hox clusters could have been a rich source for phenotypic variation that may have facilitated the rapid parallel morphological adaptation to microhabitats seen in extant Anolis lizards. © 2016 The Author(s).

  6. Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data.

    Science.gov (United States)

    Ma, Yuanyuan; Hu, Xiaohua; He, Tingting; Jiang, Xingpeng

    2016-12-01

    Nonnegative matrix factorization (NMF) has received considerable attention due to its interpretation of observed samples as combinations of different components, and has been successfully used as a clustering method. As an extension of NMF, Symmetric NMF (SNMF) inherits the advantages of NMF. Unlike NMF, however, SNMF takes a nonnegative similarity matrix as an input, and two lower rank nonnegative matrices (H, H T ) are computed as an output to approximate the original similarity matrix. Laplacian regularization has improved the clustering performance of NMF and SNMF. However, Laplacian regularization (LR), as a classic manifold regularization method, suffers some problems because of its weak extrapolating ability. In this paper, we propose a novel variant of SNMF, called Hessian regularization based symmetric nonnegative matrix factorization (HSNMF), for this purpose. In contrast to Laplacian regularization, Hessian regularization fits the data perfectly and extrapolates nicely to unseen data. We conduct extensive experiments on several datasets including text data, gene expression data and HMP (Human Microbiome Project) data. The results show that the proposed method outperforms other methods, which suggests the potential application of HSNMF in biological data clustering. Copyright © 2016. Published by Elsevier Inc.

  7. Transcriptome profiling and digital gene expression analysis of genes associated with salinity resistance in peanut

    Directory of Open Access Journals (Sweden)

    Jiongming Sui

    2018-03-01

    Full Text Available Background: Soil salinity can significantly reduce crop production, but the molecular mechanism of salinity tolerance in peanut is poorly understood. A mutant (S1 with higher salinity resistance than its mutagenic parent HY22 (S3 was obtained. Transcriptome sequencing and digital gene expression (DGE analysis were performed with leaves of S1 and S3 before and after plants were irrigated with 250 mM NaCl. Results: A total of 107,725 comprehensive transcripts were assembled into 67,738 unigenes using TIGR Gene Indices clustering tools (TGICL. All unigenes were searched against the euKaryotic Ortholog Groups (KOG, gene ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG databases, and these unigenes were assigned to 26 functional KOG categories, 56 GO terms, 32 KEGG groups, respectively. In total 112 differentially expressed genes (DEGs between S1 and S3 after salinity stress were screened, among them, 86 were responsive to salinity stress in S1 and/or S3. These 86 DEGs included genes that encoded the following kinds of proteins that are known to be involved in resistance to salinity stress: late embryogenesis abundant proteins (LEAs, major intrinsic proteins (MIPs or aquaporins, metallothioneins (MTs, lipid transfer protein (LTP, calcineurin B-like protein-interacting protein kinases (CIPKs, 9-cis-epoxycarotenoid dioxygenase (NCED and oleosins, etc. Of these 86 DEGs, 18 could not be matched with known proteins. Conclusion: The results from this study will be useful for further research on the mechanism of salinity resistance and will provide a useful gene resource for the variety breeding of salinity resistance in peanut. Keywords: Digital gene expression, Gene, Mutant, NaCl, Peanut (Arachis hypogaea L., RNA-seq, Salinity stress, Salinity tolerance, Soil salinity, Transcripts, Unigenes

  8. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    Full Text Available In the first journal, I made this attempt to analyze things that affect the achievement of students in each school of course vary. Because students are one of the goals of achieving the goals of successful educational organizations. The mental influence of students’ emotions and behaviors themselves in relation to learning performance. Fuzzy logic can be used in various fields as well as Clustering for grouping, as in Learning Development analyzes. The process will be performed on students based on the symptoms that exist. In this research will use fuzzy logic and clustering. Fuzzy is an uncertain logic but its excess is capable in the process of language reasoning so that in its design is not required complicated mathematical equations. However Clustering method is K-Means method is method where data analysis is broken down by group k (k = 1,2,3, .. k. To know the optimal number of Performance group. The results of the research is with a questionnaire entered into matlab will produce a value that means in generating the graph. And simplify the school in seeing Student performance in the learning process by using certain criteria. So from the system that obtained the results for a decision-making required by the school.

  9. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes

    2017-12-01

    Full Text Available The growing environmental concerns and liberalization of energy markets have resulted in an increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  10. Cluster of differentiation 14 gene polymorphism and its association with incidence of clinical mastitis in Karan fries cattle

    Directory of Open Access Journals (Sweden)

    A. Sakthivel Selvan

    2014-12-01

    Full Text Available Aim: The present study was undertaken with the objectives to characterize, identify DNA polymorphism in cluster of differentiation 14 (CD14 gene in Karan Fries (KF cattle and to analyze association between genetic variants with incidence of clinical mastitis in National Dairy Research Institute (NDRI herd, Karnal. Materials and Methods: Genomic DNA was extracted using blood of randomly selected hundred KF lactating cattle by phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR was carried out using reported primers to amplify 832 base pair region covering nucleotide base position number 1012 to 1843 (part of promoter, 5’UTR, exon 1, intron 1 and part of exon 2 of bovine CD14 gene. The PCR amplified target product was purified, sequenced and further ClustalW analysis was done to align edited sequence with reported Bos taurus sequence (EU148610.1. The restriction fragment length polymorphism (RFLP analysis was performed for each KF cow using HinfI restriction enzyme (RE. Cows were assigned genotypes obtained by PCR-RFLP analysis and association study was done using Chi-square (χ2 test. Results: After PCR amplification, DNA sequencing of amplicon confirmed the 832 bases covering 1012 to 1843 nucleotide base position of bovine CD14 gene. ClustalW multiple sequence alignment program for DNA revealed six nucleotide changes in KF cows at positions T1117D, T1239G, T1291C, G1359C, G1361A, and G1811A. Cows were also screened using PCR-RFLP with HinfI RE, which revealed three genotypes CC, CD and DD that differed significantly regarding mastitis incidence. Within CC genotype, 72.73% of cows were in a mastitis non-affected group whereas, those in CD and DD genotypes 69.44% and 60.38% respectively were mastitis affected. Conclusion: KF cows with allele C of CD14 gene were less susceptibility to mastitis compared with D allele.

  11. Heterologous expression of the Halothiobacillus neapolitanus carboxysomal gene cluster in Corynebacterium glutamicum.

    Science.gov (United States)

    Baumgart, Meike; Huber, Isabel; Abdollahzadeh, Iman; Gensch, Thomas; Frunzke, Julia

    2017-09-20

    Compartmentalization represents a ubiquitous principle used by living organisms to optimize metabolic flux and to avoid detrimental interactions within the cytoplasm. Proteinaceous bacterial microcompartments (BMCs) have therefore created strong interest for the encapsulation of heterologous pathways in microbial model organisms. However, attempts were so far mostly restricted to Escherichia coli. Here, we introduced the carboxysomal gene cluster of Halothiobacillus neapolitanus into the biotechnological platform species Corynebacterium gluta-micum. Transmission electron microscopy, fluorescence microscopy and single molecule localization microscopy suggested the formation of BMC-like structures in cells expressing the complete carboxysome operon or only the shell proteins. Purified carboxysomes consisted of the expected protein components as verified by mass spectrometry. Enzymatic assays revealed the functional production of RuBisCO in C. glutamicum both in the presence and absence of carboxysomal shell proteins. Furthermore, we could show that eYFP is targeted to the carboxysomes by fusion to the large RuBisCO subunit. Overall, this study represents the first transfer of an α-carboxysomal gene cluster into a Gram-positive model species supporting the modularity and orthogonality of these microcompartments, but also identified important challenges which need to be addressed on the way towards biotechnological application. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    Science.gov (United States)

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan

    2015-03-01

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning ``plug-and-play'' approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.

  13. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    KAUST Repository

    Li, Yongxin

    2015-03-24

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning plug-and-playa approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.

  14. Lactobacillus plantarum gene clusters encoding putative cell-surface protein complexes for carbohydrate utilization are conserved in specific gram-positive bacteria

    Directory of Open Access Journals (Sweden)

    Muscariello Lidia

    2006-05-01

    Full Text Available Abstract Background Genomes of gram-positive bacteria encode many putative cell-surface proteins, of which the majority has no known function. From the rapidly increasing number of available genome sequences it has become apparent that many cell-surface proteins are conserved, and frequently encoded in gene clusters or operons, suggesting common functions, and interactions of multiple components. Results A novel gene cluster encoding exclusively cell-surface proteins was identified, which is conserved in a subgroup of gram-positive bacteria. Each gene cluster generally has one copy of four new gene families called cscA, cscB, cscC and cscD. Clusters encoding these cell-surface proteins were found only in complete genomes of Lactobacillus plantarum, Lactobacillus sakei, Enterococcus faecalis, Listeria innocua, Listeria monocytogenes, Lactococcus lactis ssp lactis and Bacillus cereus and in incomplete genomes of L. lactis ssp cremoris, Lactobacillus casei, Enterococcus faecium, Pediococcus pentosaceus, Lactobacillius brevis, Oenococcus oeni, Leuconostoc mesenteroides, and Bacillus thuringiensis. These genes are neither present in the genomes of streptococci, staphylococci and clostridia, nor in the Lactobacillus acidophilus group, suggesting a niche-specific distribution, possibly relating to association with plants. All encoded proteins have a signal peptide for secretion by the Sec-dependent pathway, while some have cell-surface anchors, novel WxL domains, and putative domains for sugar binding and degradation. Transcriptome analysis in L. plantarum shows that the cscA-D genes are co-expressed, supporting their operon organization. Many gene clusters are significantly up-regulated in a glucose-grown, ccpA-mutant derivative of L. plantarum, suggesting catabolite control. This is supported by the presence of predicted CRE-sites upstream or inside the up-regulated cscA-D gene clusters. Conclusion We propose that the CscA, CscB, CscC and Csc

  15. ClusterSignificance: A bioconductor package facilitating statistical analysis of class cluster separations in dimensionality reduced data

    DEFF Research Database (Denmark)

    Serviss, Jason T.; Gådin, Jesper R.; Eriksson, Per

    2017-01-01

    , e.g. genes in a specific pathway, alone can separate samples into these established classes. Despite this, the evaluation of class separations is often subjective and performed via visualization. Here we present the ClusterSignificance package; a set of tools designed to assess the statistical...... significance of class separations downstream of dimensionality reduction algorithms. In addition, we demonstrate the design and utility of the ClusterSignificance package and utilize it to determine the importance of long non-coding RNA expression in the identity of multiple hematological malignancies....

  16. Serial analysis of gene expression (SAGE)

    NARCIS (Netherlands)

    van Ruissen, Fred; Baas, Frank

    2007-01-01

    In 1995, serial analysis of gene expression (SAGE) was developed as a versatile tool for gene expression studies. SAGE technology does not require pre-existing knowledge of the genome that is being examined and therefore SAGE can be applied to many different model systems. In this chapter, the SAGE

  17. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and development of Grid middleware, it's become a realistic and attractive methodology to connect cluster machines on wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have been, however, designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA) based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Thorough these experiments, by comparing the performances and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  18. Cluster analysis of obsessive-compulsive spectrum disorders in patients with obsessive-compulsive disorder: clinical and genetic correlates.

    Science.gov (United States)

    Lochner, Christine; Hemmings, Sian M J; Kinnear, Craig J; Niehaus, Dana J H; Nel, Daniel G; Corfield, Valerie A; Moolman-Smook, Johanna C; Seedat, Soraya; Stein, Dan J

    2005-01-01

    Comorbidity of certain obsessive-compulsive spectrum disorders (OCSDs; such as Tourette's disorder) in obsessive-compulsive disorder (OCD) may serve to define important OCD subtypes characterized by differing phenomenology and neurobiological mechanisms. Comorbidity of the putative OCSDs in OCD has, however, not often been systematically investigated. The Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition , Axis I Disorders-Patient Version as well as a Structured Clinical Interview for Putative OCSDs (SCID-OCSD) were administered to 210 adult patients with OCD (N = 210, 102 men and 108 women; mean age, 35.7 +/- 13.3). A subset of Caucasian subjects (with OCD, n = 171; control subjects, n = 168), including subjects from the genetically homogeneous Afrikaner population (with OCD, n = 77; control subjects, n = 144), was genotyped for polymorphisms in genes involved in monoamine function. Because the items of the SCID-OCSD are binary (present/absent), a cluster analysis (Ward's method) using the items of SCID-OCSD was conducted. The association of identified clusters with demographic variables (age, gender), clinical variables (age of onset, obsessive-compulsive symptom severity and dimensions, level of insight, temperament/character, treatment response), and monoaminergic genotypes was examined. Cluster analysis of the OCSDs in our sample of patients with OCD identified 3 separate clusters at a 1.1 linkage distance level. The 3 clusters were named as follows: (1) "reward deficiency" (including trichotillomania, Tourette's disorder, pathological gambling, and hypersexual disorder), (2) "impulsivity" (including compulsive shopping, kleptomania, eating disorders, self-injury, and intermittent explosive disorder), and (3) "somatic" (including body dysmorphic disorder and hypochondriasis). Several significant associations were found between cluster scores and other variables; for example, cluster I scores were associated

  19. Plasmid Complement of Lactococcus lactis NCDO712 Reveals a Novel Pilus Gene Cluster.

    Science.gov (United States)

    Tarazanova, Mariya; Beerthuyzen, Marke; Siezen, Roland; Fernandez-Gutierrez, Marcela M; de Jong, Anne; van der Meulen, Sjoerd; Kok, Jan; Bachmann, Herwig

    2016-01-01

    Lactococcus lactis MG1363 is an important gram-positive model organism. It is a plasmid-free and phage-cured derivative of strain NCDO712. Plasmid-cured strains facilitate studies on molecular biological aspects, but many properties which make L. lactis an important organism in the dairy industry are plasmid encoded. We sequenced the total DNA of strain NCDO712 and, contrary to earlier reports, revealed that the strain carries 6 rather than 5 plasmids. A new 50-kb plasmid, designated pNZ712, encodes functional nisin immunity (nisCIP) and copper resistance (lcoRSABC). The copper resistance could be used as a marker for the conjugation of pNZ712 to L. lactis MG1614. A genome comparison with the plasmid cured daughter strain MG1363 showed that the number of single nucleotide polymorphisms that accumulated in the laboratory since the strains diverted more than 30 years ago is limited to 11 of w